PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Data Manipulation (SQL/Python)/Airwallex

Compute Moderation View Metrics

Last updated: Jun 15, 2026

Quick Overview

An Airwallex Data Scientist onsite SQL question on content-moderation analytics. You write three queries over a per-day view table and a one-to-many post/violation-type policy-score table: distinct posts with 10+ views in the last 7 days, Nudity view prevalence over the last 30 days, and a monthly violation-type distribution. The core challenge is windowed aggregation plus avoiding view double-counting across violation types.

  • medium
  • Airwallex
  • Data Manipulation (SQL/Python)
  • Data Scientist

Compute Moderation View Metrics

Company: Airwallex

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: medium

Interview Round: Onsite

##### Question You are working on a content moderation analytics problem. You have two tables. **Table 1: `content_views`** (one row per user / post / day) - `user_id` INTEGER: user who viewed the post - `post_id` INTEGER: viewed post ID - `view_count` INTEGER: number of views that user generated for that post on that date - `view_date` STRING: date of the views in `YYYY-MM-DD` format - Primary key: (`user_id`, `post_id`, `view_date`) **Table 2: `post_policy_scores`** (one row per post / violation type) - `post_id` INTEGER: post ID - `violation_type` STRING: policy category such as `Spam`, `Scam`, `Nudity`, or `Harassment` - `probability_violating` DOUBLE: model-estimated probability that the post violates that policy - Primary key: (`post_id`, `violation_type`) **Relationship:** one post can have multiple rows in `post_policy_scores`, one per possible violation type. `content_views.post_id` joins to `post_policy_scores.post_id`. **Assumptions** - Treat `view_date` as a UTC date string and cast it to `DATE` when needed. - Let `analysis_date` be the maximum `view_date` present in `content_views`. - The last 7 days and last 30 days are inclusive windows ending on `analysis_date`. - For classification-based questions, treat a post as belonging to a violation type when `probability_violating > 0.5`. - For monthly aggregation, use the calendar month derived from `view_date`. Write SQL for the following tasks: 1. **Posts with at least 10 views (last 7 days).** Count how many distinct posts received at least 10 total views in the last 7 days. Aggregate `view_count` across all users and all dates in the window, and apply the 10-view filter *after* grouping by `post_id`. Required output column: `num_posts`. 2. **Nudity view prevalence (last 30 days).** Define a post as a `Nudity` post if it has a `post_policy_scores` row with `violation_type = 'Nudity'` and `probability_violating > 0.5`. Compute: `view prevalence = total views on Nudity posts in the last 30 days / total views on all posts in the last 30 days`. Required output columns: `nudity_views`, `total_views`, `nudity_view_prevalence`. 3. **Monthly violation-type distribution.** For each calendar month (derived from `view_date`) and each `violation_type`, among post/violation pairs where `probability_violating > 0.5` and the post received at least one view in that month, report the number of distinct posts assigned to that violation type and that type's share of all flagged post/violation pairs in the month. Required output columns: `month`, `violation_type`, `flagged_post_count`, `distribution_pct`.

Quick Answer: An Airwallex Data Scientist onsite SQL question on content-moderation analytics. You write three queries over a per-day view table and a one-to-many post/violation-type policy-score table: distinct posts with 10+ views in the last 7 days, Nudity view prevalence over the last 30 days, and a monthly violation-type distribution. The core challenge is windowed aggregation plus avoiding view double-counting across violation types.

Airwallex logo
Airwallex
Nov 14, 2025, 12:00 AM
Data Scientist
Onsite
Data Manipulation (SQL/Python)
6
0
Question

You are working on a content moderation analytics problem. You have two tables.

Table 1: content_views (one row per user / post / day)

  • user_id INTEGER: user who viewed the post
  • post_id INTEGER: viewed post ID
  • view_count INTEGER: number of views that user generated for that post on that date
  • view_date STRING: date of the views in YYYY-MM-DD format
  • Primary key: ( user_id , post_id , view_date )

Table 2: post_policy_scores (one row per post / violation type)

  • post_id INTEGER: post ID
  • violation_type STRING: policy category such as Spam , Scam , Nudity , or Harassment
  • probability_violating DOUBLE: model-estimated probability that the post violates that policy
  • Primary key: ( post_id , violation_type )

Relationship: one post can have multiple rows in post_policy_scores, one per possible violation type. content_views.post_id joins to post_policy_scores.post_id.

Assumptions

  • Treat view_date as a UTC date string and cast it to DATE when needed.
  • Let analysis_date be the maximum view_date present in content_views .
  • The last 7 days and last 30 days are inclusive windows ending on analysis_date .
  • For classification-based questions, treat a post as belonging to a violation type when probability_violating > 0.5 .
  • For monthly aggregation, use the calendar month derived from view_date .

Write SQL for the following tasks:

  1. Posts with at least 10 views (last 7 days). Count how many distinct posts received at least 10 total views in the last 7 days. Aggregate view_count across all users and all dates in the window, and apply the 10-view filter after grouping by post_id . Required output column: num_posts .
  2. Nudity view prevalence (last 30 days). Define a post as a Nudity post if it has a post_policy_scores row with violation_type = 'Nudity' and probability_violating > 0.5 . Compute: view prevalence = total views on Nudity posts in the last 30 days / total views on all posts in the last 30 days . Required output columns: nudity_views , total_views , nudity_view_prevalence .
  3. Monthly violation-type distribution. For each calendar month (derived from view_date ) and each violation_type , among post/violation pairs where probability_violating > 0.5 and the post received at least one view in that month, report the number of distinct posts assigned to that violation type and that type's share of all flagged post/violation pairs in the month. Required output columns: month , violation_type , flagged_post_count , distribution_pct .

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Airwallex•More Data Scientist•Airwallex Data Scientist•Airwallex Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.