Write SQL for Moderation Metrics

Q: Write SQL for Moderation Metrics

This question evaluates SQL data manipulation and analytical skills, specifically the ability to perform time-windowed aggregations, join event data with model-derived policy scores, and compute prevalence ratios and monthly distributions using probabilistic classifications.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Q: What difficulty level is this interview question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Onsite rounds at Airwallex.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Airwallex during technical interviews.

Question

Assume the following two tables.

post_views

user_id INTEGER: ID of the user who viewed the post
post_id INTEGER: ID of the post
view_count INTEGER: Number of views generated by that user for that post on that date
view_date STRING: View date in YYYY-MM-DD format Primary key: ( user_id , post_id , view_date )

post_policy_scores

post_id INTEGER: ID of the post
violation_type STRING: Policy category such as Spam , Scam , Nudity , or Harassment
probability_violating DOUBLE: Model-estimated probability that the post violates the policy for that violation type Primary key: ( post_id , violation_type )

Relationship: one post can have multiple rows in post_policy_scores, one per possible violation type.

Assume all time windows are inclusive, dates should be interpreted in UTC, and the reference date is the maximum view_date in post_views.

Write SQL for the following tasks:

Count how many distinct posts have at least 10 total views in the last 7 days. Aggregate view_count across all users and all dates in the 7-day window. Required output column: posts_with_at_least_10_views_7d .
Compute the view prevalence of Nudity in the last 30 days. Define a post as a Nudity post if violation_type = 'Nudity' and probability_violating > 0.5 . Define view prevalence as: total views on Nudity posts in the last 30 days / total views on all posts in the last 30 days . Required output columns: nudity_views_30d , total_views_30d , nudity_view_prevalence_30d .
Compute the monthly violation-type distribution. For each calendar month derived from view_date , count the number of distinct posts viewed in that month that are classified as violating under the rule probability_violating > 0.5 , grouped by violation_type . Then compute each violation type's share of all violating posts viewed in that month. Required output columns: month , violation_type , violating_post_count , monthly_violation_share .

Write SQL for Moderation Metrics

Quick Overview

Comments (0)