Question
You are working on a content moderation analytics problem. You have two tables.
Table 1: content_views (one row per user / post / day)
-
user_id
INTEGER: user who viewed the post
-
post_id
INTEGER: viewed post ID
-
view_count
INTEGER: number of views that user generated for that post on that date
-
view_date
STRING: date of the views in
YYYY-MM-DD
format
-
Primary key: (
user_id
,
post_id
,
view_date
)
Table 2: post_policy_scores (one row per post / violation type)
-
post_id
INTEGER: post ID
-
violation_type
STRING: policy category such as
Spam
,
Scam
,
Nudity
, or
Harassment
-
probability_violating
DOUBLE: model-estimated probability that the post violates that policy
-
Primary key: (
post_id
,
violation_type
)
Relationship: one post can have multiple rows in post_policy_scores, one per possible violation type. content_views.post_id joins to post_policy_scores.post_id.
Assumptions
-
Treat
view_date
as a UTC date string and cast it to
DATE
when needed.
-
Let
analysis_date
be the maximum
view_date
present in
content_views
.
-
The last 7 days and last 30 days are inclusive windows ending on
analysis_date
.
-
For classification-based questions, treat a post as belonging to a violation type when
probability_violating > 0.5
.
-
For monthly aggregation, use the calendar month derived from
view_date
.
Write SQL for the following tasks:
-
Posts with at least 10 views (last 7 days).
Count how many distinct posts received at least 10 total views in the last 7 days. Aggregate
view_count
across all users and all dates in the window, and apply the 10-view filter
after
grouping by
post_id
. Required output column:
num_posts
.
-
Nudity view prevalence (last 30 days).
Define a post as a
Nudity
post if it has a
post_policy_scores
row with
violation_type = 'Nudity'
and
probability_violating > 0.5
. Compute:
view prevalence = total views on Nudity posts in the last 30 days / total views on all posts in the last 30 days
.
Required output columns:
nudity_views
,
total_views
,
nudity_view_prevalence
.
-
Monthly violation-type distribution.
For each calendar month (derived from
view_date
) and each
violation_type
, among post/violation pairs where
probability_violating > 0.5
and the post received at least one view in that month, report the number of distinct posts assigned to that violation type and that type's share of all flagged post/violation pairs in the month. Required output columns:
month
,
violation_type
,
flagged_post_count
,
distribution_pct
.