Assume the following two tables.
post_views
-
user_id
INTEGER: ID of the user who viewed the post
-
post_id
INTEGER: ID of the post
-
view_count
INTEGER: Number of views generated by that user for that post on that date
-
view_date
STRING: View date in
YYYY-MM-DD
format
Primary key: (
user_id
,
post_id
,
view_date
)
post_policy_scores
-
post_id
INTEGER: ID of the post
-
violation_type
STRING: Policy category such as
Spam
,
Scam
,
Nudity
, or
Harassment
-
probability_violating
DOUBLE: Model-estimated probability that the post violates the policy for that violation type
Primary key: (
post_id
,
violation_type
)
Relationship: one post can have multiple rows in post_policy_scores, one per possible violation type.
Assume all time windows are inclusive, dates should be interpreted in UTC, and the reference date is the maximum view_date in post_views.
Write SQL for the following tasks:
-
Count how many distinct posts have at least 10 total views in the last 7 days. Aggregate
view_count
across all users and all dates in the 7-day window. Required output column:
posts_with_at_least_10_views_7d
.
-
Compute the view prevalence of
Nudity
in the last 30 days. Define a post as a
Nudity
post if
violation_type = 'Nudity'
and
probability_violating > 0.5
. Define view prevalence as:
total views on Nudity posts in the last 30 days / total views on all posts in the last 30 days
.
Required output columns:
nudity_views_30d
,
total_views_30d
,
nudity_view_prevalence_30d
.
-
Compute the monthly violation-type distribution. For each calendar month derived from
view_date
, count the number of distinct posts viewed in that month that are classified as violating under the rule
probability_violating > 0.5
, grouped by
violation_type
. Then compute each violation type's share of all violating posts viewed in that month. Required output columns:
month
,
violation_type
,
violating_post_count
,
monthly_violation_share
.