Compute view prevalence from views and labels

Q: Compute view prevalence from views and labels

This question evaluates a candidate's competency in temporal joins, stateful deduplication of moderation decisions, time-aware aggregation, and prevalence metric calculations within content-moderation analytics.


Question

Given the tables below, write SQL to compute view prevalence of violating content. Use “today” = 2025-09-01 and report the last 7 days (2025-08-26 to 2025-09-01 inclusive).

Provide two variants per country and day:

Ex-ante prevalence counts a view as violating only if a VIOLATION decision existed at or before the view timestamp.

Ex-post prevalence uses the final decision regardless of when it was detected.

Then list the top 3 creators by ex-post violating_view_share over this window and break down their violating views by surface.

Schema and small samples:

Tables:

content_views(view_ts TIMESTAMP, user_id INT, content_id INT, surface VARCHAR, country VARCHAR)

content_moderation(content_id INT, decision VARCHAR, violation_type VARCHAR, detected_ts TIMESTAMP, source VARCHAR, is_removed BOOLEAN)

content(content_id INT, creator_id INT, created_ts TIMESTAMP, country VARCHAR)

Samples:

content

| content_id | creator_id | created_ts          | country |
| 101        | 1001       | 2025-08-25 10:00:00 | US      |
| 102        | 1002       | 2025-08-30 12:00:00 | US      |
| 103        | 1003       | 2025-08-31 08:00:00 | CA      |

content_views

| view_ts             | user_id | content_id | surface | country |
| 2025-08-26 09:00:00 | 504     | 101        | feed    | US      |
| 2025-08-30 18:00:00 | 501     | 102        | feed    | US      |
| 2025-08-31 20:00:00 | 501     | 101        | feed    | US      |
| 2025-09-01 09:00:00 | 502     | 102        | feed    | US      |
| 2025-09-01 10:00:00 | 503     | 103        | search  | CA      |

Implementation details to reflect in SQL: for ex-post, treat the latest decision per content_id (by detected_ts) as the final decision; for ex-ante, use the most recent decision with detected_ts <= view_ts. Assume content with no moderation row at view time is CLEAN for ex-ante. Return one row per day and country with the columns (view_date, country, total_views, violating_views_ex_ante, violating_views_ex_post, ex_ante_prevalence, ex_post_prevalence). Then compute the creator-level breakdown requested.
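One way to sketch these rules is shown below. It runs the SQL through Python's built-in sqlite3 module, so it is written in the SQLite dialect with timestamps stored as text. This is a minimal illustration under several assumptions, not a reference solution:

- The content_moderation rows are hypothetical, since the question shows no sample for that table.
- "Country" is taken from content_views.country (the viewer's country).
- violating_view_share is read as a creator's ex-post violating views divided by that creator's total views in the window.
- The view name labeled_views is invented for this sketch.
- Ties on detected_ts are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content(content_id INT, creator_id INT, created_ts TEXT, country TEXT);
CREATE TABLE content_views(view_ts TEXT, user_id INT, content_id INT,
                           surface TEXT, country TEXT);
CREATE TABLE content_moderation(content_id INT, decision TEXT, violation_type TEXT,
                                detected_ts TEXT, source TEXT, is_removed INT);

INSERT INTO content VALUES
  (101, 1001, '2025-08-25 10:00:00', 'US'),
  (102, 1002, '2025-08-30 12:00:00', 'US'),
  (103, 1003, '2025-08-31 08:00:00', 'CA');
INSERT INTO content_views VALUES
  ('2025-08-26 09:00:00', 504, 101, 'feed',   'US'),
  ('2025-08-30 18:00:00', 501, 102, 'feed',   'US'),
  ('2025-08-31 20:00:00', 501, 101, 'feed',   'US'),
  ('2025-09-01 09:00:00', 502, 102, 'feed',   'US'),
  ('2025-09-01 10:00:00', 503, 103, 'search', 'CA');
-- HYPOTHETICAL rows, invented for illustration only.
INSERT INTO content_moderation VALUES
  (101, 'VIOLATION', 'spam', '2025-08-28 00:00:00', 'classifier', 1),
  (102, 'VIOLATION', 'spam', '2025-08-30 15:00:00', 'classifier', 0),
  (102, 'CLEAN',     NULL,   '2025-08-31 12:00:00', 'appeal',     0);

-- One row per in-window view, flagged 0/1 under both definitions.
CREATE VIEW labeled_views AS
SELECT date(v.view_ts) AS view_date, v.country, v.surface, c.creator_id,
       -- Ex-ante: most recent decision known at view time; none means CLEAN.
       (COALESCE((SELECT m.decision FROM content_moderation m
                  WHERE m.content_id = v.content_id
                    AND m.detected_ts <= v.view_ts
                  ORDER BY m.detected_ts DESC LIMIT 1), 'CLEAN')
        = 'VIOLATION') AS ex_ante,
       -- Ex-post: latest decision overall, regardless of detection time.
       (COALESCE((SELECT m.decision FROM content_moderation m
                  WHERE m.content_id = v.content_id
                  ORDER BY m.detected_ts DESC LIMIT 1), 'CLEAN')
        = 'VIOLATION') AS ex_post
FROM content_views v
LEFT JOIN content c ON c.content_id = v.content_id
WHERE v.view_ts >= '2025-08-26 00:00:00'   -- window start (inclusive)
  AND v.view_ts <  '2025-09-02 00:00:00';  -- day after "today" (exclusive)
""")

# Daily prevalence per country, both variants.
daily = conn.execute("""
SELECT view_date, country,
       COUNT(*)                       AS total_views,
       SUM(ex_ante)                   AS violating_views_ex_ante,
       SUM(ex_post)                   AS violating_views_ex_post,
       1.0 * SUM(ex_ante) / COUNT(*)  AS ex_ante_prevalence,
       1.0 * SUM(ex_post) / COUNT(*)  AS ex_post_prevalence
FROM labeled_views
GROUP BY view_date, country
ORDER BY view_date, country
""").fetchall()

# Top 3 creators by ex-post share, then their violating views per surface.
creators = conn.execute("""
WITH top_creators AS (
  SELECT creator_id, 1.0 * SUM(ex_post) / COUNT(*) AS violating_view_share
  FROM labeled_views
  WHERE creator_id IS NOT NULL
  GROUP BY creator_id
  ORDER BY violating_view_share DESC, creator_id
  LIMIT 3
)
SELECT t.creator_id, t.violating_view_share, l.surface,
       SUM(l.ex_post) AS violating_views
FROM top_creators t
JOIN labeled_views l ON l.creator_id = t.creator_id
GROUP BY t.creator_id, t.violating_view_share, l.surface
ORDER BY t.violating_view_share DESC, t.creator_id, l.surface
""").fetchall()

for row in daily:
    print(row)
for row in creators:
    print(row)
```

With these hypothetical moderation rows, the 2025-08-26 view of content 101 counts as violating ex-post only, because the VIOLATION decision was detected two days after the view. The 2025-08-30 view of content 102 counts ex-ante only, because the later CLEAN decision overrides it as the final label. Days or countries with no views produce no row in this sketch; a calendar table would be needed to report them as zeros. On a warehouse engine, the correlated subqueries would more commonly be written with ROW_NUMBER() or a dialect-specific as-of join.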
