Analyze video flags and reviews with SQL
Company: Google
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Take-home Project
You are designing SQL queries for YouTube Trust & Safety. Use the schema and sample data below. Unless stated otherwise, treat a flag as reviewed if there exists a Reviews row for that flag with a non-NULL reviewed_outcome. If a user flags the same video multiple times, count that user at most once per video when asked for distinct-user counts. If multiple items tie for a maximum, return all ties. Schema:
- Users(user_id PK, first_name, last_name)
- Videos(video_id PK, title)
- Flags(flag_id PK, user_id FK->Users.user_id, video_id FK->Videos.video_id, flagged_at TIMESTAMP)
- Reviews(flag_id PK FK->Flags.flag_id, reviewed_date DATE, reviewed_outcome VARCHAR CHECK(reviewed_outcome IN ('APPROVED','REJECTED','ESCALATED') OR reviewed_outcome IS NULL))
Sample rows (minimal, illustrative):
Users
+---------+------------+-----------+
| user_id | first_name | last_name |
+---------+------------+-----------+
| 1 | Alice | Zhang |
| 2 | Bob | Li |
| 3 | Carol | Chen |
+---------+------------+-----------+
Videos
+----------+----------------+
| video_id | title |
+----------+----------------+
| 10 | Cat Tricks |
| 11 | Dog Tricks |
| 12 | Bird Tricks |
+----------+----------------+
Flags
+---------+---------+----------+---------------------+
| flag_id | user_id | video_id | flagged_at |
+---------+---------+----------+---------------------+
| 100 | 1 | 10 | 2025-08-30 10:00:00 |
| 101 | 1 | 10 | 2025-08-31 09:00:00 |
| 102 | 2 | 10 | 2025-08-31 12:00:00 |
| 103 | 2 | 11 | 2025-09-01 08:30:00 |
| 104 | 3 | 10 | 2025-09-01 11:45:00 |
| 105 | 3 | 12 | 2025-09-01 12:00:00 |
+---------+---------+----------+---------------------+
Reviews
+---------+---------------+------------------+
| flag_id | reviewed_date | reviewed_outcome |
+---------+---------------+------------------+
| 100 | 2025-09-02 | APPROVED |
| 101 | 2025-09-02 | REJECTED |
| 102 | 2025-09-03 | APPROVED |
| 103 | 2025-09-03 | APPROVED |
| 104 | NULL | NULL |
+---------+---------------+------------------+
Tasks — write standard SQL for each:
1) Count distinct-user flagging per video: Return video_id and distinct_user_flaggers for every video, where a user who flagged the same video multiple times counts once. Also include total_flags for that video (count of all Flags rows) as a second column. Order by distinct_user_flaggers DESC, then total_flags DESC, then video_id ASC.
2) Most-flagged video(s) and reviewed-flag count: Find the video_id(s) with the highest total number of flags. For those video_id(s), return video_id, total_flags, reviewed_flags where reviewed_flags counts Flags that have a Reviews row with non-NULL reviewed_outcome. If multiple videos tie for total_flags, return all ties.
3) Top user by approved-video count: Which user flagged the most distinct videos that eventually got APPROVED? Count each (user_id, video_id) at most once even if the user flagged that video multiple times; consider a video approved-by-user if there exists at least one of the user’s flags on that video with reviewed_outcome = 'APPROVED'. Return user_id, approved_videos_flagged, first_name, last_name; if there’s a tie for the maximum approved_videos_flagged, return all tied users.
4) Rows containing NULLs: Write queries to return all rows that contain at least one NULL in the Reviews table (any column) and, separately, all rows in the Flags table where any non-PK column (user_id, video_id, or flagged_at) is NULL. Return all columns for those rows.
Quick Answer: This question evaluates SQL skills for data manipulation, including aggregation, joins, deduplication of repeated user actions, and conditional counting based on review status.