Compute and rank top bad advertisers
Company: TikTok
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
SQL on ad safety. Assume the following schema and sample rows. Use ANSI SQL. Today is 2025-09-01; interpret “last 7 days” as 2025-08-26 00:00:00 to 2025-09-01 23:59:59 UTC.
Tables:
- advertisers(advertiser_id INT, name TEXT)
- ads(ad_id INT, advertiser_id INT, created_at TIMESTAMP)
- ad_page_visits(visit_id BIGINT, ad_id INT, user_id BIGINT, visited_at TIMESTAMP)
- ad_reports(report_id BIGINT, ad_id INT, user_id BIGINT, reported_at TIMESTAMP, reason TEXT)
Sample data:
advertisers
+---------------+-----------+
| advertiser_id | name |
+---------------+-----------+
| 101 | Alpha Co |
| 102 | Beta LLC |
| 103 | Gamma Inc |
+---------------+-----------+
ads
+-------+---------------+---------------------+
| ad_id | advertiser_id | created_at |
+-------+---------------+---------------------+
| 1 | 101 | 2025-08-15 10:00:00 |
| 2 | 101 | 2025-08-20 11:00:00 |
| 3 | 102 | 2025-08-25 12:00:00 |
| 4 | 103 | 2025-08-28 09:00:00 |
+-------+---------------+---------------------+
ad_page_visits
+----------+------+---------+---------------------+
| visit_id | ad_id| user_id | visited_at |
+----------+------+---------+---------------------+
| 1 | 1 | 1001 | 2025-08-26 10:00:00 |
| 2 | 1 | 1002 | 2025-08-26 10:05:00 |
| 3 | 1 | 1003 | 2025-08-27 08:00:00 |
| 4 | 2 | 1001 | 2025-08-28 12:00:00 |
| 5 | 2 | 1004 | 2025-08-29 13:00:00 |
| 6 | 3 | 1002 | 2025-08-30 14:00:00 |
| 7 | 3 | 1005 | 2025-08-31 15:00:00 |
| 8 | 4 | 1006 | 2025-08-31 20:00:00 |
| 9 | 4 | 1002 | 2025-09-01 09:00:00 |
| 10 | 4 | 1007 | 2025-09-01 10:30:00 |
+----------+------+---------+---------------------+
ad_reports
+-----------+------+---------+---------------------+-----------+
| report_id | ad_id| user_id | reported_at | reason |
+-----------+------+---------+---------------------+-----------+
| 500 | 1 | 1002 | 2025-08-26 10:06:00 | misleading|
| 501 | 1 | 1003 | 2025-08-27 08:02:00 | offensive |
| 502 | 2 | 1004 | 2025-08-29 13:05:00 | spam |
| 503 | 3 | 1005 | 2025-08-31 15:05:00 | scam |
| 504 | 4 | 1006 | 2025-08-31 20:02:00 | offensive |
+-----------+------+---------+---------------------+-----------+
Tasks:
A) Define “top bad advertiser” as one meeting all of: at least 1,000 page visits and at least 50 unique reporters in the last 7 days; rank advertisers by report_rate = total_reports / total_visits (descending). Write a single query that outputs: advertiser_id, total_visits, total_reports, unique_reporters, report_rate, rank, keeping only the top 5 ranks. Use window functions (RANK or DENSE_RANK) and deterministic tiebreakers: total_reports DESC, advertiser_id ASC.
B) Modify the query so that if an advertiser has multiple ads, you also output the worst ad_id per advertiser (highest ad-level report_rate over the last 7 days, with at least 100 visits), breaking ties by total_reports DESC then ad_id ASC.
C) Return a daily leaderboard (one row per advertiser per day in the last 7 days) with that day’s rank by report_rate, but only over advertisers having ≥200 visits that day. Ensure advertisers with zero reports still appear (rate = 0) and days with no visits are excluded.
D) Explain in comments how your query avoids double-counting when a user both visits and reports in the same minute, and how it handles reports without a corresponding visit row within the window.
Quick Answer: This question evaluates proficiency in SQL data manipulation and analytics, including time-window filtering, JOINs, aggregation and deduplication, rate calculations, and window-function–based ranking.