Compute CTR, RPM, and daily RPM variability in SQL
Company: Roblox
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write ANSI-Postgres SQL to compute, for each campaign, over the last 7 days inclusive (assume "today" is 2025-09-01, so the window is 2025-08-26 through 2025-09-01):
1) Total impressions and total charged clicks (dedupe rule below),
2) CTR = charged_clicks / impressions,
3) RPM = (sum of charged click revenue in USD) / impressions * 1000,
4) Standard deviation of daily RPM across the 7 days (use stddev_samp on daily RPM),
5) Return only campaigns with at least 10,000 impressions in the 7-day window,
6) Order by RPM DESC, then by daily RPM stddev ASC.
Important deduping: treat multiple clicks with the same ad_request_id as a single charged click worth the earliest click’s revenue; impressions remain one per ad_request_id.
Schema and sample data (ASCII, minimal rows just to illustrate):
Table: campaigns(campaign_id INT, campaign_name TEXT, vertical TEXT)
+-------------+---------------+----------+
| campaign_id | campaign_name | vertical |
+-------------+---------------+----------+
| 10 | "Shoes_US" | "Retail" |
| 11 | "Games_UK" | "Gaming" |
+-------------+---------------+----------+
Table: ad_events(dt DATE, event_time TIMESTAMP, ad_request_id TEXT, user_id INT, campaign_id INT, event_type TEXT, revenue_cents INT)
+------------+---------------------+--------------+---------+-------------+------------+---------------+
| dt | event_time | ad_request_id| user_id | campaign_id | event_type | revenue_cents |
+------------+---------------------+--------------+---------+-------------+------------+---------------+
| 2025-08-30 | 2025-08-30 10:00:01 | R1 | 101 | 10 | impression | 0 |
| 2025-08-30 | 2025-08-30 10:00:03 | R1 | 101 | 10 | click | 120 |
| 2025-08-30 | 2025-08-30 10:05:00 | R2 | 102 | 10 | impression | 0 |
| 2025-08-30 | 2025-08-30 10:05:20 | R2 | 102 | 10 | click | 120 |
| 2025-08-31 | 2025-08-31 11:01:00 | R3 | 103 | 11 | impression | 0 |
| 2025-08-31 | 2025-08-31 11:01:10 | R3 | 103 | 11 | click | 80 |
| 2025-08-31 | 2025-08-31 11:01:12 | R3 | 103 | 11 | click | 80 |
| 2025-09-01 | 2025-09-01 09:00:00 | R4 | 104 | 10 | impression | 0 |
+------------+---------------------+--------------+---------+-------------+------------+---------------+
Include all necessary CTEs (e.g., to pick the earliest click per ad_request_id), ensure no double-counting across days, and make your final SELECT exactly match the requirements above.
Quick Answer: This question evaluates proficiency in SQL data manipulation, specifically deduplication of event-level clicks by ad_request_id, time-window filtering, join-based enrichment with campaign metadata, and calculation of advertising metrics such as CTR and RPM.