Write SQL for ads metrics and variability
Company: Roblox
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write ANSI SQL to compute daily and campaign-level metrics, including averages and standard deviations of daily CTR and CPC, using the schema and sample data below. Assume deduplication: count at most one click per impression (keep earliest click by ts). Treat CPC as NULL on days with zero clicks and exclude NULLs from CPC aggregates. Compute for date range 2025-08-30 to 2025-09-01, and also return which days are CTR outliers per campaign (CTR > mean_daily_ctr + 2*stddev_daily_ctr within the window).
Schema:
- ad_impressions(imp_id BIGINT PRIMARY KEY, user_id BIGINT, ad_id BIGINT, campaign_id BIGINT, ts TIMESTAMP)
- ad_clicks(click_id BIGINT PRIMARY KEY, imp_id BIGINT, user_id BIGINT, ad_id BIGINT, campaign_id BIGINT, ts TIMESTAMP)
- ad_spend(dt DATE, campaign_id BIGINT, spend_cents INT)
Sample data:
Table: ad_impressions
+--------+---------+-------+-------------+---------------------+
| imp_id | user_id | ad_id | campaign_id | ts |
+--------+---------+-------+-------------+---------------------+
| 1 | 101 | 11 | 1001 | 2025-08-30 10:00:00 |
| 2 | 102 | 12 | 1001 | 2025-08-30 10:05:00 |
| 3 | 101 | 11 | 1002 | 2025-08-31 09:00:00 |
| 4 | 103 | 13 | 1001 | 2025-08-31 10:00:00 |
| 5 | 104 | 14 | 1002 | 2025-09-01 12:00:00 |
+--------+---------+-------+-------------+---------------------+
Table: ad_clicks
+----------+--------+---------+-------+-------------+---------------------+
| click_id | imp_id | user_id | ad_id | campaign_id | ts |
+----------+--------+---------+-------+-------------+---------------------+
| 10 | 1 | 101 | 11 | 1001 | 2025-08-30 10:00:03 |
| 11 | 2 | 102 | 12 | 1001 | 2025-08-30 10:05:04 |
| 12 | 2 | 102 | 12 | 1001 | 2025-08-30 10:05:05 |
| 13 | 4 | 103 | 13 | 1001 | 2025-08-31 10:00:05 |
+----------+--------+---------+-------+-------------+---------------------+
Table: ad_spend
+------------+-------------+-------------+
| dt | campaign_id | spend_cents |
+------------+-------------+-------------+
| 2025-08-30 | 1001 | 300 |
| 2025-08-31 | 1001 | 200 |
| 2025-08-31 | 1002 | 100 |
| 2025-09-01 | 1002 | 500 |
+------------+-------------+-------------+
Tasks:
A) Produce a daily table per campaign with: dt, impressions, unique_clicks_dedup, CTR (unique_clicks_dedup/impressions), CPC (spend_cents/unique_clicks_dedup; NULL if 0), and a flag if the day is an outlier based on the 2-sigma rule computed over 2025-08-30..2025-09-01.
B) Produce an aggregate per campaign with mean_daily_ctr and stddev_daily_ctr (use STDDEV_SAMP over daily CTR values where impressions>0), mean_daily_cpc and stddev_daily_cpc (exclude NULL CPC). Also return a comma-separated list of dt that are outliers.
C) How would your query change if the window were the last 7 days relative to "today" where today=2025-09-01?
Quick Answer: This question evaluates a candidate's ability to perform SQL-based data manipulation and analytics, including deduplication of events, computation of daily and campaign-level metrics (CTR, CPC), and calculation of statistical summaries (mean and standard deviation) and outlier detection over a specified date range.