Write SQL for rolling frequency caps
Company: Netflix
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
You work on ad serving with per-user rolling 7-day frequency caps at the campaign level. Assume "today" is 2025-09-01 and t0 = 2025-09-01 12:00:00 UTC. Write SQL (any dialect with window functions and common table expressions) to answer parts (A)-(C) exactly using the schema and sample data below.
Schema:
- users(user_id INT PRIMARY KEY, household_id TEXT, country TEXT)
- campaigns(campaign_id INT PRIMARY KEY, cap_7d INT, start_date DATE, end_date DATE)
- impressions(impression_id INT PRIMARY KEY, ts TIMESTAMP, user_id INT, campaign_id INT)
- clicks(click_id INT PRIMARY KEY, impression_id INT, ts TIMESTAMP)
- conversions(conv_id INT PRIMARY KEY, click_id INT, ts TIMESTAMP, revenue DECIMAL(10,2))
Sample data (ASCII):
users
user_id | household_id | country
1 | A | US
2 | A | US
3 | B | CA
4 | C | US
campaigns
campaign_id | cap_7d | start_date | end_date
10 | 3 | 2025-08-15 | 2025-09-30
20 | 5 | 2025-08-20 | 2025-09-15
impressions
impression_id | ts | user_id | campaign_id
100 | 2025-08-26 10:00 | 1 | 10
101 | 2025-08-30 09:00 | 1 | 10
102 | 2025-08-31 09:10 | 1 | 10
103 | 2025-09-01 08:00 | 1 | 10
104 | 2025-08-28 12:00 | 2 | 10
105 | 2025-08-30 13:00 | 2 | 10
106 | 2025-08-26 14:00 | 3 | 20
107 | 2025-08-29 15:00 | 3 | 20
108 | 2025-08-30 16:00 | 3 | 20
109 | 2025-09-01 11:00 | 4 | 10
clicks
click_id | impression_id | ts
1000 | 101 | 2025-08-30 09:05
1001 | 102 | 2025-08-31 09:12
1002 | 108 | 2025-08-30 16:01
1003 | 103 | 2025-09-01 08:01
conversions
conv_id | click_id | ts | revenue
2000 | 1000 | 2025-08-30 10:00 | 5.00
2001 | 1001 | 2025-09-01 10:00 | 0.00
2002 | 1002 | 2025-09-01 12:00 | 20.00
2003 | 1003 | 2025-09-01 13:00 | 8.00
Assumptions: (i) Only impressions within a campaign’s [start_date, end_date] are valid; (ii) A rolling 7-day window at time t includes [t-7 days, t); (iii) An impression beyond the cap is the (cap_7d+1)-th, (cap_7d+2)-th, etc., within the 7-day window; (iv) Revenue is attributed to the impression via click->conversion chain; (v) If multiple conversions follow a click, sum their revenue.
Tasks:
(A) For each campaign at t0, output: campaign_id, users_at_cap (count of users whose last-7-day impression count equals cap_7d), users_one_below_cap (count with last-7-day count = cap_7d-1), and pct_over_cap_if_one_more (percentage of active users in last 7 days who would exceed the cap if each received one more impression at t0).
(B) For every (user_id, campaign_id) seen in the last 7 days, compute next_eligible_time: the earliest timestamp > t0 when serving an additional impression would not breach the 7-day cap because the oldest in-window impression has rolled out. Return user_id, campaign_id, next_eligible_time.
(C) For 2025-08-25 12:00 <= ts < 2025-09-01 12:00, compute, per campaign, ROI_beyond_cap = (revenue from impressions beyond cap within the window) / (count of impressions beyond cap within the window). Return campaign_id, impressions_beyond_cap, revenue_beyond_cap, ROI_beyond_cap. Edge cases must be handled correctly: users with 0 clicks/conversions, multiple conversions per click, impressions straddling campaign end_date, and users with impressions both before and after the window boundary.
Quick Answer: This question evaluates proficiency with rolling-window analytics, frequency cap logic, time-based attribution, and per-user/per-campaign aggregation for advertising data.