Write SQL to quantify outage revenue loss
Company: Capital One
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
A database outage blocked premium membership registrations from 2025-01-01 to 2025-03-31 (inclusive). Members intend to start on intended_start_date, are billed monthly in arrears at $10/month only after their registration is recorded (registered_date). Some users registered after the outage; some are still missing. For each affected user, the lost revenue equals $10 for each calendar month within the outage window during which the user was not yet registered by that month’s end. Write a single PostgreSQL query that returns: (1) total dollars lost, and (2) a breakdown of users by months_lost ∈ {1,2,3} with counts. Rules: use the earliest registered_date per user; count months precisely using calendar boundaries; include users with NULL registered_date; ignore users whose intended_start_date is after 2025-03-31; deduplicate inputs. Use the following schema and sample ASCII tables:
Tables:
- premium_intended_enrollments(user_id TEXT, intended_start_date DATE)
- premium_actual_registrations(user_id TEXT, registered_date DATE) -- earliest per user after dedup
- outage_window(start_date DATE, end_date DATE) -- here one row: (2025-01-01, 2025-03-31)
Sample data:
premium_intended_enrollments
+---------+---------------------+
| user_id | intended_start_date |
+---------+---------------------+
| U2 | 2025-01-05 |
| U3 | 2025-01-20 |
| U4 | 2025-02-03 |
| U5 | 2025-03-25 |
| U6 | 2025-03-30 |
| U8 | 2025-01-10 |
| U9 | 2025-02-20 |
| U10 | 2025-03-15 |
+---------+---------------------+
premium_actual_registrations
+---------+-----------------+
| user_id | registered_date |
+---------+-----------------+
| U2 | 2025-04-05 |
| U3 | 2025-03-10 |
| U4 | 2025-04-02 |
| U5 | 2025-04-02 |
| U6 | 2025-06-01 |
| U8 | NULL |
| U9 | NULL |
| U10 | NULL |
+---------+-----------------+
outage_window
+------------+------------+
| start_date | end_date |
+------------+------------+
| 2025-01-01 | 2025-03-31 |
+------------+------------+
Output requirements:
- Return columns: months_lost, user_count, dollars_lost (user_count×months_lost×10) and a final rollup row with totals; and separately a single scalar total dollars lost.
- Treat a month as lost if registered_date IS NULL or registered_date > last_day_of_month; only months within [start_date, end_date] count; if intended_start_date begins mid‑window (e.g., 2025‑03‑25), only count months from that month onward within the window.
- Use generate_series to enumerate calendar months; ensure it works on production‑scale tables (hint: pre‑aggregate earliest registration per user, join to a month spine, and filter).
Quick Answer: This question evaluates proficiency in SQL-based data manipulation, specifically date arithmetic, calendar-bound aggregations, deduplication, NULL handling, and calculating business metrics for revenue impact.