Solve multi-part SQL with sliding windows
Company: DoorDash
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Assume 'today' is 2025-09-01. You are given the following tables.
users(user_id INT PRIMARY KEY, signup_date DATE)
orders(order_id INT PRIMARY KEY, user_id INT, order_date DATE, status TEXT)
order_items(order_id INT, product_id INT, qty INT, price DECIMAL(10,2))
products(product_id INT PRIMARY KEY, category TEXT)
events(user_id INT, event_date DATE, platform TEXT, event_type TEXT)
Sample rows
users
+---------+-------------+
| user_id | signup_date |
+---------+-------------+
| 1 | 2025-07-15 |
| 2 | 2025-08-20 |
| 3 | 2025-08-28 |
| 4 | 2025-08-30 |
events
+---------+------------+----------+------------+
| user_id | event_date | platform | event_type |
+---------+------------+----------+------------+
| 1 | 2025-08-26 | ios | login |
| 1 | 2025-08-27 | ios | purchase |
| 1 | 2025-08-28 | ios | login |
| 2 | 2025-08-26 | web | login |
| 2 | 2025-08-29 | web | login |
| 3 | 2025-08-30 | android | login |
| 3 | 2025-08-31 | android | login |
| 4 | 2025-09-01 | web | login |
orders
+----------+---------+------------+----------+
| order_id | user_id | order_date | status |
+----------+---------+------------+----------+
| 101 | 1 | 2025-08-27 | paid |
| 102 | 2 | 2025-08-29 | cancelled|
| 103 | 3 | 2025-08-31 | shipped |
| 104 | 3 | 2025-09-01 | paid |
order_items
+----------+------------+-----+--------+
| order_id | product_id | qty | price |
+----------+------------+-----+--------+
| 101 | 11 | 1 | 50.00 |
| 101 | 12 | 2 | 20.00 |
| 102 | 11 | 1 | 50.00 |
| 103 | 12 | 1 | 20.00 |
| 103 | 13 | 3 | 15.00 |
| 104 | 13 | 2 | 15.00 |
products
+------------+----------+
| product_id | category |
+------------+----------+
| 11 | gadgets |
| 12 | gadgets |
| 13 | books |
Tasks (write ANSI SQL; window functions encouraged; be careful with distinct counts and time windows):
1) 7-day rolling DAU by platform: For each calendar day d between 2025-08-20 and 2025-09-01 inclusive, output platform, d, and the number of distinct users who had any event on days in [d-6, d]. Treat multiple events per user-day as one. Include missing days with zeroes where a platform has no activity.
2) Top-3 products by revenue per category in the last 30 days relative to today (i.e., order_date in [2025-08-02, 2025-09-01]). Revenue is SUM(qty*price) from order_items for orders with status NOT IN ('cancelled'). Break ties by product_id ascending. Output category, product_id, revenue, rank (1..3). Exclude categories with fewer than 1 eligible product from the output.
3) Current login streak: For each user, compute the length (in days) of the current consecutive login streak ending at their most recent login on or before 2025-09-01 (count a day if there is at least one 'login' event on that date). Output user_id and current_streak_days.
4) 1st→2nd purchase uplift: Consider non-cancelled orders only (status IN ('paid','shipped')). For users whose 2nd order occurs within 90 days of signup_date and whose 2nd order gross amount >= 1.5× the 1st order gross amount, output user_id, first_order_id, first_amount, second_order_id, second_amount, and days_between. If a user has multiple orders on the same day, define order sequence by order_date then order_id.
Quick Answer: This question evaluates a candidate's practical SQL data-manipulation skills, including time-based rolling aggregates, distinct counts and deduplication, multi-table joins, revenue aggregation and ranking, and session/streak computations within the Data Manipulation (SQL/Python) domain.