Write SQL to backtest refund policy
Company: DoorDash
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Using the schema and samples below, write a single SQL query (CTEs allowed) that does all of the following for the last 30 days relative to today = 2025-09-01 (i.e., orders placed between 2025-08-02 and 2025-09-01 inclusive): (1) Compute lateness_minutes = GREATEST(0, EXTRACT(EPOCH FROM (del.delivered_at - o.promised_at))/60). (2) Within each (city, delivered_date), compute PERCENT_RANK() of lateness_minutes. (3) For store-days, compute rolling 7-day cold-food refund cost per order using a window (include the current day plus the prior 6 days). (4) Simulate a proposed policy where cold-food refunds are a percentage of order subtotal: lateness <10 min → 0%; 10–29 min → 50%; ≥30 min → 100%; and cap any refund at $50; current policy is 100% of subtotal. (5) Output, by city, the total current refund cost, proposed refund cost, and estimated savings for the period, plus identify the top 5% store-days by current refund cost per order. Use window functions (PARTITION, PERCENT_RANK), a rolling window, subqueries/CTEs, and joins. Assume monetary fields with _cents are integer cents. Provide the final SELECTs producing: (a) city-level aggregates; (b) the list of top 5% store-days with store_id, date, current_cost_per_order, proposed_cost_per_order, savings_per_order.
Schema:
orders(o): order_id INT, customer_id INT, store_id INT, city TEXT, order_placed_at TIMESTAMP, promised_at TIMESTAMP, subtotal_cents INT
deliveries(del): order_id INT, courier_id INT, picked_up_at TIMESTAMP, delivered_at TIMESTAMP, distance_km NUMERIC
refunds(r): refund_id INT, order_id INT, refund_reason TEXT, refund_amount_cents INT, created_at TIMESTAMP
stores(s): store_id INT, city TEXT, cuisine TEXT
couriers(c): courier_id INT, has_thermal_bag BOOLEAN, onboarded_at DATE
Sample rows (minimal, illustrative):
orders
+----------+-------------+----------+------+---------------------+---------------------+----------------+
| order_id | customer_id | store_id | city | order_placed_at | promised_at | subtotal_cents |
+----------+-------------+----------+------+---------------------+---------------------+----------------+
| 1 | 101 | 10 | SF | 2025-08-28 12:00:00 | 2025-08-28 12:40:00 | 3000 |
| 2 | 102 | 10 | SF | 2025-08-28 12:10:00 | 2025-08-28 12:50:00 | 4500 |
| 3 | 103 | 11 | SF | 2025-08-29 19:00:00 | 2025-08-29 19:40:00 | 2000 |
| 4 | 104 | 12 | NYC | 2025-08-29 19:05:00 | 2025-08-29 19:45:00 | 6000 |
| 5 | 105 | 13 | NYC | 2025-08-30 20:00:00 | 2025-08-30 20:35:00 | 2500 |
| 6 | 106 | 12 | NYC | 2025-08-31 18:00:00 | 2025-08-31 18:30:00 | 3500 |
+----------+-------------+----------+------+---------------------+---------------------+----------------+
deliveries
+----------+------------+---------------------+---------------------+-------------+
| order_id | courier_id | picked_up_at | delivered_at | distance_km |
+----------+------------+---------------------+---------------------+-------------+
| 1 | 5001 | 2025-08-28 12:15:00 | 2025-08-28 12:38:00 | 2.1 |
| 2 | 5002 | 2025-08-28 12:30:00 | 2025-08-28 13:35:00 | 7.5 |
| 3 | 5003 | 2025-08-29 19:10:00 | 2025-08-29 20:05:00 | 9.2 |
| 4 | 5004 | 2025-08-29 19:25:00 | 2025-08-29 20:35:00 | 12.3 |
| 5 | 5005 | 2025-08-30 20:05:00 | 2025-08-30 20:50:00 | 3.3 |
| 6 | 5002 | 2025-08-31 18:10:00 | 2025-08-31 19:20:00 | 10.0 |
+----------+------------+---------------------+---------------------+-------------+
refunds
+-----------+----------+-------------+---------------------+--------------------+
| refund_id | order_id | refund_note | created_at | refund_amount_cents|
+-----------+----------+-------------+---------------------+--------------------+
| 9001 | 2 | cold_food | 2025-08-28 13:50:00 | 4500 |
| 9002 | 3 | cold_food | 2025-08-29 20:10:00 | 2000 |
| 9003 | 4 | cold_food | 2025-08-29 20:40:00 | 6000 |
| 9004 | 6 | cold_food | 2025-08-31 19:30:00 | 3500 |
+-----------+----------+-------------+---------------------+--------------------+
stores
+----------+------+---------+
| store_id | city | cuisine |
+----------+------+---------+
| 10 | SF | Burgers |
| 11 | SF | Sushi |
| 12 | NYC | Pizza |
| 13 | NYC | Chinese |
+----------+------+---------+
couriers
+------------+------------------+-------------+
| courier_id | has_thermal_bag | onboarded_at|
+------------+------------------+-------------+
| 5001 | false | 2024-03-01 |
| 5002 | false | 2024-06-15 |
| 5003 | true | 2024-02-10 |
| 5004 | true | 2023-11-20 |
| 5005 | false | 2025-01-05 |
+------------+------------------+-------------+
Notes: Treat missing refunds as zero (no refund). Assume currency is USD; apply the $50 cap to proposed refunds after computing the percentage.
Quick Answer: This question evaluates advanced SQL data-manipulation and analytics competency, including time-based lateness calculations, window functions (PERCENT_RANK and rolling/partitioned windows), CTEs and joins, aggregation, and monetary-field handling for a policy-simulation scenario.