Compute rolling cold-delivery rates with windows
Company: DoorDash
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Assume a food-delivery platform with the following schema. Use PostgreSQL. A delivery is considered "cold" if food_temp_c < 40 at dropoff OR there is a complaint of type 'cold_food' filed within 2 hours after dropoff. Exclude orders with NULL dropoff_ts. Assume today = 2025-09-01 (so "last 7 days" means 2025-08-26 to 2025-09-01 inclusive). Schema and tiny samples:
orders(order_id INT, customer_id INT, restaurant_id INT, order_ts TIMESTAMP)
+----------+-------------+---------------+---------------------+
| order_id | customer_id | restaurant_id | order_ts |
+----------+-------------+---------------+---------------------+
| 101 | 1 | 10 | 2025-08-26 11:58:00 |
| 102 | 2 | 11 | 2025-08-27 12:10:00 |
| 103 | 3 | 10 | 2025-08-28 18:40:00 |
| 104 | 1 | 12 | 2025-08-31 20:05:00 |
| 105 | 4 | 11 | 2025-09-01 21:15:00 |
+----------+-------------+---------------+---------------------+
deliveries(order_id INT, courier_id INT, pickup_ts TIMESTAMP, dropoff_ts TIMESTAMP, food_temp_c INT, outside_temp_c INT)
+----------+------------+---------------------+---------------------+-------------+----------------+
| order_id | courier_id | pickup_ts | dropoff_ts | food_temp_c | outside_temp_c |
+----------+------------+---------------------+---------------------+-------------+----------------+
| 101 | 201 | 2025-08-26 12:10:00 | 2025-08-26 12:35:00 | 38 | 31 |
| 102 | 202 | 2025-08-27 12:22:00 | 2025-08-27 12:45:00 | 52 | 29 |
| 103 | 201 | 2025-08-28 18:55:00 | 2025-08-28 19:40:00 | 39 | 22 |
| 104 | 203 | 2025-08-31 20:20:00 | 2025-08-31 20:45:00 | 44 | 28 |
| 105 | 202 | 2025-09-01 21:25:00 | 2025-09-01 22:30:00 | 35 | 24 |
+----------+------------+---------------------+---------------------+-------------+----------------+
complaints(order_id INT, created_ts TIMESTAMP, type TEXT, source TEXT)
+----------+---------------------+------------+--------+
| order_id | created_ts | type | source |
+----------+---------------------+------------+--------+
| 101 | 2025-08-26 13:05:00 | cold_food | app |
| 103 | 2025-08-28 20:30:00 | cold_food | web |
| 104 | 2025-09-01 09:00:00 | wrong_item | app |
+----------+---------------------+------------+--------+
restaurants(restaurant_id INT, name TEXT, city TEXT)
+---------------+------------+---------+
| restaurant_id | name | city |
+---------------+------------+---------+
| 10 | Noodle Hut | SF |
| 11 | Taco Loco | SF |
| 12 | Curry Dash | Oakland |
+---------------+------------+---------+
Tasks (use window functions where applicable):
1) For each date in the last 7 days, compute cold_rate = cold_deliveries / delivered_orders. Count a delivery as cold if either criterion triggers; deduplicate so an order is counted once even if both triggers occur. Return date, delivered_orders, cold_deliveries, cold_rate rounded to 3 decimals.
2) For each restaurant, compute a 7-day rolling cold_rate ordered by date and return rows for 2025-08-26..2025-09-01 with columns: date, restaurant_id, delivered_orders, cold_deliveries, rolling_cold_rate_7d. Ensure dates with zero volume appear with delivered_orders=0 and rate=NULL.
3) Rank restaurants by 7-day rolling cold_rate on 2025-09-01 using DENSE_RANK(), breaking ties by higher delivered_orders first. Return top 3 with restaurant_id, name, delivered_orders_7d, cold_rate_7d, rank.
4) Flag couriers whose cold rate z-score over the last 7 days exceeds +2 relative to the courier population distribution. Return courier_id, deliveries_7d, cold_rate_7d, z_score. Use windowed AVG() and STDDEV_POP() across couriers.
Edge cases to handle explicitly in SQL: orders missing in deliveries, multiple complaints per order, NULL food_temp_c (rely only on complaint criterion), and overlapping date boundaries at midnight.
Quick Answer: This question evaluates proficiency with SQL window functions, time-based joins, event deduplication and business-rule implementation for computing rolling metrics over temporal windows.