Write SQL for late-delivery metrics by window
Company: DoorDash
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given two tables. Assume PostgreSQL. Define delivery duration as delivered_at − pickup_time (exclude rows with null pickup_time or delivered_at). An order is late if delivery duration > INTERVAL '1 hour'. Treat 'completed' as completed orders; exclude other statuses from lateness calculations. For date filters, use the 15 full days prior to 2025-09-01, i.e., 2025-08-17 00:00:00 (inclusive) to 2025-09-01 00:00:00 (exclusive).
Schema:
- orders(order_id INT PRIMARY KEY, request_id INT, customer_id INT, city TEXT, delivery_type TEXT CHECK (delivery_type IN ('food','parcel')),
placed_at TIMESTAMP, pickup_time TIMESTAMP, delivered_at TIMESTAMP, status TEXT, distance_km NUMERIC)
- requests(request_id INT PRIMARY KEY, created_at TIMESTAMP, promised_eta_min INT, priority TEXT, merchant_id INT, driver_id INT)
Small sample (ASCII):
orders
order_id | request_id | customer_id | city | delivery_type | placed_at | pickup_time | delivered_at | status | distance_km
1 | 101 | 7001 | SF | food | 2025-08-17 12:00:00 | 2025-08-17 12:10:00 | 2025-08-17 13:25:00 | completed | 7.5
2 | 102 | 7002 | SF | parcel | 2025-08-17 11:40:00 | 2025-08-17 11:55:00 | 2025-08-17 12:35:00 | completed | 5.2
3 | 103 | 7003 | NYC | food | 2025-08-20 18:10:00 | 2025-08-20 18:45:00 | 2025-08-20 19:30:00 | completed | 3.1
4 | 104 | 7004 | NYC | parcel | 2025-08-22 09:05:00 | 2025-08-22 09:20:00 | 2025-08-22 10:35:00 | completed | 10.0
5 | 105 | 7005 | SEA | food | 2025-08-24 20:00:00 | 2025-08-24 20:20:00 | 2025-08-24 21:40:00 | completed | 12.0
6 | 106 | 7006 | SEA | parcel | 2025-08-26 14:00:00 | 2025-08-26 14:10:00 | 2025-08-26 14:55:00 | cancelled | 2.0
7 | 107 | 7007 | SF | food | 2025-08-28 17:30:00 | 2025-08-28 17:50:00 | 2025-08-28 18:35:00 | completed | 4.0
8 | 108 | 7008 | NYC | parcel | 2025-08-30 11:10:00 | 2025-08-30 11:25:00 | 2025-08-30 12:40:00 | completed | 8.0
requests
request_id | created_at | promised_eta_min | priority | merchant_id | driver_id
101 | 2025-08-17 12:00:00 | 45 | normal | 5001 | 9001
102 | 2025-08-17 11:50:00 | 60 | expedite | 5002 | 9002
103 | 2025-08-20 18:00:00 | 30 | normal | 5003 | 9003
104 | 2025-08-22 09:00:00 | 60 | normal | 5004 | 9004
105 | 2025-08-24 19:50:00 | 45 | normal | 5005 | 9005
106 | 2025-08-26 13:55:00 | 40 | normal | 5006 | 9006
107 | 2025-08-28 17:20:00 | 30 | normal | 5007 | 9007
108 | 2025-08-30 11:00:00 | 45 | expedite | 5008 | 9008
Write SQL for each:
A) Overall lateness proportion: Among completed orders placed within 2025-08-17 to 2025-08-31 (inclusive of dates, based on placed_at), compute the proportion of completed orders with delivered_at − pickup_time > 1 hour. Return late_orders, total_completed, late_rate.
B) By delivery_type: In the same window, compute late_rate by delivery_type and city, returning city, delivery_type, late_orders, total_completed, late_rate. Order by late_rate desc; break ties by late_orders desc.
C) ETA miscalibration: Join requests to orders and, for the same window, compute the share of completed orders where promised_eta_min ≤ 45 but actual duration > 60 minutes. Return overall share and the shares split by delivery_type. Ensure cancelled orders are excluded, and exclude rows with null pickup_time or delivered_at. Also guard against duplicate joins by enforcing one-to-one via primary keys.
Quick Answer: This question evaluates proficiency in SQL-based data manipulation and analytics, including interval arithmetic and null handling, joins between orders and requests, time-window filtering, aggregation and rate calculations, and conditional filtering by order status.