Using BigQuery/Snowflake-style SQL (CTEs required; use LAG and QUALIFY), answer the tasks below. Assume 'today' is 2025-09-01. Schema and small samples:
Tables:
-
deliveries(delivery_id STRING, order_id STRING, courier_id STRING, restaurant_id STRING, city STRING, pickup_ts TIMESTAMP, dropoff_ts TIMESTAMP, distance_km FLOAT, status STRING)
-
complaints(complaint_id STRING, order_id STRING, complaint_ts TIMESTAMP, type STRING)
-
order_events(order_id STRING, event_ts TIMESTAMP, status STRING) -- status in ('ready','picked_up','delivered')
-
couriers(courier_id STRING, city STRING, has_insulated_bag BOOLEAN, activation_date DATE)
Samples (ASCII):
-
deliveries
delivery_id | order_id | courier_id | restaurant_id | city | pickup_ts | dropoff_ts | distance_km | status
D1 | 1001 | C1 | R1 | SF | 2025-09-01 12:05:00 | 2025-09-01 12:25:00 | 3.2 | completed
D2 | 1002 | C2 | R2 | SF | 2025-09-01 12:10:00 | 2025-09-01 12:55:00 | 7.8 | completed
D3 | 1003 | C1 | R3 | NYC | 2025-08-31 19:40:00 | 2025-08-31 20:20:00 | 5.1 | completed
D4 | 1004 | C3 | R4 | NYC | 2025-08-31 19:50:00 | 2025-08-31 20:10:00 | 2.0 | completed
D5 | 1005 | C2 | R2 | SF | 2025-08-31 12:00:00 | 2025-08-31 12:40:00 | 6.3 | completed
-
complaints
complaint_id | order_id | complaint_ts | type
A1 | 1002 | 2025-09-01 13:10:00 | cold
A2 | 1003 | 2025-08-31 21:00:00 | cold
A3 | 1004 | 2025-08-31 20:45:00 | late
-
order_events
order_id | event_ts | status
1001 | 2025-09-01 12:00:00 | ready
1001 | 2025-09-01 12:05:00 | picked_up
1001 | 2025-09-01 12:25:00 | delivered
1002 | 2025-09-01 12:35:00 | ready
1002 | 2025-09-01 12:40:00 | picked_up
1002 | 2025-09-01 12:55:00 | delivered
1003 | 2025-08-31 19:20:00 | ready
1003 | 2025-08-31 19:40:00 | picked_up
1003 | 2025-08-31 20:20:00 | delivered
1004 | 2025-08-31 19:30:00 | ready
1004 | 2025-08-31 19:50:00 | picked_up
1004 | 2025-08-31 20:10:00 | delivered
-
couriers
courier_id | city | has_insulated_bag | activation_date
C1 | SF | TRUE | 2025-06-01
C2 | SF | FALSE | 2025-07-15
C3 | NYC | TRUE | 2025-03-10
Assumptions:
-
Consider only deliveries.status = 'completed'. Count at most one 'cold' complaint per order, within 24h of dropoff.
Tasks (write one SQL script with CTEs):
A) Compute city-day cold complaint rate for the last 30 days ending 2025-09-01 (complaints per completed delivery). Return city, date, deliveries, cold_complaints, complaint_rate. Ensure time zones are handled by casting dropoff_ts to city-local date; state any assumption you make for timezone.
B) Using order_events and LAG over (PARTITION BY order_id ORDER BY event_ts), compute per order: ready_to_pickup_min and pickup_to_dropoff_min. Then, for 2025-08-01 to 2025-09-01, compare median pickup_to_dropoff_min between orders with a cold complaint vs. without, per city. Keep only city rows where the difference > 12 minutes using QUALIFY on a window over cities.
C) For each courier, consider their last 100 completed deliveries up to 2025-09-01 23:59:59 in their city. Flag couriers with cold complaint rate > 2× the city median in the same period and at least 100 deliveries. Use QUALIFY to (i) pick the last 100 by dropoff_ts per courier and (ii) filter the outliers. Return courier_id, city, deliveries, cold_complaints, rate, city_median_rate.
D) Compute, per city, the P90 of pickup_to_dropoff_min. Then, for each courier, compute the share of their deliveries exceeding their city’s P90, and QUALIFY the top 5 couriers per city by that share (ties broken by higher deliveries). Explain any tie-breaking window you use.