Write SQL for deliveries analytics
Company: SIG (Susquehanna)
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Write SQL for the following analytics tasks. Assume a PostgreSQL-like dialect unless otherwise stated. Treat "today" as 2025-09-01.
Schema:
- users(user_id INT, signup_date DATE, city TEXT)
- orders(order_id INT, user_id INT, created_at TIMESTAMP, delivered_at TIMESTAMP NULL, courier_id INT, distance_km NUMERIC(5,1), status TEXT CHECK (status IN ('created','in_progress','delivered','cancelled')), courier_type TEXT CHECK (courier_type IN ('biker','car')))
- couriers(courier_id INT, type TEXT CHECK (type IN ('biker','car')), start_date DATE, city TEXT)
- calendar_months(month_start DATE, month_label TEXT) -- month_start is the first day of the month
Sample data (minimal, for understanding only):
users
+---------+-------------+-----+
| user_id | signup_date | city|
+---------+-------------+-----+
| 1 | 2025-08-20 | SF |
| 2 | 2025-08-25 | SF |
| 3 | 2025-08-28 | NYC |
| 4 | 2025-07-15 | NYC |
| 5 | 2025-07-01 | SF |
+---------+-------------+-----+
orders
+----------+---------+---------------------+---------------------+------------+-------------+------------+--------------+
| order_id | user_id | created_at | delivered_at | courier_id | distance_km | status | courier_type |
+----------+---------+---------------------+---------------------+------------+-------------+------------+--------------+
| 101 | 1 | 2025-08-26 10:05:00 | 2025-08-26 10:42:00 | 9001 | 3.2 | delivered | biker |
| 102 | 1 | 2025-08-27 12:10:00 | NULL | 9002 | 4.5 | cancelled | car |
| 103 | 2 | 2025-08-30 18:00:00 | 2025-08-30 18:35:00 | 9003 | 2.0 | delivered | biker |
| 104 | 3 | 2025-08-31 09:15:00 | 2025-08-31 09:50:00 | 9004 | 5.1 | delivered | car |
| 105 | 3 | 2025-09-01 11:00:00 | NULL | 9004 | 1.2 | in_progress| biker |
| 106 | 4 | 2025-07-20 14:00:00 | 2025-07-20 14:50:00 | 9004 | 6.5 | delivered | car |
| 107 | 5 | 2025-07-01 08:30:00 | NULL | 9001 | 1.0 | cancelled | biker |
+----------+---------+---------------------+---------------------+------------+-------------+------------+--------------+
couriers
+------------+-------+------------+-----+
| courier_id | type | start_date | city|
+------------+-------+------------+-----+
| 9001 | biker | 2025-07-01 | SF |
| 9002 | car | 2025-06-10 | SF |
| 9003 | biker | 2025-08-15 | SF |
| 9004 | car | 2025-07-20 | NYC |
+------------+-------+------------+-----+
calendar_months
+-------------+------------+
| month_start | month_label|
+-------------+------------+
| 2025-07-01 | 2025-07 |
| 2025-08-01 | 2025-08 |
| 2025-09-01 | 2025-09 |
+-------------+------------+
Tasks:
A) Last 7 days conversion (2025-08-26 to 2025-09-01 inclusive): For each city and courier_type, compute conversion = delivered orders / created orders where created_at is in the window. Count an order as delivered if status = 'delivered' and delivered_at is not NULL (regardless of when delivered). Return city, courier_type, created_orders, delivered_orders, conversion.
B) Monthly volume with zero months: For the last 3 months (2025-07 to 2025-09), output month_label, courier_type, orders_created, orders_delivered, and GMV assuming GMV = COUNT(delivered orders) * 10. Include months with zero orders by left joining from calendar_months. Use created_at for month assignment.
C) Consecutive biker streaks: For each user, find the first occurrence of a streak of 3 consecutive delivered orders fulfilled by bikers (ignore non-delivered orders; any non-biker delivery breaks the streak). Output user_id, first_order_id, third_order_id, and minutes between their created_at timestamps. If no streak, exclude the user.
D) p95 delivery time: For orders created in the last 30 days (2025-08-03 to 2025-09-01) with status = 'delivered', compute the 95th percentile of delivery time in minutes by courier_type using an appropriate percentile function. Also return count per group.
E) Follow-up to B (city x month completeness): Break the monthly report down by city and courier_type so that every (city, month_label) combination present in users.city ∪ couriers.city appears, even if there were zero orders. Return zeros when no orders exist.
F) Aging undelivered: Find orders with created_at <= '2025-09-01 00:00:00' that were not delivered within 2 hours of creation (i.e., delivered_at is NULL or delivered_at > created_at + interval '2 hours'). Return order_id, user_id, courier_type, status, created_at, delivered_at, and a boolean is_stale_2h.
Provide performant SQL for each task, explaining any window functions or joins you rely on in comments.
Quick Answer: This question evaluates SQL data manipulation and analytics skills—specifically aggregation, conditional counting, time-window filtering, joins, and handling of nulls and zero-count periods.