Measure Late Deliveries and Identify Top Delayed Restaurants
Company: DoorDash
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
orders
+----------+---------+--------------+---------------------+-------------------------+-----------------------+
| order_id | user_id | restaurant_id| created_at | estimated_delivery_at | actual_delivery_at |
+----------+---------+--------------+---------------------+-------------------------+-----------------------+
| 1 | 101 | 15 | 2023-07-10 12:00 | 2023-07-10 12:30 | 2023-07-10 12:40 |
| 2 | 102 | 17 | 2023-07-10 13:10 | 2023-07-10 13:45 | 2023-07-10 13:43 |
| 3 | 103 | 15 | 2023-07-11 10:05 | 2023-07-11 10:35 | 2023-07-11 11:00 |
| 4 | 104 | 18 | 2023-07-12 09:00 | 2023-07-12 09:25 | 2023-07-12 09:20 |
+----------+---------+--------------+---------------------+-------------------------+-----------------------+
##### Scenario
An on-demand food-delivery company wants to measure and monitor late deliveries.
##### Question
Write a SQL query that, for the last 7 days, returns each day’s total orders and the percentage that were delivered more than 10 minutes after estimated_delivery_at. Extend it to list the top 5 restaurants with the highest average delivery delay in that period.
##### Hints
Use DATE_TRUNC / DATE() for grouping, TIMESTAMPDIFF or equivalent to compute delay, and ORDER BY with LIMIT for ranking.
Quick Answer: This question evaluates proficiency in time-based aggregations, timestamp arithmetic, percentage calculations, and ranking/grouping using SQL or equivalent Python data-manipulation libraries.