Analyze DoorDash Orders: High-Frequency Customers, Top Spenders, MoM Sales & Bottom-Percentile Reach
Company: DoorDash
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
orders
+-------------+-------------+---------------+---------------------+
| delivery_id | customer_id | restaurant_id | order_place_time |
+-------------+-------------+---------------+---------------------+
| 1001 | 1 | 1 | 2024-01-01 09:00:00 |
| 1002 | 1 | 2 | 2024-01-02 09:01:00 |
| ... | ... | ... | ... |
+-------------+-------------+---------------+---------------------+
order_value
+-------------+--------------+
| delivery_id | order_amount |
+-------------+--------------+
| 1001 | 5.00 |
| 1002 | 5.00 |
| ... | ... |
+-------------+--------------+
##### Scenario
You are a data scientist on the DoorDash marketplace team. You have an `orders` table (one row per delivery) and an `order_value` table with each delivery's amount. Work through the four parts below.
##### Question
1. **High-frequency orders by month.** A high-frequency customer places more than 30 orders in a calendar month. For each month, find the percentage of that month's orders placed by high-frequency customers.
2. **Top customer per month.** Excluding the high-frequency customers above, find the top customer for each month by total spend. If there is a tie, return every tied customer.
3. **Restaurant month-over-month sales.** Using `order_value` joined to `orders` (`restaurant_id`), compute the month-over-month change in total sales for `restaurant_id = 5`. Follow-up: generalize to every restaurant.
4. **Bottom-30% restaurant reach (read & rewrite).** Given a query that returns the percentage of customers who order from bottom-30%-by-sales restaurants, explain what it does, then rewrite it to compute that percentage per month.
##### Hints
Think DATE_TRUNC for monthly buckets, COUNT per (month, customer) for the >30 filter, RANK() for ties, LAG() for month-over-month, and PERCENT_RANK() for the bottom-30% percentile.
Quick Answer: A four-part DoorDash data-scientist SQL screen on an orders / order_value schema: the monthly share of high-frequency (>30 orders/month) orders, the top non-high-frequency customer per month (with ties), month-over-month sales change for a restaurant, and the per-month customer reach of bottom-30%-by-sales restaurants — with full solutions.