Assume today is 2025-09-01. Use the following schema and sample data to answer the questions with both SQL (preferred) and equivalent Python (pandas) pseudocode. Treat timestamps as UTC.
Schema
-
events(event_time TIMESTAMP, user_id INT, event_type ENUM(impression, click, add_to_cart, purchase), city STRING, session_id STRING)
-
orders(order_id INT, user_id INT, created_at TIMESTAMP, status ENUM(placed, cancelled, completed), status_updated_at TIMESTAMP, city STRING, supplier_id INT)
-
inventory(supplier_id INT, date DATE, units_available INT, city STRING)
Sample tables (tiny, illustrative)
events
event_time user_id event_type city session_id
2025-08-26 08:01:00 101 impression Austin s1
2025-08-26 08:02:10 101 click Austin s1
2025-08-26 09:15:00 102 impression Austin s2
2025-08-27 10:00:05 103 impression Boston s3
2025-08-27 10:05:10 103 purchase Boston s3
2025-08-28 12:00:00 104 impression Austin s4
2025-08-28 12:03:00 104 add_to_cart Austin s4
2025-08-29 07:45:00 105 impression Austin s5
2025-08-30 14:20:00 101 purchase Austin s6
2025-09-01 09:00:00 106 impression Boston s7
orders
order_id user_id created_at status status_updated_at city supplier_id
5001 103 2025-08-27 10:04:59 completed 2025-08-27 10:05:20 Boston 900
5002 101 2025-08-30 14:19:50 placed 2025-08-30 14:19:50 Austin 901
5002 101 2025-08-30 14:19:50 cancelled 2025-08-30 14:25:00 Austin 901
5003 104 2025-08-28 12:02:30 placed 2025-08-28 12:02:30 Austin 902
5003 104 2025-08-28 12:02:30 completed 2025-08-28 12:10:00 Austin 902
5004 105 2025-08-29 07:50:00 placed 2025-08-29 07:50:00 Austin 901
5004 105 2025-08-29 07:50:00 cancelled 2025-08-29 08:05:00 Austin 901
inventory
supplier_id date units_available city
900 2025-08-27 5 Boston
901 2025-08-29 1 Austin
901 2025-08-30 3 Austin
902 2025-08-28 2 Austin
Tasks
-
City-day funnel: For each city and each date between 2025-08-26 and 2025-09-01 inclusive, compute: (i) unique users with at least one impression; (ii) completed orders (using the latest status per order_id as of its max status_updated_at); (iii) conversion rate = completed_orders / unique_impression_users; and (iv) a 7-day rolling conversion rate per city (window ending on that date).
-
Supplier-day fill rate: For each supplier and date in the same window, compute demand = count of unique order_id whose created_at falls on that date and whose final status (latest by status_updated_at) is not cancelled. Then compute fill_rate = LEAST(demand, units_available) / NULLIF(demand, 0). Return supplier_id, date, demand, units_available, fill_rate; default units_available to 0 when missing.
-
Late completions: Ensure that if an order is created on 2025-08-28 but completes on 2025-08-29, it contributes to demand on 2025-08-28 and to completed_orders on the date of completion only if you are computing completion-by-day; explain which definition you used for Task 1 and why.
-
Provide SQL that is robust to duplicated rows in orders (multiple status lines per order_id) by selecting only the latest status per order_id via window functions. Then sketch equivalent pandas code, including how you would compute the rolling metric and handle division-by-zero safely.