Use PostgreSQL (SQL) and brief Python pseudocode. Assume 'today' is 2025-09-01.
Schema:
-
trips(trip_id BIGINT, request_ts TIMESTAMP, city_id INT, rider_id BIGINT, driver_id BIGINT, shown_eta_sec INT, pickup_eta_sec INT, is_completed BOOLEAN, is_canceled BOOLEAN, surge_multiplier NUMERIC, is_treatment BOOLEAN, experiment_id INT)
-
riders(rider_id BIGINT, signup_dt DATE, city_id INT)
-
city_dim(city_id INT, city_name TEXT, tier TEXT)
-
incentives(rider_id BIGINT, offer_start_dt DATE, offer_end_dt DATE, percent_off INT, cap_usd NUMERIC)
-
day_weather(city_id INT, dt DATE, precip_mm NUMERIC, temp_c NUMERIC)
Small ASCII samples:
trips
trip_id | request_ts | city_id | rider_id | driver_id | shown_eta_sec | pickup_eta_sec | is_completed | is_canceled | surge_multiplier | is_treatment | experiment_id
1 | 2025-08-01 08:01:00 | 1 | 101 | 9001 | 420 | 480 | t | f | 1.0 | t | 42
2 | 2025-08-01 08:03:00 | 1 | 102 | 9002 | 360 | 360 | f | t | 1.2 | f | 42
3 | 2025-08-15 18:30:00 | 2 | 103 | 9003 | 300 | 420 | t | f | 1.1 | f | 42
4 | 2025-08-20 09:00:00 | 1 | 101 | 9001 | 330 | 360 | t | f | 1.0 | t | 42
5 | 2025-08-28 22:10:00 | 2 | 104 | 9004 | 600 | 660 | f | t | 1.5 | t | 42
riders
rider_id | signup_dt | city_id
101 | 2025-05-01 | 1
102 | 2025-01-15 | 1
103 | 2025-06-20 | 2
104 | 2025-03-10 | 2
city_dim
city_id | city_name | tier
1 | Alpha | T1
2 | Beta | T2
incentives
rider_id | offer_start_dt | offer_end_dt | percent_off | cap_usd
101 | 2025-08-15 | 2025-08-31 | 20 | 10
Tasks:
-
SQL: For experiment_id=42, compute for each city_id and calendar date the 7-day rolling median of shown_eta_sec and the request→trip conversion rate (completed/requests) from 2025-08-01 to 2025-09-01. Include days with zero requests (show conversion as NULL) by generating a date spine. Restrict to riders with signup_dt < 2025-07-01. Use percentile_cont(0.5) OVER a 7-day window for the median; clearly handle time zones by truncating request_ts at UTC midnight.
-
SQL: At city level, estimate a difference‑in‑differences conversion uplift between treated (is_treatment=true) and control for post=2025-08-15..2025-09-01 vs pre=2025-08-01..2025-08-14. Output: city_id, pre_treat_conv, pre_ctrl_conv, post_treat_conv, post_ctrl_conv, did_uplift = (post_treat_conv - pre_treat_conv) - (post_ctrl_conv - pre_ctrl_conv). Ensure riders with mixed treatment exposure are counted per their trip‑level exposure.
-
Python (pseudocode ok): Compute CUPED‑adjusted conversion at the rider_id level with X = pre‑period conversion and Y = post‑period conversion; estimate theta = cov(Y, X)/var(X), compute Y_cuped = Y - theta*(X - E[X]). Then estimate adjusted uplift between treated and control, with its standard error via cluster‑robust SEs at rider_id or city_id.