Compute ETA shift and conversion uplift

Q: Compute ETA shift and conversion uplift

This is a Data Manipulation (SQL/Python) interview question from Uber for Data Scientist roles. View the full question and solution on PracHub.

Q: How do I approach Data Manipulation (SQL/Python) interview questions?

Data Manipulation (SQL/Python) questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master data manipulation (sql/python) interviews.

Question

Use PostgreSQL (SQL) and brief Python pseudocode. Assume 'today' is 2025-09-01.

Schema:

trips(trip_id BIGINT, request_ts TIMESTAMP, city_id INT, rider_id BIGINT, driver_id BIGINT, shown_eta_sec INT, pickup_eta_sec INT, is_completed BOOLEAN, is_canceled BOOLEAN, surge_multiplier NUMERIC, is_treatment BOOLEAN, experiment_id INT)
riders(rider_id BIGINT, signup_dt DATE, city_id INT)
city_dim(city_id INT, city_name TEXT, tier TEXT)
incentives(rider_id BIGINT, offer_start_dt DATE, offer_end_dt DATE, percent_off INT, cap_usd NUMERIC)
day_weather(city_id INT, dt DATE, precip_mm NUMERIC, temp_c NUMERIC)

Small ASCII samples: trips trip_id | request_ts | city_id | rider_id | driver_id | shown_eta_sec | pickup_eta_sec | is_completed | is_canceled | surge_multiplier | is_treatment | experiment_id 1 | 2025-08-01 08:01:00 | 1 | 101 | 9001 | 420 | 480 | t | f | 1.0 | t | 42 2 | 2025-08-01 08:03:00 | 1 | 102 | 9002 | 360 | 360 | f | t | 1.2 | f | 42 3 | 2025-08-15 18:30:00 | 2 | 103 | 9003 | 300 | 420 | t | f | 1.1 | f | 42 4 | 2025-08-20 09:00:00 | 1 | 101 | 9001 | 330 | 360 | t | f | 1.0 | t | 42 5 | 2025-08-28 22:10:00 | 2 | 104 | 9004 | 600 | 660 | f | t | 1.5 | t | 42

riders rider_id | signup_dt | city_id 101 | 2025-05-01 | 1 102 | 2025-01-15 | 1 103 | 2025-06-20 | 2 104 | 2025-03-10 | 2

incentives rider_id | offer_start_dt | offer_end_dt | percent_off | cap_usd 101 | 2025-08-15 | 2025-08-31 | 20 | 10

Tasks:

SQL: For experiment_id=42, compute for each city_id and calendar date the 7-day rolling median of shown_eta_sec and the request→trip conversion rate (completed/requests) from 2025-08-01 to 2025-09-01. Include days with zero requests (show conversion as NULL) by generating a date spine. Restrict to riders with signup_dt < 2025-07-01. Use percentile_cont(0.5) OVER a 7-day window for the median; clearly handle time zones by truncating request_ts at UTC midnight.
SQL: At city level, estimate a difference‑in‑differences conversion uplift between treated (is_treatment=true) and control for post=2025-08-15..2025-09-01 vs pre=2025-08-01..2025-08-14. Output: city_id, pre_treat_conv, pre_ctrl_conv, post_treat_conv, post_ctrl_conv, did_uplift = (post_treat_conv - pre_treat_conv) - (post_ctrl_conv - pre_ctrl_conv). Ensure riders with mixed treatment exposure are counted per their trip‑level exposure.
Python (pseudocode ok): Compute CUPED‑adjusted conversion at the rider_id level with X = pre‑period conversion and Y = post‑period conversion; estimate theta = cov(Y, X)/var(X), compute Y_cuped = Y - theta*(X - E[X]). Then estimate adjusted uplift between treated and control, with its standard error via cluster‑robust SEs at rider_id or city_id.

Compute ETA shift and conversion uplift

Comments (0)