Compute ETA shift and conversion uplift

Q: Compute ETA shift and conversion uplift

This question evaluates proficiency in SQL and Python for time‑series data manipulation and experimental analysis, covering window functions (7‑day percentile_cont medians), date spines and timezone‑aware timestamp truncation, conversion-rate aggregation, difference‑in‑differences uplift estimation, and CUPED adjustment with cluster‑robust standard errors. It is commonly asked to assess practical data engineering and applied statistics skills in the Data Manipulation (SQL/Python) domain, testing applied implementation and causal inference ability rather than only conceptual understanding.

Q: How do I practice SQL interview questions?

PracHub provides an interactive SQL console where you can write and test queries against real database schemas. Get instant feedback and compare your solution with the expected output.

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Uber.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Uber during technical interviews.

Question

Use PostgreSQL (SQL) and brief Python pseudocode. Assume 'today' is 2025-09-01.

Schema:
- trips(trip_id BIGINT, request_ts TIMESTAMP, city_id INT, rider_id BIGINT, driver_id BIGINT, shown_eta_sec INT, pickup_eta_sec INT, is_completed BOOLEAN, is_canceled BOOLEAN, surge_multiplier NUMERIC, is_treatment BOOLEAN, experiment_id INT)
- riders(rider_id BIGINT, signup_dt DATE, city_id INT)
- city_dim(city_id INT, city_name TEXT, tier TEXT)
- incentives(rider_id BIGINT, offer_start_dt DATE, offer_end_dt DATE, percent_off INT, cap_usd NUMERIC)
- day_weather(city_id INT, dt DATE, precip_mm NUMERIC, temp_c NUMERIC)

Small ASCII samples:
trips
trip_id | request_ts | city_id | rider_id | driver_id | shown_eta_sec | pickup_eta_sec | is_completed | is_canceled | surge_multiplier | is_treatment | experiment_id
1 | 2025-08-01 08:01:00 | 1 | 101 | 9001 | 420 | 480 | t | f | 1.0 | t | 42
2 | 2025-08-01 08:03:00 | 1 | 102 | 9002 | 360 | 360 | f | t | 1.2 | f | 42
3 | 2025-08-15 18:30:00 | 2 | 103 | 9003 | 300 | 420 | t | f | 1.1 | f | 42
4 | 2025-08-20 09:00:00 | 1 | 101 | 9001 | 330 | 360 | t | f | 1.0 | t | 42
5 | 2025-08-28 22:10:00 | 2 | 104 | 9004 | 600 | 660 | f | t | 1.5 | t | 42

riders
rider_id | signup_dt | city_id
101 | 2025-05-01 | 1
102 | 2025-01-15 | 1
103 | 2025-06-20 | 2
104 | 2025-03-10 | 2

incentives
rider_id | offer_start_dt | offer_end_dt | percent_off | cap_usd
101 | 2025-08-15 | 2025-08-31 | 20 | 10

Tasks:
1) SQL: For experiment_id=42, compute for each city_id and calendar date the 7-day rolling median of shown_eta_sec and the request→trip conversion rate (completed/requests) from 2025-08-01 to 2025-09-01. Include days with zero requests (show conversion as NULL) by generating a date spine. Restrict to riders with signup_dt < 2025-07-01. Use percentile_cont(0.5) OVER a 7-day window for the median; clearly handle time zones by truncating request_ts at UTC midnight.
2) SQL: At city level, estimate a difference‑in‑differences conversion uplift between treated (is_treatment=true) and control for post=2025-08-15..2025-09-01 vs pre=2025-08-01..2025-08-14. Output: city_id, pre_treat_conv, pre_ctrl_conv, post_treat_conv, post_ctrl_conv, did_uplift = (post_treat_conv - pre_treat_conv) - (post_ctrl_conv - pre_ctrl_conv). Ensure riders with mixed treatment exposure are counted per their trip‑level exposure.
3) Python (pseudocode ok): Compute CUPED‑adjusted conversion at the rider_id level with X = pre‑period conversion and Y = post‑period conversion; estimate theta = cov(Y, X)/var(X), compute Y_cuped = Y - theta*(X - E[X]). Then estimate adjusted uplift between treated and control, with its standard error via cluster‑robust SEs at rider_id or city_id.

PracHub · Accepted Answer

This question evaluates proficiency in SQL and Python for time‑series data manipulation and experimental analysis, covering window functions (7‑day percentile_cont medians), date spines and timezone‑aware timestamp truncation, conversion-rate aggregation, difference‑in‑differences uplift estimation, and CUPED adjustment with cluster‑robust standard errors. It is commonly asked to assess practical data engineering and applied statistics skills in the Data Manipulation (SQL/Python) domain, testing applied implementation and causal inference ability rather than only conceptual understanding.

Quick Overview

Quick Overview