Design an ETA experiment under interference
Company: Uber
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
A new rider ETA model may change perceived wait times and cancellations in a two‑sided marketplace where drivers roam across zones (interference likely). Design an experiment to estimate the causal impact on rider experience and marketplace health.
Requirements:
- Define up to 3 primary success metrics (demand) and up to 5 guardrails (supply/quality). Give exact formulas and units; e.g., request→trip conversion = completed_requests / total_requests; driver earnings/hour; post‑dispatch cancellation within 5 minutes; pickup ETA p50.
- Choose the randomization unit among rider, trip, driver, or zone (geohash‑6) × hour. Justify the choice with respect to SUTVA/interference, and specify blocking/stratification (city tier × weekday/weekend).
- If unit randomization is infeasible, propose a stepped‑wedge rollout with a 10% geo holdout. Describe contamination controls (driver shadow pools, sticky assignment, exclusion buffers) and how you’ll measure and cap cross‑arm exposure.
- Define instrumentation: an exposure variable that confirms the UI actually showed the new ETA. Specify an as‑treated estimator robust to partial compliance and how you’ll bound bias from post‑randomization selection.
- Compare classic A/B vs synthetic control vs diff‑in‑diff: when each is valid here; what pre‑trend diagnostics you’ll run; how to build city‑level synthetic controls (donor pool, regularization, covariates).
- Powering: Compute required sample size to detect a 1.5% relative drop in cancellations with baseline 12%, power 80%, alpha 0.05, ICC=0.02, average 1,500 trips/day/zone over 28 days. Show the clustered variance formula, the design effect, and the minimum number of zones per arm.
- List at least four biases (e.g., demand shocks, surge, time‑of‑day heterogeneity, learning/fatigue) and a mitigation for each (blocking, CUPED with pre‑period cancels, calendar alignment, heterogeneity pre‑spec).
- Monitoring: propose sequential harm boundaries (e.g., alpha‑spending) and a rollout rule using a cost‑benefit model that includes driver earnings impacts and SLA guardrails.
- Deliverables: a pre‑registration outline (estimands, metrics, exclusions, missing‑data rules, interim looks) and a plan to report heterogeneity by city tier and weather without p‑hacking.
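The as‑treated estimator asked for above can be sketched as a one‑sided‑noncompliance Wald/IV estimator: the intention‑to‑treat effect divided by the difference in exposure rates between arms. This is a minimal illustration, valid only under the usual IV assumptions (randomized assignment, exclusion restriction, monotonicity); the function and argument names are illustrative, not from any production codebase.

```python
def wald_late(y_treat, y_ctrl, exposed_treat, exposed_ctrl):
    """Wald/IV estimator of the exposure (complier) effect.

    ITT effect on the outcome divided by the ITT effect on exposure
    (the compliance gap). y_* are outcome lists per assigned arm;
    exposed_* are 0/1 indicators that the UI actually showed the
    new ETA.
    """
    itt = sum(y_treat) / len(y_treat) - sum(y_ctrl) / len(y_ctrl)
    compliance = (sum(exposed_treat) / len(exposed_treat)
                  - sum(exposed_ctrl) / len(exposed_ctrl))
    return itt / compliance
```

With 80% exposure in treatment and none in control, a −1.6pp ITT on cancellations scales to a −2pp effect among exposed riders.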
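The powering bullet can be worked through numerically. A minimal stdlib‑only sketch, assuming a two‑sided two‑proportion z‑test inflated by the standard design effect DEFF = 1 + (m − 1)·ICC, where m is the number of trips per zone over the experiment window (function and parameter names are illustrative):

```python
from math import ceil
from statistics import NormalDist

def zones_per_arm(p0=0.12, rel_drop=0.015, alpha=0.05, power=0.80,
                  icc=0.02, trips_per_day=1500, days=28):
    """Zones per arm for a cluster-randomized two-proportion test.

    n_iid = (z_a + z_b)^2 * (p0(1-p0) + p1(1-p1)) / delta^2,
    then inflate by DEFF = 1 + (m - 1) * ICC and divide by the
    cluster size m (trips per zone over the experiment).
    """
    p1 = p0 * (1 - rel_drop)                 # treated cancellation rate
    delta = p0 - p1                          # absolute detectable effect
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n_iid = (z_a + z_b) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / delta ** 2
    m = trips_per_day * days                 # trips per zone (cluster size)
    deff = 1 + (m - 1) * icc                 # design effect from clustering
    return ceil(n_iid * deff / m)            # zones needed per arm

print(zones_per_arm())
```

Note how large the answer comes out: with m = 42,000 trips per zone, even ICC = 0.02 yields a design effect near 841, which is exactly why the prompt pushes toward stepped‑wedge designs, finer randomization units, and CUPED‑style variance reduction.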
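The CUPED mitigation mentioned above is a simple covariate adjustment: subtract theta times the centered pre‑period covariate, with theta = cov(y, x)/var(x). A minimal sketch, where x stands for the pre‑period cancellation rate of each unit (names are illustrative):

```python
def cuped_adjust(y, x):
    """CUPED adjustment: y_adj = y - theta * (x - mean(x)).

    theta = cov(y, x) / var(x) minimizes the variance of the
    adjusted outcome; x is a pre-experiment covariate, so the
    adjustment leaves the treatment effect unbiased.
    """
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    var = sum((xi - mx) ** 2 for xi in x) / n
    theta = cov / var
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]
```

The variance reduction is roughly the squared correlation between y and x, so a strongly predictive pre‑period metric (like prior cancels) buys a large effective sample size.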
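For the sequential harm boundaries, one common choice is an O'Brien–Fleming‑shaped boundary: very conservative early looks that relax toward the fixed‑sample critical value. The sketch below uses the classic approximation z_k = z_final / sqrt(t_k) at equally spaced looks; an exact Lan–DeMets alpha‑spending boundary requires numerical integration and is better taken from a group‑sequential package.

```python
from math import sqrt
from statistics import NormalDist

def obf_boundaries(n_looks=4, alpha=0.05):
    """Approximate O'Brien-Fleming z-boundaries at equally spaced looks.

    z_k = z_final / sqrt(t_k), with t_k the information fraction at
    look k. Early looks get very high thresholds; the final look is
    close to the unadjusted two-sided critical value.
    """
    z_final = NormalDist().inv_cdf(1 - alpha / 2)
    return [z_final / sqrt((k + 1) / n_looks) for k in range(n_looks)]
```

In practice the harm boundary would be one‑sided on the guardrail metrics (e.g. driver earnings/hour), with the rollout rule triggering a stop when a look crosses its threshold.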
Quick Answer: Randomize at the zone × time level (geohash‑6 × hour, stratified by city tier and weekday/weekend) to limit driver‑side interference; define conversion, cancellation, and ETA metrics with guardrails on driver earnings and supply quality; log an exposure event so an as‑treated (IV/Wald) estimator can correct for partial compliance; power the design with a clustered variance using the design effect 1 + (m − 1)·ICC; mitigate demand shocks and time‑of‑day heterogeneity with blocking and CUPED on pre‑period cancels; and monitor with alpha‑spending harm boundaries, falling back to a stepped‑wedge rollout or synthetic control where unit randomization is infeasible.