Design an Uber A/B experiment end-to-end
Company: Uber
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
Uber is considering a redesign of the pickup ETA card shown to riders after they request a trip. The hypothesis is that clearer ETA presentation reduces cancellations and increases completed trips. Design the experiment end-to-end.
Provide:
- Experimental unit and randomization: Choose rider-level, request-level, or geo/time cluster; justify and discuss interference risks (e.g., supply-side coupling, surge, driver behavior) and how you would mitigate them.
- Target population and eligibility: Precisely define who is included and excluded (e.g., first-time vs. repeat riders, specific cities, iOS/Android versions), and specify how to handle riders with multiple requests in the window.
- Primary metric and guardrails: Pick a single primary metric; propose 2–3 guardrails (e.g., driver acceptance rate, average wait time, surge incidence). Define each precisely, including numerator/denominator and window.
- Instrumentation plan: List exact events/fields to log (e.g., request_id, rider_id, device_id, city_id, treatment_arm, event_name, timestamp_ms, cancel_reason, trip_started, trip_completed, driver_supply_at_request, ETA_shown). Include sampling rate, idempotency, and an assignment audit via sample ratio mismatch (SRM) checks.
- Sample size and duration: Assume baseline booking-to-trip-completion rate p0 = 0.120, minimum detectable relative lift = +5%, two-sided α = 0.05, power = 0.80, 1:1 split, and 200,000 eligible requests per day (stable). Compute the per-arm sample size for a difference-in-proportions test and the expected experiment duration (days) to reach it, accounting for data loss of 3%.
- Ramp strategy: Propose a safe ramp (e.g., 1%→10%→50%→100%), pre-specified stopping rules, and monitoring (including SRM and guardrails) at each ramp.
- Bias/variance controls: Describe how you would handle seasonality, city heterogeneity (e.g., stratification, or CUPED with pre-period completion rate as the covariate), and bot/abuse filtering.
- Success criteria and rollout: Pre-register the decision thresholds, and state what you would do if the primary metric improves but a guardrail degrades.
State any additional assumptions you need and show the sample size math clearly.
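One way to sketch the randomization step, assuming rider-level assignment (a rider sees the same ETA card on every request, so repeat requests in the window never mix arms). The experiment name `eta_card_v2` and the helpers `_unit_interval` and `assign` are hypothetical, and "exposure" is read here as the share of eligible riders enrolled, split 1:1. Enrollment and arm come from two separately salted SHA-256 hashes, so raising exposure during a 1%→10%→50%→100% ramp only adds riders and never moves an enrolled rider to the other arm.

```python
import hashlib


def _unit_interval(*parts: str) -> float:
    """Hash the joined parts to a float that is ~uniform on [0, 1)."""
    digest = hashlib.sha256(":".join(parts).encode("utf-8")).digest()
    # Keep 53 bits so the result is exactly representable and always < 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def assign(rider_id: str, experiment: str, exposure: float) -> str:
    """Return 'excluded', 'control', or 'treatment' for a rider.

    exposure is the share of eligible riders enrolled (e.g. 0.01, 0.10, 0.50, 1.0).
    Enrolled riders are split 1:1 by a second, differently salted hash, so a
    rider's arm does not change as exposure grows.
    """
    if _unit_interval(experiment, "enroll", rider_id) >= exposure:
        return "excluded"
    arm_draw = _unit_interval(experiment, "arm", rider_id)
    return "treatment" if arm_draw < 0.5 else "control"


# Usage: the same rider always lands in the same arm for this experiment.
print(assign("rider_123", "eta_card_v2", exposure=0.10))
```

If supply-side interference makes rider-level units unsafe, the same function can be keyed on a cluster instead, e.g. `f"{city_id}:{time_block}"` for a geo/time switchback design, at the cost of far fewer effective units.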
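A minimal sketch of the primary metric computed from raw events, assuming the primary metric is request-level completion rate: requests that saw the ETA card and reached `trip_completed` within a window, divided by all requests that saw the card. The event names `eta_shown` and `trip_completed`, the 2-hour window, and the `Event` record (a subset of the fields listed above) are assumptions. Idempotency is handled at read time by keeping only the earliest copy of each `(request_id, event_name)` pair, so client retries cannot inflate the numerator or the denominator.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    request_id: str
    treatment_arm: str   # "control" or "treatment"
    event_name: str      # assumed values: "eta_shown", "trip_completed"
    timestamp_ms: int


def completion_rate_by_arm(events: Iterable[Event],
                           window_ms: int = 2 * 60 * 60 * 1000) -> dict[str, float]:
    """Completed requests / requests that saw the ETA card, per arm."""
    # Idempotency: keep the earliest copy of each (request_id, event_name).
    first: dict[tuple[str, str], Event] = {}
    for e in events:
        key = (e.request_id, e.event_name)
        if key not in first or e.timestamp_ms < first[key].timestamp_ms:
            first[key] = e

    counts: dict[str, list[int]] = {}   # arm -> [exposed, completed]
    for (request_id, name), shown in first.items():
        if name != "eta_shown":
            continue
        done = first.get((request_id, "trip_completed"))
        completed = (done is not None
                     and 0 <= done.timestamp_ms - shown.timestamp_ms <= window_ms)
        tally = counts.setdefault(shown.treatment_arm, [0, 0])
        tally[0] += 1
        tally[1] += int(completed)
    return {arm: n_done / n_exposed for arm, (n_exposed, n_done) in counts.items()}
```

Guardrails such as average wait time or driver acceptance rate can reuse the same deduplicated `first` map with their own numerator, denominator, and window.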
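The sample size math, sketched with the standard normal-approximation formula for two independent proportions (pooled variance under H0, unpooled under H1): n per arm = [z_{1−α/2}·√(2p̄(1−p̄)) + z_{1−β}·√(p0(1−p0) + p1(1−p1))]² / (p1 − p0)², with p1 = 0.120 × 1.05 = 0.126, p̄ = 0.123, and z values taken from `statistics.NormalDist`. The 14-day minimum duration is an assumed policy, not part of the prompt.

```python
from math import ceil, sqrt
from statistics import NormalDist


def per_arm_n(p0: float, rel_lift: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for a two-sided two-proportion z-test with a 1:1 split."""
    p1 = p0 * (1 + rel_lift)                      # 0.126
    delta = p1 - p0                               # 0.006 absolute
    p_bar = (p0 + p1) / 2                         # pooled rate under H0
    z_a = NormalDist().inv_cdf(1 - alpha / 2)     # ~1.960
    z_b = NormalDist().inv_cdf(power)             # ~0.842
    root = (z_a * sqrt(2 * p_bar * (1 - p_bar))
            + z_b * sqrt(p0 * (1 - p0) + p1 * (1 - p1)))
    return ceil(root ** 2 / delta ** 2)


DAILY_REQUESTS = 200_000   # stable eligible requests/day, both arms combined
DATA_LOSS = 0.03
MIN_DAYS = 14              # assumed policy: two full weekly cycles

n_arm = per_arm_n(0.120, 0.05)                    # analyzable requests per arm
n_arm_logged = ceil(n_arm / (1 - DATA_LOSS))      # inflate so ~97% survive
total_logged = 2 * n_arm_logged                   # 1:1 split
days_raw = total_logged / DAILY_REQUESTS
days_planned = max(ceil(days_raw), MIN_DAYS)
print(n_arm, n_arm_logged, total_logged, round(days_raw, 2), days_planned)
```

This gives 47,036 analyzable requests per arm, 48,491 logged requests per arm after the 3% loss (96,982 in total), and about 0.48 days at 200,000 requests/day. Power therefore does not bind; duration is set by day-of-week seasonality, novelty effects, and the ramp. Two caveats: if randomization is at rider level while the metric is per request, inflate n by the design effect 1 + (m − 1)ρ (m = requests per rider, ρ = intra-rider correlation), and if only part of the traffic is allocated during a ramp stage, scale the daily volume by that share.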
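A minimal SRM check, sketched as a chi-square goodness-of-fit test with one degree of freedom on assigned-unit counts, using only the standard library (for 1 df the tail probability is erfc(√(x/2))). The alert threshold `SRM_ALPHA = 0.001` is an assumed value, chosen strict because the check is repeated at every ramp stage and daily.

```python
from math import erfc, sqrt

SRM_ALPHA = 0.001  # assumed alert threshold


def srm_pvalue(n_control: int, n_treatment: int, treatment_share: float = 0.5) -> float:
    """Chi-square (1 df) p-value that the observed split matches the design."""
    total = n_control + n_treatment
    exp_t = total * treatment_share
    exp_c = total - exp_t
    chi2 = (n_treatment - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


def srm_alert(n_control: int, n_treatment: int, treatment_share: float = 0.5) -> bool:
    """True when the split is too unlikely under the intended allocation."""
    return srm_pvalue(n_control, n_treatment, treatment_share) < SRM_ALPHA
```

Here a 50,000 vs. 51,500 split trips the alert, while 49,900 vs. 50,100 does not (p ≈ 0.53). A firing SRM check should pause analysis until the cause (assignment bug, logging loss in one arm, bot filtering applied unevenly) is found.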
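A sketch of CUPED variance reduction, assuming the covariate x is each unit's completion rate in a pre-experiment window, which treatment cannot affect. θ = cov(x, y)/var(x) is estimated on both arms pooled, and the effect is the difference in mean adjusted outcomes. Riders with no pre-period history (e.g., first-time riders) need an imputed x such as the pooled mean; that choice, and the helper names `cuped_adjust` and `cuped_diff`, are assumptions.

```python
from statistics import fmean


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """Return CUPED-adjusted outcomes y_i - theta * (x_i - mean(x))."""
    mx, my = fmean(x), fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    theta = cov / var if var > 0 else 0.0   # pooled over both arms
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


def cuped_diff(y: list[float], x: list[float], arm: list[str]) -> float:
    """Treatment minus control difference in mean CUPED-adjusted outcome."""
    adj = cuped_adjust(y, x)
    treated = [a for a, g in zip(adj, arm) if g == "treatment"]
    control = [a for a, g in zip(adj, arm) if g == "control"]
    return fmean(treated) - fmean(control)
```

The adjustment leaves the expected difference between arms unchanged and shrinks variance by roughly a factor of 1 − ρ², where ρ is the correlation between x and y. It combines with stratification by city by estimating θ within each stratum.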
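The pre-registered decision rule can be written down as code before launch so it cannot drift after results arrive. This is a sketch with hypothetical inputs: 95% confidence intervals for the primary lift and for each guardrail's relative change (signed so that positive means worse), plus a per-guardrail non-inferiority margin. A guardrail counts as breached when its interval cannot rule out degradation beyond the margin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CI:
    lo: float
    hi: float


def decide(primary_lift: CI, guardrails: dict[str, CI],
           margins: dict[str, float]) -> tuple[str, list[str]]:
    """Apply a pre-registered decision rule (thresholds are hypothetical).

    guardrails: CI of each guardrail's relative change, signed so positive = worse
                (e.g. pass the negated change for driver acceptance rate).
    margins:    largest tolerated degradation per guardrail.
    """
    breached = sorted(g for g, ci in guardrails.items() if ci.hi > margins[g])
    if primary_lift.hi < 0:
        return "stop", breached          # primary metric significantly harmed
    if primary_lift.lo <= 0:
        return "inconclusive", breached  # CI includes zero: do not ship
    if breached:
        return "hold", breached          # primary wins, a guardrail may exceed its margin
    return "ship", breached
```

The `hold` branch is the "primary improves, guardrail degrades" case; the pre-registration should name its follow-up in advance, for example quantifying the trade-off with stakeholders, limiting rollout, or iterating on the design.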
Quick Answer: This question evaluates experimental design and causal inference skills, covering A/B testing, randomization and interference considerations, metric selection and guardrails, instrumentation and logging, sample size calculation, ramping, and bias/variance controls in the Analytics & Experimentation domain.