Design promo experiment and explain correlation
Company: Uber
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: Hard
Interview Round: Technical Screen
You work on a ride-hailing marketplace (drivers + riders). Answer the following analytics and experimentation questions.
## 1) Interpret a surprising correlation
In the data, you observe that **driver time-to-arrive at the rider's pickup location** is **positively correlated** with the **rider's order acceptance / completion rate** (i.e., longer arrival times are associated with higher acceptance).
### Tasks
- Clarify and define the exact time interval (e.g., request → driver assignment, assignment → driver arrival, driver arrival → pickup, etc.).
- Provide multiple plausible explanations (including confounding and selection effects) for why this positive correlation could appear.
- Propose analyses (or an experiment) to determine whether *longer arrival time causes higher acceptance*, or whether the relationship is spurious.
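
One confounding story can be made concrete with a small simulation. The sketch below is illustrative only: it assumes a hypothetical binary confounder `low_supply` (areas or times with few drivers and few rider alternatives), and all coefficients are invented. Under these assumptions, longer arrival time *lowers* completion within each supply stratum, yet the pooled relationship comes out positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical confounder: True = low-supply area/time (few drivers, few alternatives).
low_supply = rng.random(n) < 0.5

# Assumption: arrival time (minutes) is longer where supply is scarce.
eta_min = rng.normal(loc=np.where(low_supply, 12.0, 5.0), scale=1.5)
eta_min = np.clip(eta_min, 0.5, None)

# Assumption: each extra minute LOWERS completion probability by 2 points,
# but riders in low-supply settings complete far more often (fewer alternatives).
p_complete = np.clip(0.55 + 0.40 * low_supply - 0.02 * eta_min, 0.0, 1.0)
completed = (rng.random(n) < p_complete).astype(float)

def slope(x, y):
    """OLS slope of y regressed on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

pooled = slope(eta_min, completed)
within_low = slope(eta_min[low_supply], completed[low_supply])
within_high = slope(eta_min[~low_supply], completed[~low_supply])

print(f"pooled slope:            {pooled:+.4f}")
print(f"slope within low supply: {within_low:+.4f}")
print(f"slope within high supply:{within_high:+.4f}")
```

If stratifying by a suspected confounder flips the sign like this, the pooled correlation is not evidence that longer arrival times cause higher acceptance. In real data the confounder is rarely a single observed flag, so this check is a starting point rather than proof.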
## 2) Design an experiment for a threshold discount
A new promotion is proposed:
- If a trip’s **pre-discount fare ≥ T** (a threshold), the rider receives **20% off** that trip.
### Tasks
1. Design an experiment to estimate the causal impact of the promotion.
   - Specify unit of randomization (rider, request, city/time), eligibility rules, duration, and how to handle interference in a two-sided marketplace.
2. Choose metrics:
   - Primary success metric for the **platform**.
   - Diagnostic/secondary metrics for **riders**, **drivers**, and marketplace health.
   - Guardrail metrics (fraud, cancellations, ETAs, etc.).
3. A teammate suggests analyzing the effect by grouping trips into realized-fare buckets (e.g., $10–$20, $20–$30) and comparing treatment vs. control within each bucket.
   - Is this valid? Why/why not?
   - If not valid, propose better approaches (e.g., alternative stratification, causal methods, or quasi-experimental designs).
State any assumptions you make.
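
The concern behind task 3 is that realized fare can itself respond to the promotion, so bucketing on it conditions on a post-treatment variable. The simulation below is a minimal sketch under invented assumptions: riders are randomized 50/50, `baseline_fare` is a hypothetical pre-treatment fare measure (e.g., from historical trips), treated riders just below `T` sometimes upsize their trip to cross the threshold, and the promotion has **zero** true effect on the outcome.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
T = 20.0  # hypothetical fare threshold

# Rider-level randomization.
treated = rng.random(n) < 0.5

# Pre-treatment rider trait: the fare the rider would pay without any promo.
baseline_fare = rng.uniform(5.0, 45.0, n)

# Outcome (e.g., trips in the next 30 days) depends ONLY on the rider trait,
# so the true treatment effect on it is exactly zero by construction.
outcome = 2.0 + 0.1 * baseline_fare + rng.normal(0.0, 1.0, n)

# Assumption: 60% of treated riders within $5 below T add $5 to qualify.
near_threshold = (baseline_fare >= T - 5.0) & (baseline_fare < T)
upsizes = treated & near_threshold & (rng.random(n) < 0.6)
realized_fare = baseline_fare + 5.0 * upsizes

def diff_in_means(mask):
    """Treatment-minus-control mean outcome among riders selected by mask."""
    return outcome[mask & treated].mean() - outcome[mask & ~treated].mean()

# Bucketing on the post-treatment variable vs. on the pre-treatment variable.
in_realized_bucket = (realized_fare >= 20.0) & (realized_fare < 30.0)
in_baseline_bucket = (baseline_fare >= 20.0) & (baseline_fare < 30.0)

biased = diff_in_means(in_realized_bucket)
unbiased = diff_in_means(in_baseline_bucket)

print(f"within realized-fare bucket $20-$30: {biased:+.3f}")
print(f"within baseline-fare bucket $20-$30: {unbiased:+.3f}")
```

In this setup the realized-fare bucket shows a spurious negative "effect" because the treatment arm of that bucket now contains riders who crossed the threshold, and they differ systematically from control riders in the same bucket. Stratifying on a covariate fixed before assignment avoids this compositional shift.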
Quick Answer: This question evaluates causal inference, observational data interpretation, and randomized experiment design skills in a two-sided marketplace context, focusing on confounding, selection effects, interference, and metric definition.