Evaluate Promotions for Uber Eats Users
Company: Uber
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
Uber Eats wants to send promotions or coupons to its users (for example, "$5 off your next order" or "20% off, minimum basket $15"). You are the data scientist asked to **design an experiment and analysis plan to evaluate whether the promotion is effective** and to recommend whether the company should launch it.
The core challenge is causal: users who redeem coupons may already order more often, so a naive comparison of coupon users vs. non-users will overstate the effect. Your job is to estimate the **incremental** impact of the promotion and to translate that into a launch decision that accounts for cost, cannibalization, segment differences, and the fact that Uber Eats is a marketplace.
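The selection problem can be made concrete with a small simulation. The sketch below is illustrative only: the order-rate distribution, the redemption rule, and the `true_lift` value are all assumptions invented for the example, not Uber Eats data. It compares a naive redeemer-vs-everyone-else difference with an assignment-based (intent-to-treat) difference on the same simulated users.

```python
import random
from statistics import mean

def simulate(n_users=20_000, true_lift=0.5, seed=7):
    """Simulate a randomized coupon test where heavy orderers redeem more often."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_users):
        base = rng.uniform(0.0, 4.0)      # latent weekly order rate (hypothetical)
        treated = rng.random() < 0.5      # random assignment to receive the offer
        # Redemption is self-selected: more likely for users who already order a lot.
        redeemed = treated and rng.random() < 0.2 * base
        orders = base + (true_lift if redeemed else 0.0) + rng.gauss(0.0, 1.0)
        rows.append((treated, redeemed, orders))
    return rows

def naive_effect(rows):
    """Redeemers vs. everyone else: conditions on a post-treatment behavior."""
    redeemers = [o for _, r, o in rows if r]
    others = [o for _, r, o in rows if not r]
    return mean(redeemers) - mean(others)

def itt_effect(rows):
    """Intent-to-treat: compare by random assignment, ignoring redemption."""
    treat = [o for t, _, o in rows if t]
    ctrl = [o for t, _, o in rows if not t]
    return mean(treat) - mean(ctrl)

rows = simulate()
redeem_rate = mean(1.0 if r else 0.0 for t, r, _ in rows if t)
print(f"naive redeemer comparison: {naive_effect(rows):.2f}")
print(f"intent-to-treat estimate:  {itt_effect(rows):.2f}")
print(f"true ITT (lift x redemption rate): {0.5 * redeem_rate:.2f}")
```

In this setup the naive number exceeds even the per-redeemer lift, because redeemers had higher baseline order rates to begin with, while the assignment-based estimate tracks the true incremental effect up to sampling noise.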
### Constraints & Assumptions
State your assumptions explicitly; reasonable values are fine. For concreteness, assume:
- The promotion is a fixed-value or percentage discount, optionally with a minimum basket size and an expiration window.
- It can be delivered via push, email, in-app banner, or an automatically-applied discount at checkout.
- You can randomize who receives the offer before it is sent, and you have access to historical user-level data (order frequency, gross bookings, city, tenure, prior coupon usage).
- Uber Eats is a multi-sided marketplace: eaters, restaurants, and couriers share finite delivery capacity in each city.
- The business cares about **profit**, not just order volume — every redeemed coupon has a direct cost.
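One way to see why profit and order volume can diverge is to write out the per-user arithmetic. The sketch below is a simplification under stated assumptions: a constant contribution margin per order, no change in basket size, and made-up input numbers. The function names are hypothetical.

```python
def incremental_profit_per_user(orders_treat, orders_ctrl, margin_per_order,
                                redemptions_per_user, discount):
    """Margin on the *extra* orders minus coupon spend on *every* redeemed order."""
    incremental_orders = orders_treat - orders_ctrl
    coupon_cost = redemptions_per_user * discount
    return incremental_orders * margin_per_order - coupon_cost

def cost_per_incremental_order(orders_treat, orders_ctrl,
                               redemptions_per_user, discount):
    """Coupon spend divided by incremental orders; undefined if there is no lift."""
    incremental_orders = orders_treat - orders_ctrl
    if incremental_orders <= 0:
        return float("inf")
    return redemptions_per_user * discount / incremental_orders

# Hypothetical inputs: the offer lifts orders per user from 1.10 to 1.30,
# 0.40 coupons are redeemed per treated user, each worth $5,
# and an order contributes $4 of margin before the discount.
profit = incremental_profit_per_user(1.30, 1.10, 4.0, 0.40, 5.0)
cpio = cost_per_incremental_order(1.30, 1.10, 0.40, 5.0)
print(f"incremental profit per treated user: ${profit:.2f}")
print(f"cost per incremental order: ${cpio:.2f}")
```

With these assumed inputs, orders rise by 0.20 per user but the promotion loses money, because half of the redeemed coupons land on orders that would have happened anyway.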
### The Problem
Design the end-to-end experiment and analysis plan. Concretely, address each of the following:
1. **Metrics** — the primary business metric(s), supporting/secondary metrics, and guardrail metrics.
2. **Randomization unit** — what you randomize on, and why.
3. **Treatment and control definition** — how the two arms are constructed and analyzed.
4. **Confounding** — what could bias the estimate, and how you avoid or adjust for it.
5. **Economics & marketplace effects** — how you handle promotion cost, cannibalization, heterogeneous treatment effects, and marketplace interference.
6. **Launch decision** — the criteria under which you would (or would not) roll out.
```hint Watch who self-selects
Before you compare any two groups, ask: *who decides* whether a coupon gets redeemed? If redeemers opt in based on something you can't fully observe, what does a redeemer-vs-non-redeemer comparison actually measure — and what experimental quantity would you have to estimate instead to sidestep that?
```
```hint Stress-test "more orders = success"
Imagine orders go up but the finance team is unhappy. What is the coupon paying for on each order it touches — only the *extra* orders, or every order it lands on? Let that question shape what your primary metric has to subtract before you call the promotion a win.
```
```hint Is one user's treatment isolated from another's?
You'll be tempted to randomize at the most natural unit. But Uber Eats users in a city draw on the same couriers and restaurant kitchens. Ask whether a treated user's behavior can change the experience of a *control* user nearby — and if it can, what assumption does that break, and what alternative unit would restore it?
```
```hint Could the average hide the story?
A single average effect can be a blend of opposite-signed groups. Which kinds of users might a coupon genuinely activate, and which might it simply pay to do what they'd have done anyway? If those pull in different directions, what does that imply for how you analyze the result and for what you'd actually ship?
```
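If segment-level effects are tested, one standard way to keep the false-positive rate in check is the Holm step-down correction. The sketch below assumes the segments were pre-registered and that a p-value per segment has already been computed; the segment names and p-values are invented for illustration.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns a reject/keep flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

# Hypothetical pre-registered segments and their p-values.
segments = ["dormant", "new", "weekly", "power"]
p_values = [0.004, 0.03, 0.20, 0.60]
for name, rejected in zip(segments, holm_reject(p_values)):
    print(f"{name:8s} significant after correction: {rejected}")
```

Here only the first segment survives correction. The 0.03 result would look significant on its own but does not clear the adjusted threshold, which is the kind of post-hoc finding that should be validated in a follow-up test before it drives targeting.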
### Clarifying Questions to Ask
- What is the business objective — acquisition, reactivation of dormant users, defense against a competitor, or pure GMV growth? This changes the primary metric.
- What is the exact promotion mechanic (fixed $ vs. %, minimum basket, cap, expiration, one-time vs. recurring)?
- Which users are eligible, and how are they currently targeted?
- What time horizon matters — immediate orders only, or post-promotion retention?
- What is the budget / acceptable cost-per-incremental-order, and is there a hard guardrail on margin?
- How large is the eligible population, and what is the baseline order rate and its variance (for sizing)?
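The answers to the last question feed directly into sizing. As a rough sketch, the standard two-sample normal approximation gives users per arm as $n = 2(z_{1-\alpha/2} + z_{\text{power}})^2 \sigma^2 / \delta^2$, where $\sigma$ is the standard deviation of the per-user metric and $\delta$ is the minimum detectable effect. The inputs below are hypothetical, and the formula assumes user-level randomization with independent users and equal arm sizes.

```python
from math import ceil
from statistics import NormalDist

def users_per_arm(sigma, mde, alpha=0.05, power=0.8):
    """Approximate users per arm for a two-sided, two-sample test of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / mde ** 2)

# Hypothetical: per-user profit has a standard deviation of $6 over the test
# window, and the smallest effect worth detecting is $0.25 per user.
print(users_per_arm(sigma=6.0, mde=0.25))
print(users_per_arm(sigma=6.0, mde=0.125))  # halving the MDE roughly quadruples n
```

Because the required sample scales with $\sigma^2 / \delta^2$, a high-variance profit metric is much more expensive to power than a binary conversion metric, which is one reason variance-reduction techniques and longer windows come up in this discussion.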
### What a Strong Answer Covers
- **Causal framing:** does the candidate separate the causal estimand from raw usage, and recognize the selection problem in conditioning on a post-treatment behavior?
- **Choice of primary metric:** is the metric tied to business value (cost-aware), with a coherent set of secondary and guardrail metrics rather than a single vanity number?
- **Randomization unit:** is the unit justified, and does the candidate reason about when that choice could fail rather than asserting it?
- **Design rigor:** is there a clean assignment-based analysis with trustworthiness checks (balance, sample-ratio-mismatch, contamination, logging)?
- **Confounding:** can the candidate enumerate plausible confounders and state how each defense (experimental and quasi-experimental) addresses them and its limits?
- **Cannibalization & time horizon:** does the candidate probe whether value is created vs. merely shifted, and pick a measurement window that would expose that?
- **Heterogeneity:** is segment analysis handled with discipline (pre-registration, multiple-comparison control, validation) rather than post-hoc slicing?
- **Decision rule:** does the launch criterion integrate significance, economics, guardrails, durability, and segment robustness into one defensible rule?
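Of the trustworthiness checks listed under design rigor, sample-ratio mismatch is the easiest to sketch. The example below runs a chi-square goodness-of-fit test with one degree of freedom on the arm counts, using only the standard library. The counts and the 0.001 alert threshold are assumptions chosen for illustration; teams pick their own threshold.

```python
from math import erfc, sqrt

def srm_p_value(n_treat, n_ctrl, expected_treat_share=0.5):
    """P-value of a 1-df chi-square test that arm sizes match the planned split."""
    total = n_treat + n_ctrl
    exp_treat = total * expected_treat_share
    exp_ctrl = total - exp_treat
    chi2 = ((n_treat - exp_treat) ** 2 / exp_treat
            + (n_ctrl - exp_ctrl) ** 2 / exp_ctrl)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))

# Hypothetical logged arm sizes for a planned 50/50 split.
for n_t, n_c in [(50_000, 50_300), (50_000, 51_500)]:
    p = srm_p_value(n_t, n_c)
    flag = "investigate before reading results" if p < 0.001 else "ok"
    print(f"treat={n_t} ctrl={n_c} p={p:.4g} -> {flag}")
```

A failed check usually points to a logging, eligibility, or delivery problem rather than a real effect, so the treatment estimate should not be interpreted until the cause is found.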
### Follow-up Questions
- Suppose user-level randomization shows a positive effect, but you suspect marketplace interference inflated it. How would you redesign to get an unbiased estimate, and how would you reconcile the two results?
- The average treatment effect is slightly negative, but dormant users show a large positive effect. What policy do you propose, and how do you validate it without overfitting to the segment you discovered?
- How would you size the experiment and choose its duration for a profit-based metric with high variance and delayed (retention) outcomes?
- How would you detect and quantify "training users to wait for discounts" — i.e., a negative long-run effect that doesn't appear in the first order cycle?
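For the interference follow-up, one common redesign is to randomize at a cluster level (for example, city or delivery zone) so that every user sharing the same couriers and kitchens lands in the same arm. The sketch below shows a deterministic hash-based assignment. The cluster IDs, the salt, and the `user_to_cluster` mapping are hypothetical, and this is not a description of Uber's actual assignment system.

```python
import hashlib

def assign_arm(cluster_id, experiment_salt, treat_share=0.5):
    """Deterministically map a cluster to an arm via a salted hash."""
    key = f"{experiment_salt}:{cluster_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # uniform-looking value in [0, 1)
    return "treatment" if bucket < treat_share else "control"

# Hypothetical mapping from users to the delivery zone they order in.
user_to_cluster = {"u1": "zone_a", "u2": "zone_a", "u3": "zone_b", "u4": "zone_c"}

def user_arm(user_id, experiment_salt="promo_test_v1"):
    """Users inherit the arm of their cluster, never an individual draw."""
    return assign_arm(user_to_cluster[user_id], experiment_salt)

for user in user_to_cluster:
    print(user, user_to_cluster[user], user_arm(user))
```

The trade-off is statistical power: the effective sample size is closer to the number of clusters than the number of users, so the analysis should aggregate to the cluster level or use cluster-robust standard errors.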
Quick Answer: This question tests a data scientist's causal inference and experimentation skills: randomized trial design, treatment-effect estimation, metric specification, and economic impact analysis for promotions in a multi-sided food-delivery marketplace. It is classified under the Machine Learning domain, with emphasis on practical application supported by conceptual understanding. Interviewers commonly use it to assess how a candidate handles confounding and interference, selects primary and guardrail metrics, and translates incremental impact estimates into profit-focused launch decisions in real-world marketplace settings.