What difficulty level is this interview question?

This is a hard difficulty Analytics & Experimentation question, commonly asked during Onsite rounds at DoorDash.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at DoorDash during technical interviews.

Evaluate Biker Feature Success | DoorDash Interview Question

Q: Evaluate Biker Feature Success

This question evaluates experimentation and product analytics skills—metric design, causal inference, and marketplace impact assessment—in the context of a two-sided delivery marketplace, with attention to eligibility, routing, and matching effects.

DoorDash is considering launching Biker Mode, a feature for Dashers who deliver by bicycle. Biker Mode may help bicycle Dashers identify suitable short-distance orders, receive bike-friendly routing, and get matched to deliveries where bicycles are operationally efficient.

You are the data scientist owning the launch. Product, operations, and the dispatch team want a single recommendation — launch, iterate, or roll back — backed by a measurement plan they can trust. Work through the five questions in the Parts below.

Constraints & Assumptions

DoorDash is a two-sided marketplace: a finite pool of Dashers (bike and car) is matched to incoming orders by a dispatch/assignment system. Biker Mode plausibly touches eligibility, routing, and matching (which Dasher gets which order), not just a Dasher-facing UI. State your assumption explicitly if you proceed differently.
The feature is meaningful only for short-distance orders in dense zones with available bike supply (urban cores). Most DoorDash orders are not bike-eligible.
Scale. DoorDash operates across thousands of zones; a given dense zone sees on the order of hundreds to low-thousands of orders per day. You may pick reasonable numbers and label them as assumptions.
You have standard marketplace telemetry: order-level delivery times, offer/assignment logs, Dasher online/active hours, earnings, cancellations, ratings, and per-zone/per-hour supply-demand data.
You do not need a final powered sample size, but you should be able to sketch how you would size the test and what inflates that number.
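
As a rough sketch of the sizing logic in the last bullet, assuming a two-arm comparison of means with equal allocation: the naive per-arm count comes from the standard normal-approximation formula and is then inflated by the design effect 1 + (m - 1) * ICC, where m is the average cluster size and ICC is the intra-cluster correlation. The function names and every number below (sigma, MDE, cluster size, ICC) are illustrative assumptions, not DoorDash figures.

```python
import math
from statistics import NormalDist


def naive_n_per_arm(sigma, mde, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided test on a difference in means,
    assuming independent observations (no clustering)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde ** 2)


def design_effect(avg_cluster_size, icc):
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (avg_cluster_size - 1) * icc


def clustered_n_per_arm(sigma, mde, avg_cluster_size, icc, alpha=0.05, power=0.80):
    """Naive count inflated by the design effect."""
    n = naive_n_per_arm(sigma, mde, alpha, power)
    return math.ceil(n * design_effect(avg_cluster_size, icc))


# Assumed inputs: outcome SD of 1.0, MDE of 0.1, ~50 orders per
# zone-window cluster, ICC of 0.05.
print(naive_n_per_arm(1.0, 0.1))                # 1570
print(clustered_n_per_arm(1.0, 0.1, 50, 0.05))  # 5417
```

What inflates the number is visible in the formula: a smaller MDE, a noisier metric, larger clusters, and higher within-cluster correlation all push the required count up.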

Clarifying Questions to Ask

Does Biker Mode change dispatch/assignment (who gets which order), or is it purely a UI + routing layer the individual Dasher sees? This determines whether interference is a first-order concern.
Is the business goal incremental throughput (more deliveries completed), or substitution toward a cheaper/greener mode at equal throughput? This changes the primary metric.
What is the intended eligible population — which zones, distance thresholds, and dayparts count as "bike-suitable"?
Is there meaningful cross-zone delivery, or are zones effectively self-contained for matching purposes?
Are Dasher safety and earnings already instrumented well enough to serve as guardrails?
What is the rollout horizon and appetite for risk — a quick read or a durable, multi-week effect?

Part 1 — Define success

How would you define success for the Biker Mode feature? Give a single, defensible success statement and say explicitly what it is deliberately not counting as success.

What a Strong Answer Covers

Frames success as net marketplace value per unit of supply (e.g. completed deliveries or contribution profit per active supply-hour), not bike volume.
Explicitly distinguishes adoption from success and names what is not counted (opt-in rate, raw bike order count, a lift that is really a zero-sum transfer).
Bakes the anti-cannibalization condition into the statement — it should fail a result that merely moves orders from car to bike Dashers at flat total throughput and flat customer reliability.

Part 2 — Metrics

What primary, secondary, and guardrail metrics would you track? Be explicit about why each metric sits in its tier.

What a Strong Answer Covers

One singular primary that is a gaming-resistant ratio (productive use of supply), with the throughput-vs-substitution framing reflected in the choice.
A coherent secondary/diagnostic set that would explain why the primary moved (funnel, utilization, match quality), not just more outcome metrics.
Multi-stakeholder guardrails (customer, bike Dasher, car Dasher, merchant, marketplace), each able to veto a launch, including an explicit total-throughput / car-Dasher cannibalization guardrail.
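
One possible way to make the primary metric and guardrail veto concrete is sketched below. It assumes the primary is completed deliveries per active supply-hour, pooled over bike and car Dashers so that pure substitution cannot lift it. The record fields, guardrail names, and thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ZonePeriod:
    completed_deliveries: int
    active_supply_hours: float  # bike + car Dasher hours combined


def deliveries_per_supply_hour(rows):
    """Primary metric as a ratio of sums across zone-periods."""
    total_deliveries = sum(r.completed_deliveries for r in rows)
    total_hours = sum(r.active_supply_hours for r in rows)
    if total_hours == 0:
        raise ValueError("no active supply hours in the input")
    return total_deliveries / total_hours


def guardrail_breaches(control, treatment, max_relative_drop):
    """Return the higher-is-better guardrails whose relative drop
    from control to treatment exceeds the allowed limit."""
    breaches = []
    for name, limit in max_relative_drop.items():
        rel_change = (treatment[name] - control[name]) / control[name]
        if rel_change < -limit:
            breaches.append(name)
    return breaches


rows = [ZonePeriod(120, 60.0), ZonePeriod(80, 40.0)]
print(deliveries_per_supply_hour(rows))  # 2.0

control = {"car_dasher_earnings_per_hour": 20.0, "on_time_rate": 0.90}
treatment = {"car_dasher_earnings_per_hour": 18.0, "on_time_rate": 0.90}
limits = {"car_dasher_earnings_per_hour": 0.05, "on_time_rate": 0.01}
print(guardrail_breaches(control, treatment, limits))
# ['car_dasher_earnings_per_hour']
```

A real readout would compare confidence intervals against each limit rather than point estimates; the point-estimate check here only keeps the sketch short.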

Part 3 — Experiment design

How would you design an experiment to estimate the causal impact of Biker Mode? Cover the eligible population, the unit of randomization, your hypotheses, the estimator, and how you would size and time the test.

What a Strong Answer Covers

Defines and analyzes the eligible population (dense zones, short distances, bike-supply dayparts) rather than testing across all orders.
Correctly identifies the interference / SUTVA problem and justifies a randomization unit that contains the interference (switchback or zone cluster), instead of defaulting to a Dasher/order-level A/B.
States clean hypotheses and an estimand, defaults to ITT with TOT/CACE as a secondary read, and uses CUPED or pre-period covariates for variance reduction.
Computes standard errors at the level randomization occurred (cluster-robust) and sizes the test with an MDE, inflating the naive count by the design effect from clustering.
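
The design points above can be sketched in a few lines, under simplifying assumptions: treatment is randomized per (zone, time window) unit, outcomes are aggregated to that same unit before analysis so the standard error is computed at the level of randomization, and CUPED uses a pre-period value of the same metric as the covariate. Function names and data are hypothetical; carryover and washout windows are not handled here.

```python
import math
import random
from statistics import mean, variance


def assign_switchback(zones, n_windows, seed=0):
    """Randomly assign each (zone, window) unit to treatment (True) or control."""
    rng = random.Random(seed)
    return {(z, w): rng.random() < 0.5 for z in zones for w in range(n_windows)}


def cuped_adjust(y, x):
    """CUPED: subtract the part of y explained by pre-period covariate x."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


def itt_estimate(unit_outcomes, assignment):
    """Difference in means over randomization units, with an
    unequal-variance standard error computed at the unit level."""
    treated = [y for k, y in unit_outcomes.items() if assignment[k]]
    control = [y for k, y in unit_outcomes.items() if not assignment[k]]
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return diff, se


# Toy example with a hand-written assignment (illustrative numbers only).
assignment = {("a", 0): True, ("a", 1): False, ("b", 0): True, ("b", 1): False}
outcomes = {("a", 0): 2.2, ("a", 1): 2.0, ("b", 0): 2.4, ("b", 1): 2.0}
diff, se = itt_estimate(outcomes, assignment)
print(round(diff, 3), round(se, 3))  # 0.3 0.1
```

Because each unit is assigned by an independent coin flip, arm sizes can be unbalanced in small samples; a blocked or alternating schedule is a common alternative.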

Part 4 — Biases, confounders, interference, edge cases

What biases, confounders, marketplace-interference issues, or edge cases would you watch for, and how would you mitigate each?

What a Strong Answer Covers

A threats inventory that spans the pipeline: selection/opt-in bias, design-induced threats (interference, switchback carryover, weather/seasonality, mix-shift/Simpson's paradox), and analysis-time threats (peeking, multiplicity, noncompliance).
A concrete mitigation paired with each threat, located in the design or the analysis, not a bare list of names.
Explicit treatment of cannibalization / spillover as a first-order marketplace risk, not just a generic confounder.
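
To illustrate the mix-shift threat named above, the toy example below uses made-up numbers in which the per-cell rate is identical in both arms, yet the aggregate rate differs because treatment supply hours are concentrated in the high-rate cell. A stratified estimate with fixed weights (here taken from control supply hours, an assumption) removes the artifact. Cell names and figures are hypothetical.

```python
def aggregate_rate(cells):
    """cells maps a cell name to (deliveries, supply_hours)."""
    total_deliveries = sum(d for d, _ in cells.values())
    total_hours = sum(h for _, h in cells.values())
    return total_deliveries / total_hours


def stratified_diff(control, treatment):
    """Weighted average of within-cell rate differences,
    using control supply hours as fixed weights."""
    total_hours = sum(h for _, h in control.values())
    diff = 0.0
    for name, (d_c, h_c) in control.items():
        d_t, h_t = treatment[name]
        diff += (h_c / total_hours) * (d_t / h_t - d_c / h_c)
    return diff


# Same rate per cell in both arms: 3.0 at lunch peak, 1.0 late at night.
control = {"lunch_peak": (300, 100), "late_night": (100, 100)}
treatment = {"lunch_peak": (450, 150), "late_night": (50, 50)}

print(aggregate_rate(control))              # 2.0
print(aggregate_rate(treatment))            # 2.5
print(stratified_diff(control, treatment))  # 0.0
```

The aggregate gap of 0.5 comes entirely from where the hours fall, not from any within-cell improvement, which is why a cell-level breakdown belongs next to any aggregate readout.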

Part 5 — Launch / iterate / roll-back decision

How would you decide whether to launch, iterate, or roll back the feature based on the results?

What a Strong Answer Covers

A decision rule combining statistical significance + practical significance (pre-registered MDE) + guardrail status + operational feasibility, not p-values alone.
A mapping of concrete result patterns to actions (clean win → launch; segment- or weather-specific win or a single fixable guardrail → iterate / gated ship; cannibalization or reliability degradation → roll back).
A staged post-launch plan (ramp with a holdout, continued guardrail monitoring) acknowledging that some effects only surface over weeks.
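
A minimal sketch of how such a decision rule could be written down before the test starts is shown below. The guardrail names, the split between hard and fixable guardrails, and the thresholds are all assumptions for illustration; a real rule would be pre-registered with stakeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Readout:
    estimate: float   # effect on the primary metric
    ci_low: float     # lower bound of the confidence interval
    ci_high: float    # upper bound of the confidence interval
    breached_guardrails: set = field(default_factory=set)


# Hypothetical guardrails whose breach signals cannibalization or
# reliability degradation and therefore forces a roll back.
HARD_GUARDRAILS = {"total_throughput", "customer_reliability"}


def decide(readout, mde, operationally_feasible=True):
    """Map a result pattern to launch / iterate / roll back."""
    if readout.ci_high < 0 or (readout.breached_guardrails & HARD_GUARDRAILS):
        return "roll back"
    clean_win = (
        readout.ci_low > 0              # statistically significant
        and readout.estimate >= mde     # practically significant
        and not readout.breached_guardrails
    )
    if clean_win and operationally_feasible:
        return "launch"
    return "iterate"


print(decide(Readout(0.05, 0.02, 0.08), mde=0.03))                          # launch
print(decide(Readout(0.05, 0.02, 0.08, {"bike_dasher_earnings"}), 0.03))    # iterate
print(decide(Readout(0.05, 0.02, 0.08, {"total_throughput"}), 0.03))        # roll back
```

Segment-specific wins (for example fair weather only) would enter this sketch as separate readouts per segment, each run through the same rule.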

What a Strong Answer Covers

These dimensions span all five Parts:

Treats the problem as a two-sided marketplace question end to end — every Part reflects that adding or steering bike supply must improve the whole local marketplace, not just bike activity.
Maintains internal consistency: the success statement (Part 1), the primary metric and cannibalization guardrail (Part 2), the interference-aware design (Part 3), and the decision rule (Part 5) all point at the same notion of net value per unit of supply.
Communicates assumptions and trade-offs explicitly, stating where a clarification (UI-only vs. dispatch-touching; throughput vs. substitution) would change the design.

Follow-up Questions

Your switchback shows a positive primary effect, but car-Dasher earnings drop in the same zones while total throughput is flat. What do you conclude and recommend?
The aggregate primary metric improves, yet within every individual zone-hour cell it is flat. What is happening, and which number do you trust?
How would you detect and quantify spillover onto neighboring zones if cross-zone deliveries exist?
The effect is positive only in fair weather and only downtown. Is that a launch, an iterate, or a roll-back, and how would you ship it?
After launch, which of your metrics would you expect to take weeks (not days) to stabilize, and how would you keep monitoring them?
