PracHub

Evaluate Biker Feature Success

Last updated: Jun 21, 2026

Quick Overview

This question evaluates experimentation and product analytics skills—metric design, causal inference, and marketplace impact assessment—in the context of a two-sided delivery marketplace, with attention to eligibility, routing, and matching effects.


Company: DoorDash

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Onsite

DoorDash is considering launching **Biker Mode**, a feature for Dashers who deliver by bicycle. Biker Mode may help bicycle Dashers identify suitable short-distance orders, receive bike-friendly routing, and get matched to deliveries where bicycles are operationally efficient.

You are the data scientist owning the launch. Product, operations, and the dispatch team want a single recommendation (**launch, iterate, or roll back**) backed by a measurement plan they can trust. Work through the five questions in the Parts below.

### Constraints & Assumptions

- DoorDash is a **two-sided marketplace**: a finite pool of Dashers (bike and car) is matched to incoming orders by a dispatch/assignment system. Biker Mode plausibly touches **eligibility, routing, and matching** (which Dasher gets which order), not just a Dasher-facing UI. State your assumption explicitly if you proceed differently.
- The feature is meaningful only for **short-distance orders in dense zones** with available bike supply (urban cores). Most DoorDash orders are *not* bike-eligible.
- **Scale.** DoorDash operates across thousands of zones; a given dense zone sees on the order of hundreds to low-thousands of orders per day. You may pick reasonable numbers and label them as assumptions.
- You have standard marketplace telemetry: order-level delivery times, offer/assignment logs, Dasher online/active hours, earnings, cancellations, ratings, and per-zone/per-hour supply-demand data.
- You do **not** need a final powered sample size, but you should be able to sketch how you would size the test and what inflates that number.

### Clarifying Questions to Ask

- Does Biker Mode change dispatch/assignment (who gets which order), or is it purely a UI + routing layer the individual Dasher sees? This determines whether interference is a first-order concern.
- Is the business goal **incremental throughput** (more deliveries completed), or **substitution** toward a cheaper/greener mode at equal throughput? This changes the primary metric.
- What is the intended eligible population: which zones, distance thresholds, and dayparts count as "bike-suitable"?
- Is there meaningful cross-zone delivery, or are zones effectively self-contained for matching purposes?
- Are Dasher safety and earnings already instrumented well enough to serve as guardrails?
- What is the rollout horizon and appetite for risk: a quick read or a durable, multi-week effect?

### Part 1: Define success

How would you define success for the Biker Mode feature? Give a single, defensible success statement and say explicitly what it is deliberately *not* counting as success.

```hint What to be suspicious of
Adoption (opt-in rate, orders done by bike) is a *leading indicator*, not success. Ask whether a metric could rise while the overall marketplace is no better off.
```

```hint Whose value are you counting?
Anchor the statement on net value *per unit of supply* across the whole local marketplace. Make sure it would reject a result that merely reshuffles work between Dasher types while total throughput and customer reliability stay flat.
```

#### What a Strong Answer Covers

- Frames success as **net marketplace value per unit of supply** (e.g. completed deliveries or contribution profit per active supply-hour), not bike volume.
- Explicitly distinguishes **adoption from success** and names what is *not* counted (opt-in rate, raw bike order count, a lift that is really a zero-sum transfer).
- Bakes the **anti-cannibalization** condition into the statement: it should fail a result that merely moves orders from car to bike Dashers at flat total throughput and flat customer reliability.

### Part 2: Metrics

What **primary**, **secondary**, and **guardrail** metrics would you track? Be explicit about why each metric sits in its tier.

```hint Keep the primary singular
A single primary metric avoids multiple-comparison gaming. Prefer a *ratio* that captures productive use of supply and that cannot be inflated by simply doing more low-value trips.
```

```hint Guardrails are multi-stakeholder
A marketplace feature can win for one side and lose for another. Think about who could be hurt (customers, bike Dashers, **car Dashers**, merchants, the marketplace overall) and give each at least one guardrail that can veto a launch. Which single guardrail catches the zero-sum cannibalization failure?
```

#### What a Strong Answer Covers

- One **singular primary** that is a gaming-resistant ratio (productive use of supply), with the throughput-vs-substitution framing reflected in the choice.
- A coherent **secondary/diagnostic** set that would *explain why the primary moved* (funnel, utilization, match quality), not just more outcome metrics.
- **Multi-stakeholder guardrails** (customer, bike Dasher, car Dasher, merchant, marketplace), each able to veto a launch, including an explicit **total-throughput / car-Dasher cannibalization** guardrail.

### Part 3: Experiment design

How would you design an experiment to estimate the **causal** impact of Biker Mode? Cover the eligible population, the unit of randomization, your hypotheses, the estimator, and how you would size and time the test.

```hint The decision that drives everything
The unit of randomization is the crux. Ask: if a treated bike Dasher takes an order, does that *change* what a control Dasher experiences? If yes, a naive Dasher- or order-level A/B violates SUTVA and the estimate is biased.
```

```hint Make the interference happen inside a unit
If a naive split is contaminated, you need a randomization unit big enough that the interference is contained *within* one unit instead of leaking across arms. What is the smallest unit that still satisfies that? What new artifact appears at the boundary between consecutive on/off periods if you toggle the same place over time, and what does a bigger unit cost you in independent samples and power?
```

```hint Estimate on the right population
Testing across *all* orders dilutes a real effect to undetectable; define and analyze the eligible population only. Then ask: should your headline number be computed over everyone *assigned* to treatment, or only over those who actually *used* the feature, and which of those two choices quietly lets the keenest Dashers select themselves in? Finally, once your unit of randomization is a whole place-and-time block rather than one order, are the orders inside a block independent, and what does that do to the sample size a textbook formula hands you?
```

#### What a Strong Answer Covers

- Defines and analyzes the **eligible population** (dense zones, short distances, bike-supply dayparts) rather than testing across all orders.
- Correctly identifies the **interference / SUTVA** problem and justifies a randomization unit that *contains* the interference (switchback or zone cluster), instead of defaulting to a Dasher/order-level A/B.
- States clean **hypotheses and an estimand**, defaults to **ITT** with TOT/CACE as a secondary read, and uses **CUPED** or pre-period covariates for variance reduction.
- Computes **standard errors at the level randomization occurred** (cluster-robust) and sizes the test with an **MDE**, inflating the naive count by the **design effect** from clustering.

### Part 4: Biases, confounders, interference, edge cases

What biases, confounders, marketplace-interference issues, or edge cases would you watch for, and how would you mitigate each?

```hint Walk the pipeline, not a checklist
Don't just list textbook biases. Trace one order from *who self-selects into the feature*, through *what the design itself introduces*, to *what you do at analysis time with many metrics and repeated looks*; a distinct threat hides at each stage. For every one you name, attach a concrete mitigation in the design or analysis; a threat without a fix earns little credit.
```

#### What a Strong Answer Covers

- A **threats inventory that spans the pipeline**: selection/opt-in bias, design-induced threats (interference, switchback carryover, weather/seasonality, mix-shift/Simpson's paradox), and analysis-time threats (peeking, multiplicity, noncompliance).
- A **concrete mitigation paired with each threat**, located in the design or the analysis, not a bare list of names.
- Explicit treatment of **cannibalization / spillover** as a first-order marketplace risk, not just a generic confounder.

### Part 5: Launch / iterate / roll-back decision

How would you decide whether to **launch**, **iterate**, or **roll back** the feature based on the results?

```hint More than a p-value
A defensible rule combines statistical significance, *practical* significance (a pre-registered MDE), guardrail status, and operational feasibility. Map concrete result patterns (clean win, segment-specific or fair-weather-only win, low-adoption-but-good-adopters, cannibalization, reliability degradation) to each of the three actions.
```

#### What a Strong Answer Covers

- A decision rule combining **statistical significance + practical significance (pre-registered MDE) + guardrail status + operational feasibility**, not p-values alone.
- A **mapping of concrete result patterns to actions** (clean win → launch; segment- or weather-specific win or a single fixable guardrail → iterate / gated ship; cannibalization or reliability degradation → roll back).
- A **staged post-launch plan** (ramp with a holdout, continued guardrail monitoring) acknowledging that some effects only surface over weeks.

### What a Strong Answer Covers

These dimensions span all five Parts:

- Treats the problem as a **two-sided marketplace** question end to end: every Part reflects that adding or steering bike supply must improve the whole local marketplace, not just bike activity.
- Maintains internal consistency: the success statement (Part 1), the primary metric and cannibalization guardrail (Part 2), the interference-aware design (Part 3), and the decision rule (Part 5) all point at the same notion of net value per unit of supply.
- Communicates assumptions and trade-offs explicitly, stating where a clarification (UI-only vs. dispatch-touching; throughput vs. substitution) would change the design.

### Follow-up Questions

- Your switchback shows a positive primary effect, but car-Dasher earnings drop in the same zones while total throughput is flat. What do you conclude and recommend?
- The aggregate primary metric improves, yet within every individual zone-hour cell it is flat. What is happening, and which number do you trust?
- How would you detect and quantify spillover onto neighboring zones if cross-zone deliveries exist?
- The effect is positive only in fair weather and only downtown. Is that a launch, an iterate, or a roll-back, and how would you ship it?
- After launch, which of your metrics would you expect to take weeks (not days) to stabilize, and how would you keep monitoring them?


Apr 25, 2026, 12:00 AM

DoorDash is considering launching Biker Mode, a feature for Dashers who deliver by bicycle. Biker Mode may help bicycle Dashers identify suitable short-distance orders, receive bike-friendly routing, and get matched to deliveries where bicycles are operationally efficient.

You are the data scientist owning the launch. Product, operations, and the dispatch team want a single recommendation — launch, iterate, or roll back — backed by a measurement plan they can trust. Work through the five questions in the Parts below.

Constraints & Assumptions

  • DoorDash is a two-sided marketplace: a finite pool of Dashers (bike and car) is matched to incoming orders by a dispatch/assignment system. Biker Mode plausibly touches eligibility, routing, and matching (which Dasher gets which order), not just a Dasher-facing UI. State your assumption explicitly if you proceed differently.
  • The feature is meaningful only for short-distance orders in dense zones with available bike supply (urban cores). Most DoorDash orders are not bike-eligible.
  • Scale. DoorDash operates across thousands of zones; a given dense zone sees on the order of hundreds to low-thousands of orders per day. You may pick reasonable numbers and label them as assumptions.
  • You have standard marketplace telemetry: order-level delivery times, offer/assignment logs, Dasher online/active hours, earnings, cancellations, ratings, and per-zone/per-hour supply-demand data.
  • You do not need a final powered sample size, but you should be able to sketch how you would size the test and what inflates that number.
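
As a rough sketch of the sizing logic in the last bullet, the snippet below computes a textbook per-arm sample size for a chosen MDE and then inflates it by the standard design effect, 1 + (m - 1) * ICC, which applies when whole clusters of m orders are randomized together. Every number in it (the standard deviation, the MDE, orders per block, the intra-cluster correlation) is an illustrative assumption rather than a DoorDash figure, and the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def naive_n_per_arm(sigma: float, mde: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Textbook per-arm sample size for a two-sided difference-in-means test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde ** 2)


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation when clusters, not individual orders, are randomized."""
    return 1 + (cluster_size - 1) * icc


# Assumed inputs, for illustration only.
sigma = 8.0             # standard deviation of the order-level outcome
mde = 0.5               # minimum detectable effect, in the same units as sigma
orders_per_block = 200  # eligible orders inside one zone-by-time-window block
icc = 0.02              # intra-cluster correlation of orders within a block

n_naive = naive_n_per_arm(sigma, mde)
deff = design_effect(orders_per_block, icc)
n_clustered = ceil(n_naive * deff)
blocks_per_arm = ceil(n_clustered / orders_per_block)

print(f"naive orders per arm:     {n_naive}")
print(f"design effect:            {deff:.2f}")
print(f"clustered orders per arm: {n_clustered}")
print(f"blocks per arm:           {blocks_per_arm}")
```

Under these assumed inputs the clustered design needs roughly five times the naive order count, which is the kind of inflation the bullet asks you to anticipate.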

Clarifying Questions to Ask

  • Does Biker Mode change dispatch/assignment (who gets which order), or is it purely a UI + routing layer the individual Dasher sees? This determines whether interference is a first-order concern.
  • Is the business goal incremental throughput (more deliveries completed), or substitution toward a cheaper/greener mode at equal throughput? This changes the primary metric.
  • What is the intended eligible population — which zones, distance thresholds, and dayparts count as "bike-suitable"?
  • Is there meaningful cross-zone delivery, or are zones effectively self-contained for matching purposes?
  • Are Dasher safety and earnings already instrumented well enough to serve as guardrails?
  • What is the rollout horizon and appetite for risk — a quick read or a durable, multi-week effect?

Part 1 — Define success

How would you define success for the Biker Mode feature? Give a single, defensible success statement and say explicitly what it is deliberately not counting as success.

What a Strong Answer Covers

  • Frames success as net marketplace value per unit of supply (e.g. completed deliveries or contribution profit per active supply-hour), not bike volume.
  • Explicitly distinguishes adoption from success and names what is not counted (opt-in rate, raw bike order count, a lift that is really a zero-sum transfer).
  • Bakes the anti-cannibalization condition into the statement — it should fail a result that merely moves orders from car to bike Dashers at flat total throughput and flat customer reliability.
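
To make the adoption-versus-success distinction concrete, here is a minimal sketch that compares bike order growth with total completed deliveries per active supply-hour. The `ZoneDay` schema and all numbers are hypothetical; the point is only that a pure reshuffle from car to bike Dashers shows a large bike lift and no change in the marketplace-level ratio.

```python
from dataclasses import dataclass


@dataclass
class ZoneDay:
    """Totals for one zone-day across ALL Dashers (hypothetical schema)."""
    bike_deliveries: int
    car_deliveries: int
    active_supply_hours: float  # bike + car active Dasher hours


def deliveries_per_supply_hour(z: ZoneDay) -> float:
    """Candidate success metric: total completed deliveries per active supply-hour."""
    return (z.bike_deliveries + z.car_deliveries) / z.active_supply_hours


def relative_lift(treated: float, control: float) -> float:
    """Relative change of treated versus control."""
    return treated / control - 1.0


control = ZoneDay(bike_deliveries=100, car_deliveries=900, active_supply_hours=500.0)
# Bike volume more than doubles, but only by taking orders from car Dashers.
reshuffle = ZoneDay(bike_deliveries=250, car_deliveries=750, active_supply_hours=500.0)
# Bike volume rises by the same amount and about half of it is incremental.
real_gain = ZoneDay(bike_deliveries=250, car_deliveries=830, active_supply_hours=500.0)

base = deliveries_per_supply_hour(control)
for name, arm in [("reshuffle", reshuffle), ("real_gain", real_gain)]:
    bike_lift = relative_lift(arm.bike_deliveries, control.bike_deliveries)
    net_lift = relative_lift(deliveries_per_supply_hour(arm), base)
    print(f"{name}: bike orders {bike_lift:+.0%}, deliveries per supply-hour {net_lift:+.0%}")
```

A success statement anchored on the second number rejects the reshuffle scenario even though its adoption figures look identical to the genuine gain.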

Part 2 — Metrics

What primary, secondary, and guardrail metrics would you track? Be explicit about why each metric sits in its tier.

What a Strong Answer Covers

  • One singular primary that is a gaming-resistant ratio (productive use of supply), with the throughput-vs-substitution framing reflected in the choice.
  • A coherent secondary/diagnostic set that would explain why the primary moved (funnel, utilization, match quality), not just more outcome metrics.
  • Multi-stakeholder guardrails (customer, bike Dasher, car Dasher, merchant, marketplace), each able to veto a launch, including an explicit total-throughput / car-Dasher cannibalization guardrail.
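
One possible way to encode "each guardrail can veto a launch" is sketched below. The metric names, tolerances, and the `Guardrail` structure are assumptions made for illustration; a fuller version would probably compare a confidence bound, not a point estimate, against each tolerance.

```python
from typing import NamedTuple


class Guardrail(NamedTuple):
    stakeholder: str
    metric: str
    higher_is_better: bool
    tolerance: float  # largest relative harm tolerated, e.g. 0.02 = 2%


# Hypothetical guardrail set: one per stakeholder named in the question.
GUARDRAILS = [
    Guardrail("customer", "late_delivery_rate", False, 0.02),
    Guardrail("bike Dasher", "bike_earnings_per_active_hour", True, 0.02),
    Guardrail("car Dasher", "car_earnings_per_active_hour", True, 0.02),
    Guardrail("merchant", "food_wait_at_store_minutes", False, 0.03),
    Guardrail("marketplace", "total_completed_deliveries", True, 0.01),
]


def is_breached(g: Guardrail, relative_change: float) -> bool:
    """Breached when the metric moves in the harmful direction by more than its tolerance."""
    harm = -relative_change if g.higher_is_better else relative_change
    return harm > g.tolerance


def vetoes(observed: dict[str, float]) -> list[str]:
    """Names of all guardrail metrics that would block a launch."""
    return [g.metric for g in GUARDRAILS if is_breached(g, observed[g.metric])]


# Made-up relative changes resembling a cannibalization pattern:
# bike earnings up, car earnings down, total throughput flat.
observed = {
    "late_delivery_rate": 0.00,
    "bike_earnings_per_active_hour": 0.05,
    "car_earnings_per_active_hour": -0.06,
    "food_wait_at_store_minutes": 0.01,
    "total_completed_deliveries": 0.00,
}
print(vetoes(observed))
```

In this made-up scenario only the car-Dasher earnings guardrail fires, which is the zero-sum transfer the explicit cannibalization guardrail is meant to catch.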

Part 3 — Experiment design

How would you design an experiment to estimate the causal impact of Biker Mode? Cover the eligible population, the unit of randomization, your hypotheses, the estimator, and how you would size and time the test.

What a Strong Answer Covers

  • Defines and analyzes the eligible population (dense zones, short distances, bike-supply dayparts) rather than testing across all orders.
  • Correctly identifies the interference / SUTVA problem and justifies a randomization unit that contains the interference (switchback or zone cluster), instead of defaulting to a Dasher/order-level A/B.
  • States clean hypotheses and an estimand, defaults to ITT with TOT/CACE as a secondary read, and uses CUPED or pre-period covariates for variance reduction.
  • Computes standard errors at the level randomization occurred (cluster-robust) and sizes the test with an MDE, inflating the naive count by the design effect from clustering.
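
The switchback idea and the "standard errors at the level of randomization" point can be sketched together. The code below randomizes whole (zone, time-window) blocks and estimates an intent-to-treat difference on block-level outcomes, so the standard error is computed across blocks instead of across orders. The function names, block structure, and simulated data are all assumptions for illustration, not DoorDash's actual setup.

```python
import random
from math import sqrt
from statistics import mean, stdev


def assign_switchback(zones, n_windows, seed=0):
    """Randomly assign each (zone, time-window) block to treatment (True) or control (False)."""
    rng = random.Random(seed)
    return {(z, w): rng.random() < 0.5 for z in zones for w in range(n_windows)}


def block_level_itt(block_metric, assignment):
    """Difference in mean block-level outcomes, with the standard error
    computed across blocks (the unit of randomization), not across orders."""
    treated = [block_metric[b] for b, is_treated in assignment.items() if is_treated]
    control = [block_metric[b] for b, is_treated in assignment.items() if not is_treated]
    effect = mean(treated) - mean(control)
    se = sqrt(stdev(treated) ** 2 / len(treated) + stdev(control) ** 2 / len(control))
    return effect, se


# Simulated example: 20 zones, 28 windows each, with an assumed true lift of 0.1
# deliveries per supply-hour and block-level noise.
zones = [f"zone_{i}" for i in range(20)]
assignment = assign_switchback(zones, n_windows=28, seed=7)
noise = random.Random(11)
block_metric = {
    block: 2.0 + (0.1 if is_treated else 0.0) + noise.gauss(0, 0.3)
    for block, is_treated in assignment.items()
}

effect, se = block_level_itt(block_metric, assignment)
print(f"ITT estimate: {effect:.3f} (SE {se:.3f}, 95% CI half-width {1.96 * se:.3f})")
```

This sketch treats blocks as independent. In practice consecutive windows in the same zone are likely correlated, so a more careful analysis might drop a burn-in period after each switch and cluster the standard errors by zone as well.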

Part 4 — Biases, confounders, interference, edge cases

What biases, confounders, marketplace-interference issues, or edge cases would you watch for, and how would you mitigate each?

What a Strong Answer Covers

  • A threats inventory that spans the pipeline: selection/opt-in bias, design-induced threats (interference, switchback carryover, weather/seasonality, mix-shift/Simpson's paradox), and analysis-time threats (peeking, multiplicity, noncompliance).
  • A concrete mitigation paired with each threat, located in the design or the analysis, not a bare list of names.
  • Explicit treatment of cannibalization / spillover as a first-order marketplace risk, not just a generic confounder.
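
The mix-shift threat is easier to see with numbers. In the made-up example below, deliveries per supply-hour are identical between arms inside every segment, yet the aggregate ratio rises in treatment because more supply-hours land in the high-productivity segment. Reweighting the treatment arm to the control arm's supply mix (direct standardization) is one possible mitigation; the segment names and figures are assumptions.

```python
# (completed deliveries, active supply-hours) per arm and segment; numbers are made up.
cells = {
    "control":   {"downtown_lunch": (400, 100.0), "suburb_evening": (900, 600.0)},
    "treatment": {"downtown_lunch": (800, 200.0), "suburb_evening": (750, 500.0)},
}


def segment_rates(arm):
    """Deliveries per supply-hour within each segment of one arm."""
    return {seg: d / h for seg, (d, h) in cells[arm].items()}


def aggregate_rate(arm):
    """Pooled deliveries per supply-hour, ignoring segment mix."""
    total_deliveries = sum(d for d, _ in cells[arm].values())
    total_hours = sum(h for _, h in cells[arm].values())
    return total_deliveries / total_hours


def standardized_rate(arm, reference_arm="control"):
    """Reweight an arm's per-segment rates by the reference arm's supply-hour mix."""
    weights = {seg: h for seg, (_, h) in cells[reference_arm].items()}
    rates = segment_rates(arm)
    return sum(rates[seg] * w for seg, w in weights.items()) / sum(weights.values())


for arm in ("control", "treatment"):
    print(arm, segment_rates(arm),
          f"aggregate={aggregate_rate(arm):.3f}",
          f"standardized={standardized_rate(arm):.3f}")
```

Whether the shift in mix is itself a treatment effect (Biker Mode pulling supply toward dense areas) or an artifact of imbalance is a separate question; the decomposition only shows that the aggregate lift is not a within-segment productivity gain.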

Part 5 — Launch / iterate / roll-back decision

How would you decide whether to launch, iterate, or roll back the feature based on the results?

What a Strong Answer Covers

  • A decision rule combining statistical significance + practical significance (pre-registered MDE) + guardrail status + operational feasibility, not p-values alone.
  • A mapping of concrete result patterns to actions (clean win → launch; segment- or weather-specific win or a single fixable guardrail → iterate / gated ship; cannibalization or reliability degradation → roll back).
  • A staged post-launch plan (ramp with a holdout, continued guardrail monitoring) acknowledging that some effects only surface over weeks.
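
A pre-registered decision rule of this kind might be written down as a small function before the test starts. The sketch below is one hypothetical encoding: the thresholds, the treatment of "fixable" guardrails, and the argument names are assumptions, and a real rule would be agreed with product and operations in advance.

```python
from enum import Enum


class Decision(Enum):
    LAUNCH = "launch"
    ITERATE = "iterate"
    ROLL_BACK = "roll back"


def decide(effect, ci_low, ci_high, mde,
           breached_guardrails=(), fixable_guardrails=(), ops_feasible=True):
    """Map a result pattern to an action.

    effect, ci_low, ci_high: point estimate and confidence interval for the primary metric.
    mde: pre-registered minimum effect that counts as practically significant.
    breached_guardrails: guardrail metrics that moved past their tolerance.
    fixable_guardrails: breached guardrails judged addressable by a scoped change.
    """
    breached = set(breached_guardrails)
    unfixable = breached - set(fixable_guardrails)

    # Harm that cannot be scoped away, or a significantly negative primary metric.
    if unfixable or ci_high < 0:
        return Decision.ROLL_BACK

    statistically_significant = ci_low > 0
    practically_significant = effect >= mde
    if statistically_significant and practically_significant and not breached and ops_feasible:
        return Decision.LAUNCH

    # Everything else: promising but incomplete, segment-specific, or fixable.
    return Decision.ITERATE


print(decide(0.05, 0.02, 0.08, mde=0.03))
print(decide(0.05, 0.02, 0.08, mde=0.03, breached_guardrails=["car_earnings_per_active_hour"]))
print(decide(0.05, 0.02, 0.08, mde=0.03,
             breached_guardrails=["late_delivery_rate"],
             fixable_guardrails=["late_delivery_rate"]))
print(decide(0.01, -0.01, 0.03, mde=0.03))
```

Writing the rule as code forces the trade-offs into the open, for example whether a significant effect below the MDE should ever ship, before any results are seen.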

What a Strong Answer Covers

These dimensions span all five Parts:

  • Treats the problem as a two-sided marketplace question end to end — every Part reflects that adding or steering bike supply must improve the whole local marketplace, not just bike activity.
  • Maintains internal consistency: the success statement (Part 1), the primary metric and cannibalization guardrail (Part 2), the interference-aware design (Part 3), and the decision rule (Part 5) all point at the same notion of net value per unit of supply.
  • Communicates assumptions and trade-offs explicitly, stating where a clarification (UI-only vs. dispatch-touching; throughput vs. substitution) would change the design.

Follow-up Questions

  • Your switchback shows a positive primary effect, but car-Dasher earnings drop in the same zones while total throughput is flat. What do you conclude and recommend?
  • The aggregate primary metric improves, yet within every individual zone-hour cell it is flat. What is happening, and which number do you trust?
  • How would you detect and quantify spillover onto neighboring zones if cross-zone deliveries exist?
  • The effect is positive only in fair weather and only downtown. Is that a launch, an iterate, or a roll-back, and how would you ship it?
  • After launch, which of your metrics would you expect to take weeks (not days) to stabilize, and how would you keep monitoring them?
