Marketplace Interference And Switchback Experiments
Asked of: Data Scientist

What's being tested
DoorDash is probing whether you can design causal experiments in a marketplace where one user’s treatment can affect another user’s outcome. Classic user-level A/B tests often fail because supply, demand, prices, batching, dispatch, and delivery times are coupled within a local market. A strong Data Scientist answer shows that you can choose the right randomization unit, control interference, define marketplace-wide metrics, and analyze results with appropriate fixed effects, clustered uncertainty, and operational guardrails. Interviewers are looking for practical judgment: not “run an A/B test,” but “what experiment is valid for this marketplace mechanism, and what tradeoffs does it create?”
Core knowledge
-
Marketplace interference occurs when treatment changes the shared environment: available Dashers, delivery ETAs, restaurant prep queues, batching opportunities, or consumer conversion. If treated users consume scarce supply, control users may see worse ETA, causing a biased estimate of treatment impact.
-
SUTVA — the Stable Unit Treatment Value Assumption — is often violated in delivery marketplaces. The assumption says each unit’s outcome depends only on its own treatment. For DoorDash, that can fail because orders, Dashers, merchants, and consumers interact through shared local supply-demand balance.
-
Switchback experiments randomize treatment by geography-time cells, such as market_id × hour or zone × daypart, instead of by user. The treatment switches on and off over time within the same market, reducing contamination when marketplace state is shared locally.
-
Randomization unit choice should match the interference radius. For dispatch, batching, fees, or delivery promises, prefer zone × time or market × time. For consumer UI copy with minimal supply effects, user-level randomization may be acceptable. For merchant prep changes, merchant-level or geo-level designs may be cleaner.
-
Primary metrics should reflect the full marketplace objective, not one side only. Common DoorDash metrics include order_volume, conversion_rate, gross_order_value, contribution_profit, delivery_time, ETA_accuracy, Dasher_utilization, Dasher_earnings, cancellation_rate, refund_rate, and merchant acceptance or prep latency.
-
Guardrail metrics catch harms hidden by the primary metric. For example, order batching may improve Dasher_utilization and profit but worsen delivery_time, cold food complaints, NPS, or reorder rate. A fee increase may improve per-order margin but reduce conversion or long-run retention.
-
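Difference-in-differences with fixed effects is a common analysis frame for switchbacks:
$$Y_{gt} = \alpha_g + \gamma_t + \beta \, T_{gt} + \varepsilon_{gt}$$
where $g$ is geography, $t$ is time, $T_{gt}$ indicates whether cell $(g, t)$ was treated, $\beta$ is the treatment effect, $\alpha_g$ controls persistent market differences, and $\gamma_t$ controls common time shocks.
-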
Clustered standard errors are usually required because observations within a market-time cell are correlated. If treatment is assigned at zone × hour, uncertainty should be clustered at the randomization level or higher. Treating millions of orders as independent can massively overstate significance.
-
Power calculations should use the effective sample size of randomized cells, not raw orders. A test with 50 markets over 14 days and 4 dayparts has roughly 50 × 14 × 4 = 2,800 assignment cells before accounting for autocorrelation, imbalance, and clustering.
-
Stratification and blocking improve precision. Randomize within market, day of week, and daypart so treatment and control both cover lunch, dinner, weekdays, weekends, high-demand zones, and low-demand zones. This is especially important when demand is highly seasonal or weather-sensitive.
-
Carryover effects can bias switchbacks if treatment changes future states. For example, a bad delivery experience in a treated period may reduce future orders in a control period, or Dasher positioning from one hour may affect the next hour. Use washout periods or coarser time blocks when carryover is plausible.
-
Noncompliance and partial exposure should be measured explicitly. If a batching algorithm is “on” but only affects 20% of eligible orders, analyze both intent-to-treat and exposure-adjusted effects carefully. ITT preserves randomization; treatment-on-treated estimates need stronger assumptions.
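The stratified geo-time assignment described in the list above can be sketched in a few lines. This is a minimal illustration, not a production randomizer: the function name `assign_switchback`, the zone labels, and the daypart names are all hypothetical, and it assumes one assignment cell per zone, calendar day, and daypart. Cells are grouped into strata by zone, day of week, and daypart, and each stratum is split as evenly as possible between treatment and control.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def assign_switchback(zones, start_date, n_days, dayparts, seed=0):
    """Return {(zone, day, daypart): 1 or 0}, balanced within each stratum.

    A stratum is (zone, day of week, daypart), so treatment and control
    both cover every weekday and daypart in every zone.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for zone in zones:
        for offset in range(n_days):
            day = start_date + timedelta(days=offset)
            for part in dayparts:
                strata[(zone, day.weekday(), part)].append((zone, day, part))

    assignment = {}
    for cells in strata.values():
        rng.shuffle(cells)
        half = len(cells) // 2
        for i, cell in enumerate(cells):
            assignment[cell] = 1 if i < half else 0
        if len(cells) % 2 == 1:
            # Odd-sized stratum: the leftover cell gets a fair coin flip.
            assignment[cells[-1]] = rng.randint(0, 1)
    return assignment


# Hypothetical usage: 2 zones, 14 days, 4 dayparts.
plan = assign_switchback(
    ["zone_a", "zone_b"], date(2024, 1, 1), 14,
    ["breakfast", "lunch", "dinner", "late_night"], seed=42,
)
```

With 14 days, every stratum holds exactly two cells (the same weekday in two different weeks), so one is treated and one is control. The sketch does not add washout periods; if carryover is a concern, adjacent cells in the same zone would need extra handling.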
Worked example
For the question "Design and analyze a switchback experiment," a strong candidate would start by clarifying the intervention, the marketplace surface it affects, and the interference radius: “Is this changing dispatch, pricing, batching, or consumer experience, and does it affect shared Dasher supply within a zone?” They would state that a user-level A/B test is likely invalid if the treatment changes supply allocation or delivery timing, because treated and control orders compete for the same Dashers.
The answer should be organized around four pillars: randomization design, metric framework, statistical analysis, and operational risks. For design, propose randomizing at the zone × time-block level, such as zone-by-2-hour windows, stratified by market, day of week, and daypart. For metrics, define a primary marketplace metric like contribution_profit_per_order or orders_per_consumer_session, plus guardrails such as delivery_time, cancellation_rate, Dasher_earnings, and merchant_lateness.
For analysis, aggregate outcomes to the assignment cell and estimate a regression with zone fixed effects and time fixed effects, using standard errors clustered by assignment cell or, more conservatively, by zone, which also allows for correlation across a zone's time blocks. A specific tradeoff to flag is time-block length: shorter blocks create more randomized units and power, but increase carryover risk because Dasher locations and batching queues persist across adjacent periods. Longer blocks reduce carryover but create fewer independent observations and may be more exposed to demand shocks.
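As a rough illustration of that analysis step, the sketch below estimates the treatment coefficient from cell-level data with zone and time-block fixed effects and zone-clustered standard errors. It uses only the standard library and relies on several assumptions that are not from the original text: the panel is balanced (exactly one row per zone and time block), the function name `twfe_effect` and the row layout are hypothetical, and the cluster-robust variance has no finite-sample correction. In practice a regression library would normally do this work.

```python
import math
from collections import defaultdict


def twfe_effect(cells):
    """Two-way fixed-effects estimate on a balanced zone x block panel.

    cells: iterable of (zone, block, treated, outcome), one row per cell.
    Returns (beta, se), with se clustered by zone (no small-sample fix).
    """
    cells = list(cells)
    zones = sorted({c[0] for c in cells})
    blocks = sorted({c[1] for c in cells})
    keys = {(c[0], c[1]) for c in cells}
    if len(cells) != len(zones) * len(blocks) or len(keys) != len(cells):
        raise ValueError("expects exactly one row per zone x block cell")

    def demean(idx):
        # On a balanced panel, subtracting zone and block means and adding
        # back the grand mean removes both sets of fixed effects.
        vals = {(c[0], c[1]): float(c[idx]) for c in cells}
        grand = sum(vals.values()) / len(vals)
        zmean = {z: sum(vals[(z, b)] for b in blocks) / len(blocks) for z in zones}
        bmean = {b: sum(vals[(z, b)] for z in zones) / len(zones) for b in blocks}
        return {k: v - zmean[k[0]] - bmean[k[1]] + grand for k, v in vals.items()}

    d = demean(2)  # demeaned treatment indicator
    y = demean(3)  # demeaned outcome
    sdd = sum(v * v for v in d.values())
    if sdd == 0:
        raise ValueError("treatment is collinear with the fixed effects")
    beta = sum(d[k] * y[k] for k in d) / sdd

    # Cluster-robust variance: sum residual scores within each zone first.
    score = defaultdict(float)
    for k in d:
        score[k[0]] += d[k] * (y[k] - beta * d[k])
    se = math.sqrt(sum(s * s for s in score.values())) / sdd
    return beta, se
```

Clustering by zone here treats each zone's full sequence of time blocks as one correlated group, which is the conservative choice when serial correlation or carryover across adjacent blocks is plausible. With few zones, this uncorrected variance can be too small, so a small-sample adjustment or randomization inference may be preferable.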
A strong close would say: “If I had more time, I’d run pre-experiment simulations using historical order streams to estimate power, check balance, select block length, and stress-test sensitivity to clustering and carryover.”
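One simple form of that pre-experiment simulation is a placebo (A/A) re-randomization on historical cell-level outcomes. The sketch below is an assumption-laden illustration: the function names are hypothetical, the input is a plain list of historical cell-level metric values with no treatment applied, and the split is an unstratified 50/50 shuffle that ignores autocorrelation between adjacent cells. It returns the spread of placebo differences, which can be turned into an approximate minimum detectable effect using the usual normal-approximation multipliers (1.96 for a two-sided 5% test, 0.84 for 80% power).

```python
import random
import statistics


def placebo_null_sd(cell_outcomes, n_sims=1000, seed=0):
    """SD of the difference in means under random 50/50 placebo splits."""
    rng = random.Random(seed)
    cells = list(cell_outcomes)
    half = len(cells) // 2
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(cells)
        treated, control = cells[:half], cells[half:]
        diffs.append(statistics.fmean(treated) - statistics.fmean(control))
    return statistics.pstdev(diffs)


def approx_mde(null_sd, z_alpha=1.96, z_power=0.84):
    """Approximate minimum detectable effect from a null standard error."""
    return (z_alpha + z_power) * null_sd
```

Re-running this for several candidate block lengths, each with its own historical aggregation, gives a rough view of how power changes with block length. A fuller version would re-randomize with the same stratification as the planned design and shuffle whole zones or days to respect serial correlation.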
A second angle
For the question "Evaluate Impact of $1 Fee on Fast-Food Profitability," the same interference logic applies, but the treatment is a pricing change rather than an operational algorithm. A naive consumer-level A/B test may appear reasonable, but if the fee reduces demand in some areas, it can free Dasher supply, improve ETAs, and indirectly affect control consumers nearby. The randomization unit could be consumer-level if the fee is small and supply effects are negligible, but market-time randomization is safer if demand shifts are expected to alter delivery speed or Dasher allocation.
The metric frame also changes: the primary metric should be something like profit_per_visitor or contribution_profit_per_session, not just fee revenue per order. The candidate should explicitly decompose impact into conversion loss, order mix changes, average basket size, delivery cost, refund/cancel rates, and repeat behavior.
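The first level of that decomposition can be written down exactly. Assuming profit per session equals conversion rate times contribution profit per order, the change between control and treatment splits into a conversion term and a per-order term. The sketch below is illustrative only: the function name is hypothetical and the numbers in the usage lines are made up, not experimental results.

```python
def decompose_profit_per_session(conv_c, ppo_c, conv_t, ppo_t):
    """Split the change in profit per session into two additive parts.

    conv_*: conversion rate (orders per session) in control / treatment.
    ppo_*:  contribution profit per order in control / treatment.
    """
    conversion_effect = (conv_t - conv_c) * ppo_c  # fewer or more orders
    per_order_effect = conv_t * (ppo_t - ppo_c)    # each order worth more or less
    total = conv_t * ppo_t - conv_c * ppo_c
    return {
        "conversion_effect": conversion_effect,
        "per_order_effect": per_order_effect,
        "total": total,
    }


# Hypothetical inputs: the fee lowers conversion but raises profit per order.
parts = decompose_profit_per_session(conv_c=0.10, ppo_c=2.00, conv_t=0.09, ppo_t=2.80)
```

The two parts always sum to the total change. The per-order term can be split further in the same way into basket size, order mix, delivery cost, and refund or cancel components; repeat behavior needs a longer measurement window and is not captured by a per-session metric.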
Common pitfalls
Pitfall: Saying “randomize users 50/50 and compare average order value” for dispatch, batching, or supply-sensitive changes.
That answer ignores interference. A better response explains why treated and control users share Dashers and restaurants, then proposes geo-time randomization or a switchback design that aligns assignment with the level where marketplace state is shared.
Pitfall: Optimizing one side of the marketplace while ignoring the others.
For example, adding bicycle Dashers might improve supply density and reduce cost in dense zones, but could worsen long-distance delivery times, Dasher earnings mix, or merchant pickup congestion. Strong answers define consumer, Dasher, merchant, and business metrics, then name the intended primary metric and guardrails.
Pitfall: Treating order-level rows as independent observations.
If 200,000 orders come from 500 randomized zone-hour cells, the experiment has closer to hundreds of independent units than hundreds of thousands. The right move is to analyze at the assignment-cell level or use fixed effects with clustered standard errors, then discuss power in terms of clusters and intraclass correlation.
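The gap between raw rows and independent units can be made concrete with the standard design-effect formula, deff = 1 + (m − 1) × ICC, where m is the average number of orders per cell. The sketch below assumes equal-sized cells, and the ICC values in the usage lines are hypothetical inputs rather than measured quantities.

```python
def effective_sample_size(n_orders, n_cells, icc):
    """Effective sample size under the design effect 1 + (m - 1) * icc."""
    avg_cell_size = n_orders / n_cells
    design_effect = 1 + (avg_cell_size - 1) * icc
    return n_orders / design_effect


# 200,000 orders in 500 zone-hour cells, for a few assumed ICC values.
for icc in (0.0, 0.05, 1.0):
    n_eff = effective_sample_size(200_000, 500, icc)
    print(f"icc={icc}: effective n = {n_eff:.0f}")
```

At an ICC of 0 the orders behave as independent and the effective sample size equals the order count; at an ICC of 1 it collapses to the 500 cells. Even a small ICC pulls the effective sample size far below the raw order count when cells are large.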
Connections
Interviewers may pivot from this topic into causal inference, especially difference-in-differences, synthetic controls, CUPED, and heterogeneous treatment effects. They may also ask about marketplace metric design, pricing experimentation, ranking and dispatch evaluation, or diagnosing a launch where conversion_rate, ETA, and profit move in opposite directions.
Further reading
-
Kohavi, Tang, and Xu, Trustworthy Online Controlled Experiments — Practical treatment of online experiments, guardrails, variance reduction, and common validity threats.
-
Blake and Coey, “Why Marketplace Experimentation Is Harder Than It Seems” — Seminal discussion of interference and marketplace experiment design, especially for two-sided platforms.
-
Angrist and Pischke, Mostly Harmless Econometrics — Strong foundation for fixed effects, clustered standard errors, difference-in-differences, and causal interpretation.
Practice questions
- How to test bike delivery? · DoorDash · Data Scientist · Technical Screen · medium
- Design experiments for marketplace product changes · DoorDash · Data Scientist · Onsite · hard
- Assess Adding Bicycle Dashers · DoorDash · Data Scientist · Technical Screen · medium
- Design experiment for bike delivery feature · DoorDash · Data Scientist · Technical Screen · medium
- Design and analyze a switchback experiment · DoorDash · Data Scientist · Technical Screen · hard
- Design an experiment for thermal bags · DoorDash · Data Scientist · Technical Screen · hard
- Design an experiment for order batching · DoorDash · Data Scientist · Onsite · hard
- Evaluate Impact of $1 Fee on Fast-Food Profitability · DoorDash · Data Scientist · Onsite · medium
- Design Experiments to Evaluate Courier Initiatives Effectively · DoorDash · Data Scientist · Onsite · hard
Related concepts
- Switchback Experiments And Marketplace Interference · Analytics & Experimentation
- Network Interference And Cluster Randomization · Analytics & Experimentation
- Cluster Randomized Experiments And Network Interference · Analytics & Experimentation
- Experimentation Under Network Interference
- Difference-In-Differences And Quasi-Experiments · Analytics & Experimentation
- Clustered And Networked Experiments