PracHub

Design and analyze a switchback experiment

Last updated: Jun 24, 2026

Quick Overview

This question evaluates proficiency in experimental design, causal inference, randomization and contamination control, regression modeling with fixed effects and clustered standard errors, power calculation, and handling of compliance and robustness issues.

Company: DoorDash

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen

You are optimizing a delivery marketplace feature suspected to reduce cold-food incidents for bike couriers in dense zones. Design a 2-week switchback experiment at the city level that toggles the feature ON/OFF by equal-length time slots within each city. Be precise and address:

  • (A) Randomization: Choose a slot length L given an average order lifecycle of 45 minutes and a driver relocation/carryover horizon of 30 minutes. Justify L to minimize contamination and describe a block-randomization scheme that balances day-of-week and peak hours while preventing predictability.
  • (B) Assignment vs exposure: Define the difference between slot-level assignment (Intention-to-Treat) and realized exposure when some units operate in OFF slots but pick up spillover demand from neighboring ON slots. Specify what goes in the numerator/denominator for the primary metric (cold-food rate among biker deliveries), and show two denominator variants: include all deliveries (condition_label=0 and 1) vs include only deliveries with condition_label=1.
  • (C) Analysis model: Write the exact regression you would run (formula notation is fine) with city fixed effects and slot-of-week fixed effects, and cluster-robust SEs at the city×slot level. Explain how you would incorporate pre-period baselines or covariates (e.g., weather, surge, courier mix) for precision.
  • (D) Power: With baseline cold-food rate = 6%, target relative reduction = 10% (MDE = 0.6pp), average 120 eligible orders per slot, intracluster correlation (ICC) at the slot level = 0.02, and 14 days, estimate the number of switchbacks (ON↔OFF transitions per city) needed for 80% power at α=0.05. State assumptions and show the core calculation or code you would use.
  • (E) Diagnostics: List concrete randomization checks and balance tests you will run, and how you would test for carryover (e.g., leading indicators, excluding boundary intervals).
  • (F) Robustness: How would you handle partial compliance, missing telemetry, or shocks (major events) mid-test? Describe a principled decision rule to stop, extend, or rerun the test.

Oct 13, 2025, 9:49 PM

Design and Analyze a Switchback Experiment: Reducing Cold-Food Incidents for Bike Couriers

You are a data scientist on a delivery-marketplace team. A new feature is suspected to reduce cold-food incidents for bike couriers operating in dense urban zones. Because the feature acts on the marketplace itself (dispatch, batching, courier routing), a naive order-level A/B test would suffer interference: treating one order changes which courier is free for a neighboring control order.

Your task is to design a 2-week switchback experiment at the city level that toggles the feature ON/OFF in equal-length time slots within each city, then specify exactly how you would analyze it and decide whether to ship. Work through randomization, the assignment-vs-exposure distinction, the analysis model, a power calculation, diagnostics, and robustness.

Constraints & Assumptions

  • Experiment window: 14 days (2 weeks) per city.
  • Unit of toggling: the entire city for the duration of a time slot.
  • Order dynamics: average order lifecycle (accept → deliver) ≈ 45 minutes; courier relocation / demand carryover horizon ≈ 30 minutes.
  • Outcome: a binary cold_food_incident flag, restricted to mode = bike deliveries.
  • Exposure flag: each delivery carries a condition_label ∈ {0, 1} indicating whether it was actually served under the feature.
  • Power inputs (Part D): baseline cold-food rate $p_0 = 6\%$; target relative reduction $10\%$ (MDE $= 0.6$ pp); average 120 eligible bike orders per slot; slot-level intracluster correlation $\rho = 0.02$; two-sided $\alpha = 0.05$; power $= 0.80$.

Clarifying Questions to Ask

Before designing, a candidate should scope the problem by asking:

  • What is the decision the experiment informs — a global ship/no-ship, or a per-city rollout? This sets the estimand.
  • Can we run multiple cities in parallel, or is this a single-city test? (Critical for power and for identifying time fixed effects.)
  • What historical (pre-period) data is available per city and per time-of-week, and how clean is it?
  • How is condition_label set operationally — is non-compliance (ON slots where the feature silently fails, or OFF slots that pick up spillover) common?
  • Are there known major events (storms, sports, holidays) in the 2-week window that we should anticipate?
  • Is there a secondary/guardrail set of metrics (delivery time, courier earnings, cancellations) that must not regress?

Part A — Randomization

Choose a slot length $L$ given the 45-minute order lifecycle and 30-minute relocation/carryover horizon, and justify it so as to minimize contamination across slot boundaries. Then describe a block-randomization scheme that balances day-of-week and peak hours while preventing the schedule from being predictable to operators.

What This Part Should Cover

  • A defended numeric choice of $L$ tied to the 75-min contamination horizon (45-minute order lifecycle plus 30-minute carryover), with the trade-off (more switchbacks vs. cleaner cores) made explicit.
  • A concrete treatment of guard bands / wash-in / wash-out and which timestamp attributes a delivery to a slot.
  • A stratified block scheme that balances DOW × time-of-day and a separate mechanism (run-length caps, de-synchronization, blinding) that defeats predictability.
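One way to make the stratified block scheme concrete is sketched below. It is only an illustration under assumptions that are not in the prompt: a 120-minute slot (chosen simply because it exceeds the 75-minute horizon), strata defined as day type (weekday/weekend) × slot-of-day, day 0 falling on a Monday, and a run-length cap of 4 slots enforced by redrawing. The function names and the seed are hypothetical.

```python
import random
from itertools import groupby

SLOT_MIN = 120                        # assumed slot length L in minutes (> 75-min horizon)
DAYS = 14
SLOTS_PER_DAY = 24 * 60 // SLOT_MIN   # 12 slots per day


def draw_schedule(rng):
    """One balanced draw: within each (day type, slot-of-day) stratum, half the slots are ON."""
    schedule = [[0] * SLOTS_PER_DAY for _ in range(DAYS)]
    weekdays = [d for d in range(DAYS) if d % 7 < 5]    # assumes day 0 is a Monday
    weekends = [d for d in range(DAYS) if d % 7 >= 5]
    for slot in range(SLOTS_PER_DAY):
        for stratum in (weekdays, weekends):
            for day in rng.sample(stratum, len(stratum) // 2):
                schedule[day][slot] = 1
    return [arm for day in schedule for arm in day]     # flatten to time order


def longest_run(flat):
    """Length of the longest stretch of consecutive slots in the same arm."""
    return max(len(list(group)) for _, group in groupby(flat))


def make_schedule(seed, max_run=4, max_tries=100_000):
    """Redraw until no arm persists for more than max_run consecutive slots."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        flat = draw_schedule(rng)
        if longest_run(flat) <= max_run:
            return flat
    raise RuntimeError("no schedule satisfied the run-length cap")


schedule = make_schedule(seed=2025)   # in practice the seed would be kept private
print(len(schedule), sum(schedule), longest_run(schedule))
```

Each slot-of-day ends up ON on exactly 5 of 10 weekdays and 2 of 4 weekend days, so time-of-day and day type are balanced by construction, while the run-length cap and a private, per-city seed are what keep the sequence hard to anticipate. Day-of-week balance is only approximate in this sketch; stratifying on exact day-of-week with two weeks of data would make week 2 the mirror image of week 1, which is predictable.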

Part B — Assignment vs. Exposure

Define the difference between slot-level assignment (the Intention-to-Treat lever) and realized exposure when some deliveries in OFF slots pick up spillover demand from neighboring ON slots (or ON-slot deliveries silently fail to trigger the feature). Then specify the numerator and denominator for the primary metric (cold-food rate among biker deliveries), and give two denominator variants: (1) include all deliveries (condition_label = 0 and 1); (2) include only deliveries with condition_label = 1.

What This Part Should Cover

  • A clear, level-correct contrast: assignment is slot-level and pre-determined; exposure is delivery-level and realized.
  • Why ITT (variant 1) is the unbiased, shippable policy effect and why variant 2 reintroduces selection bias by conditioning on a post-treatment variable.
  • Precise numerator/denominator definitions, including the bike-only and slot-core restrictions.
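The two denominator variants can be illustrated with a few lines of code. This is a minimal sketch on made-up toy records: `mode`, `condition_label`, and `cold_food_incident` follow the prompt, while the `assigned_on` field (the slot's assigned arm) and the `cold_food_rate` helper are hypothetical names introduced here. Slot-core filtering is omitted for brevity.

```python
def cold_food_rate(deliveries, assigned_on, exposed_only=False):
    """Cold-food incidents / bike deliveries in slots with the given assignment.

    exposed_only=False -> variant 1: all bike deliveries (condition_label 0 and 1).
    exposed_only=True  -> variant 2: only bike deliveries with condition_label == 1.
    """
    rows = [
        d for d in deliveries
        if d["mode"] == "bike"
        and d["assigned_on"] == assigned_on
        and (not exposed_only or d["condition_label"] == 1)
    ]
    if not rows:
        return None
    return sum(d["cold_food_incident"] for d in rows) / len(rows)


# Toy, made-up records: four bike deliveries per arm plus one car delivery.
deliveries = [
    {"mode": "bike", "assigned_on": 1, "condition_label": 1, "cold_food_incident": 0},
    {"mode": "bike", "assigned_on": 1, "condition_label": 1, "cold_food_incident": 0},
    {"mode": "bike", "assigned_on": 1, "condition_label": 1, "cold_food_incident": 1},
    {"mode": "bike", "assigned_on": 1, "condition_label": 0, "cold_food_incident": 0},
    {"mode": "bike", "assigned_on": 0, "condition_label": 0, "cold_food_incident": 1},
    {"mode": "bike", "assigned_on": 0, "condition_label": 0, "cold_food_incident": 0},
    {"mode": "bike", "assigned_on": 0, "condition_label": 0, "cold_food_incident": 1},
    {"mode": "bike", "assigned_on": 0, "condition_label": 1, "cold_food_incident": 0},
    {"mode": "car", "assigned_on": 1, "condition_label": 1, "cold_food_incident": 1},
]

itt_on = cold_food_rate(deliveries, assigned_on=1)                        # variant 1, ON slots
itt_off = cold_food_rate(deliveries, assigned_on=0)                       # variant 1, OFF slots
exposed_on = cold_food_rate(deliveries, assigned_on=1, exposed_only=True)   # variant 2, ON slots
exposed_off = cold_food_rate(deliveries, assigned_on=0, exposed_only=True)  # variant 2, OFF slots
print(itt_on, itt_off, exposed_on, exposed_off)
```

Variant 1 compares everything that happened under each assignment, so the ON minus OFF difference is the ITT contrast. Variant 2 drops deliveries based on a label that is itself affected by assignment, which is why its rates can move for reasons unrelated to the feature's effect.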

Part C — Analysis Model

Write the exact regression you would run (formula notation is fine) with city fixed effects and slot-of-week fixed effects, using cluster-robust standard errors at the city × slot level. Then explain how you would incorporate pre-period baselines or covariates (weather, surge, courier mix) to improve precision.

What This Part Should Cover

  • A correctly specified FE model (city FE, slot-of-week FE) with the treatment coefficient interpreted as the ITT effect in pp.
  • Clustering at the city × slot level justified by the assignment level / ICC.
  • A principled covariate / CUPED baseline strategy that reduces variance without biasing the effect, plus a note on pre-registration.
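One possible way to write the model described above is as a linear probability model on bike deliveries. The notation is an assumption introduced here, not taken from the prompt: $y_{i,c,s}$ is the cold-food indicator for delivery $i$ in city $c$ and slot $s$, $T_{c,s}$ is the slot's assignment, $w(s)$ maps a slot to its slot-of-week, and $X_{c,s}$ collects pre-registered covariates such as weather, surge, and courier mix.

$$
y_{i,c,s} = \beta\, T_{c,s} + \alpha_c + \gamma_{w(s)} + X_{c,s}^{\top}\theta + \varepsilon_{i,c,s}
$$

Here $\alpha_c$ are city fixed effects, $\gamma_{w(s)}$ are slot-of-week fixed effects, $\beta$ is the ITT effect (multiply by 100 to read it in percentage points), and standard errors are clustered on the $(c, s)$ pair. In formula notation this would read roughly as `cold_food_incident ~ assigned_on + C(city) + C(slot_of_week) + weather + surge + courier_mix`, where the column names are hypothetical.

A CUPED-style baseline adjustment, sketched at the slot level under the assumption that a pre-period mean $\bar{y}^{\text{pre}}_{c,w(s)}$ exists for each city and slot-of-week, would be:

$$
\tilde{y}_{c,s} = \bar{y}_{c,s} - \hat{\theta}\left(\bar{y}^{\text{pre}}_{c,w(s)} - \bar{y}^{\text{pre}}\right),
\qquad
\hat{\theta} = \frac{\operatorname{Cov}\!\left(\bar{y}_{c,s},\, \bar{y}^{\text{pre}}_{c,w(s)}\right)}{\operatorname{Var}\!\left(\bar{y}^{\text{pre}}_{c,w(s)}\right)}
$$

Because the baseline is measured before assignment, subtracting it changes the variance of the outcome but not the expected ON minus OFF difference.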

Part D — Power

Using the inputs above (baseline $6\%$, MDE $0.6$ pp, 120 orders/slot, ICC $= 0.02$, 14 days, $\alpha = 0.05$, power $0.80$), estimate the number of switchbacks (ON↔OFF transitions per city) needed for 80% power. State your assumptions and show the core calculation or code.

What This Part Should Cover

  • Correct use of the design effect to deflate the per-slot sample to effective observations.
  • A two-proportion power calculation yielding slots per arm and a total slot budget, with stated assumptions (and any deliberate conservatism flagged).
  • The key realization that one city in 2 weeks is underpowered, leading to a parallel-cities (or extended-duration) design, and a correctly scoped per-city transition count (bounded by that city's own slot count, not the experiment-wide total).
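The chain from design effect to slots to cities can be sketched as follows. The inputs come from the prompt; everything else is an assumption for illustration: a 120-minute slot, equal ON/OFF allocation, a normal-approximation two-proportion formula with unpooled variances, constant cluster size, and no covariate adjustment (which makes the result conservative).

```python
from math import ceil
from statistics import NormalDist

# Inputs from the prompt
p0, rel_reduction = 0.06, 0.10
orders_per_slot, icc = 120, 0.02
alpha, power, days = 0.05, 0.80, 14
slot_min = 120                      # assumed slot length; not given in the prompt

p1 = p0 * (1 - rel_reduction)       # 5.4% under treatment, i.e. MDE = 0.6 pp

# Orders per arm if deliveries were independent (two-proportion formula)
z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
n_independent = z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p0 - p1) ** 2

# Design effect inflates the requirement because orders within a slot are correlated
deff = 1 + (orders_per_slot - 1) * icc
slots_per_arm = ceil(n_independent * deff / orders_per_slot)
total_slots = 2 * slots_per_arm

# What one city can supply in 14 days, and how many cities that implies
slots_per_city = days * 24 * 60 // slot_min
cities_needed = ceil(total_slots / slots_per_city)

# Transitions are bounded by a city's own slot count
max_transitions_per_city = slots_per_city - 1
expected_transitions_per_city = (slots_per_city - 1) / 2   # if adjacent slots are assigned independently 50/50

print(round(deff, 2), slots_per_arm, total_slots, slots_per_city, cities_needed)
```

Under these assumptions the design effect is 3.38, the requirement is on the order of 660 slots per arm, and a single city supplies only 168 slots in two weeks, so about 8 cities would have to run in parallel. A shorter slot would raise the per-city slot count but also shrink the clean core of each slot.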

Part E — Diagnostics

List the concrete randomization / balance checks you will run, and describe how you would test for carryover (e.g., leading/lagging indicators, excluding boundary intervals).

What This Part Should Cover

  • Pre-registered balance tests: pre-period outcome balance, in-test covariate SMDs, stratum-balance verification.
  • A lead/lag (event-study) test for carryover, plus drop-the-boundary sensitivity and spillover mapping near city borders.
  • An explicit threshold or expectation for what "passes" (e.g., lag coefficient ≈ 0, ≥ ~90% of mass in slot cores).
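Two of these checks are simple enough to sketch without a regression library. The helpers below are hypothetical and use toy, made-up numbers: `smd` computes a standardized mean difference for a slot-level covariate, and `carryover_contrast` is a model-free stand-in for the lag coefficient, comparing OFF slots that follow an ON slot with OFF slots that follow an OFF slot.

```python
from statistics import mean, variance


def smd(on_values, off_values):
    """Standardized mean difference of a slot-level covariate between arms."""
    pooled_sd = ((variance(on_values) + variance(off_values)) / 2) ** 0.5
    return (mean(on_values) - mean(off_values)) / pooled_sd


def carryover_contrast(assignments, slot_rates):
    """Mean cold-food rate in OFF slots that follow ON, minus OFF slots that follow OFF."""
    pairs = list(zip(assignments, assignments[1:], slot_rates[1:]))
    after_on = [rate for prev, cur, rate in pairs if cur == 0 and prev == 1]
    after_off = [rate for prev, cur, rate in pairs if cur == 0 and prev == 0]
    return mean(after_on) - mean(after_off)


# Toy, made-up numbers for one city, in time order.
rain_on, rain_off = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
assignments = [1, 0, 0, 1, 0, 0]
slot_rates = [0.050, 0.055, 0.060, 0.050, 0.057, 0.060]

print(smd(rain_on, rain_off))
print(carryover_contrast(assignments, slot_rates))
```

A contrast near zero is what one would expect if the guard bands are doing their job; a clearly negative value suggests the feature's effect is leaking into the following OFF slot. In a real analysis the same idea would be estimated inside the regression with lead and lag assignment terms and clustered standard errors.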

Part F — Robustness

Describe how you would handle partial compliance, missing telemetry, and shocks (major events) mid-test, and give a principled decision rule to stop, extend, or rerun the experiment.

What This Part Should Cover

  • ITT-as-primary with IV/2SLS (not a naive conditioned ratio) for the effect on the treated.
  • A pre-registered missing-telemetry policy (MCAR vs. correlated; IPW / imputation; a hard drop threshold) and shock handling via an event calendar + covariates / exclusion.
  • A concrete stop / extend / rerun rule tied to compliance gap, carryover test results, realized power, and valid-inference constraints on peeking.
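For the effect on the treated, the simplest IV estimator with a binary instrument is the Wald ratio: the ITT effect on the outcome divided by the ITT effect on exposure. The sketch below uses made-up arm-level rates and a hypothetical `wald_estimate` helper; it omits covariates and standard errors, which in practice would come from 2SLS with slot assignment as the instrument and clustering at the city × slot level.

```python
def wald_estimate(y_on, y_off, exposure_on, exposure_off):
    """Return (ITT, first stage, Wald ratio) from arm-level means.

    y_*        : cold-food rate among all bike deliveries in ON / OFF slots
    exposure_* : share of those deliveries with condition_label == 1
    """
    itt = y_on - y_off
    first_stage = exposure_on - exposure_off
    if first_stage == 0:
        raise ValueError("no difference in exposure between arms; ratio undefined")
    return itt, first_stage, itt / first_stage


# Toy, made-up numbers: 90% exposure in ON slots, 10% spillover in OFF slots.
itt, first_stage, effect_on_treated = wald_estimate(
    y_on=0.054, y_off=0.060, exposure_on=0.90, exposure_off=0.10
)
print(itt, first_stage, effect_on_treated)
```

With these toy inputs the ITT is about -0.6 pp and the ratio about -0.75 pp, showing how imperfect compliance dilutes the ITT relative to the effect on exposed deliveries. As the first stage shrinks toward zero the ratio becomes unstable, which is one reason a compliance-gap threshold belongs in the pre-registered decision rule.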

What a Strong Answer Covers

Across all parts, the interviewer is looking for an experimentalist who treats the switchback as an interference-control device and reasons end-to-end:

  • Estimand discipline: ITT as the primary, shippable effect throughout; exposure-conditioned quantities recovered only via IV, never as naive ratios.
  • Internal consistency: the slot length, guard bands, clustering level, FE structure, and power calculation all reference the same unit of assignment (city × slot) and the same contamination horizon.
  • Quantitative correctness: the design effect, two-proportion math, and the slots → cities → per-city-transitions chain are arithmetically sound and the conservatisms are named.
  • Pre-registration mindset: covariates, missingness thresholds, decision rules, and any interim looks are committed in advance to protect $\alpha$.

Follow-up Questions

  • If the realized first stage is weak (ON and OFF slots show nearly identical exposure), what does that imply for both the ITT and the IV estimates, and how would you respond?
  • Suppose the lead/lag test shows a significant lag coefficient. Walk through exactly what you change in the next iteration of the design.
  • You observe that cold-food rate has strong time-of-day heterogeneity (much worse at dinner peak). How would you estimate and report heterogeneous treatment effects without inflating false positives?
  • Marketing wants to ship after one week because the point estimate "looks good." Explain precisely why that is unsafe under your design and what inference procedure would make an early look legitimate.