Design and analyze a switchback experiment
Company: DoorDash
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You are optimizing a delivery marketplace feature suspected to reduce cold-food incidents for bike couriers in dense zones. Design a 2-week switchback experiment at the city level that toggles the feature ON/OFF by equal-length time slots within each city. Be precise and address:
(A) Randomization: Choose a slot length L given an average order lifecycle of 45 minutes and a courier relocation/carryover horizon of 30 minutes. Justify L to minimize contamination and describe a block-randomization scheme that balances day-of-week and peak hours while preventing predictability.
(B) Assignment vs exposure: Define the difference between slot-level assignment (Intention-to-Treat) and realized exposure when some units operate in OFF slots but pick up spillover demand from neighboring ON slots. Specify what goes in the numerator/denominator for the primary metric (cold-food rate among bike-courier deliveries), and show two denominator variants: include all deliveries (condition_label=0 and 1) vs include only deliveries with condition_label=1.
(C) Analysis model: Write the exact regression you would run (formula notation is fine) with city fixed effects and slot-of-week fixed effects, and cluster-robust SEs at the city×slot level. Explain how you would incorporate pre-period baselines or covariates (e.g., weather, surge, courier mix) for precision.
(D) Power: With baseline cold-food rate = 6%, target relative reduction = 10% (MDE = 0.6pp), average 120 eligible orders per slot, intracluster correlation (ICC) at the slot level = 0.02, and 14 days, estimate the number of switchbacks (ON↔OFF transitions per city) needed for 80% power at α=0.05. State assumptions and show the core calculation or code you would use.
(E) Diagnostics: List concrete randomization checks and balance tests you will run, and how you would test for carryover (e.g., leading indicators, excluding boundary intervals).
(F) Robustness: How would you handle partial compliance, missing telemetry, or shocks (major events) mid-test? Describe a principled decision rule to stop, extend, or rerun the test.
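One possible way to approach the block randomization asked for in part (A) is sketched below. It is only an illustration under stated assumptions: the 90-minute slot length, the function name make_schedule, and the seeding scheme are all hypothetical and not part of the question. For each city and each slot-of-week position (day-of-week by slot index), the sketch randomly picks which of the two weeks is ON and sets the other to OFF, so every day-of-week and hour-of-day is balanced exactly across arms.

```python
import random


def make_schedule(cities, slot_minutes=90, weeks=2, seed=0):
    """Return {(city, week, day_of_week, slot_index): arm}, arm 1 = ON, 0 = OFF.

    Assumption: slot_minutes=90 is an illustrative choice, not a value
    given in the question. Blocks are slot-of-week positions; within each
    block the ON/OFF labels are shuffled across weeks.
    """
    if (24 * 60) % slot_minutes != 0:
        raise ValueError("slot_minutes must divide a 24-hour day evenly")
    slots_per_day = (24 * 60) // slot_minutes
    schedule = {}
    for city in cities:
        # Separate, reproducible random stream per city.
        rng = random.Random(f"{seed}:{city}")
        for dow in range(7):
            for slot in range(slots_per_day):
                # Half of the weeks ON, the rest OFF, in random order.
                arms = [1] * (weeks // 2) + [0] * (weeks - weeks // 2)
                rng.shuffle(arms)
                for week, arm in enumerate(arms):
                    schedule[(city, week, dow, slot)] = arm
    return schedule


schedule = make_schedule(["city_a", "city_b"])
```

A caveat on this design choice: because week 2 is the complement of week 1 at every slot-of-week position, anyone who observed week 1 could predict week 2. Keeping the schedule private, or randomizing within blocks of adjacent slots instead, are alternatives when predictability is a concern.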
Quick Answer: This question evaluates proficiency in experimental design, causal inference, randomization and contamination control, regression modeling with fixed effects and clustered standard errors, power calculation, and handling of compliance and robustness issues.
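The core power calculation in part (D) can be sketched as follows. This is a rough approximation, not a definitive answer: it assumes a two-sided two-proportion z-test with a normal approximation, equal-sized slots, and slots that are independent of one another once the design effect 1 + (m - 1) × ICC is applied. It ignores serial correlation between adjacent slots and any city-level clustering. The function name slots_per_arm is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def slots_per_arm(p0=0.06, rel_reduction=0.10, m=120, icc=0.02,
                  alpha=0.05, power=0.80):
    """Approximate number of slots needed per arm, plus the design effect.

    Assumptions: normal approximation, two-sided test, equal slot sizes,
    and independence between slots after the design-effect inflation.
    """
    p1 = p0 * (1 - rel_reduction)   # treated rate under the target effect
    delta = p0 - p1                 # absolute MDE (0.6pp with the defaults)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Orders per arm if every order were independent.
    n_indep = z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / delta ** 2
    # Inflate for within-slot correlation, then convert orders to slots.
    deff = 1 + (m - 1) * icc
    return ceil(n_indep * deff / m), deff


n_slots, deff = slots_per_arm()
print(f"design effect = {deff:.2f}, slots per arm = {n_slots}")
```

With the figures in the question the design effect is 3.38 and the requirement comes to roughly 660 slots per arm. How that maps to switchbacks per city depends on the slot length and the number of cities, neither of which is given: with a hypothetical 90-minute slot, one city supplies 224 slots over 14 days, so about six cities would be needed.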