Design and analyze batching algorithm experiment
Company: DoorDash
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
DoorDash plans to test a new order-batching/dispatch algorithm in 10 cities during August–September 2025. Spillovers between nearby areas are likely. Design and analyze the experiment:
1) Randomization unit: Propose a geo-cell clustering and stratified randomization plan that limits interference (e.g., 2–5 km hex cells), ensuring balance on baseline order volume, cuisine mix, and dasher supply. How will you detect and mitigate cross-cell spillovers?
2) Primary/secondary metrics: Choose one primary success metric (e.g., orders delivered per dasher hour, or 90th percentile delivery time) and at least three guardrails (e.g., customer cancellations, courier wait-at-store, restaurant prep SLA breaches, fairness across neighborhoods). Define each precisely, including inclusion/exclusion rules and winsorization.
3) Power: Baseline P90 delivery time is 40.0 minutes with SD 7.0; you expect a −1.2 minute improvement. There are 50 geo-cells per arm with an average of 8,000 orders per cell over the test. Intracluster correlation (ICC) is 0.20. Compute the required sample size or the achieved power using cluster-robust approximations; state any design effects and assumptions.
4) Analysis: Specify the intention-to-treat model with cluster-robust SEs, include pre-period CUPED covariates, and a plan for SRM checks. Provide the exact regression you would run (formula and covariates), how you will handle right-skew (e.g., log transform or quantile regression), and how you will aggregate cell-level quantiles.
5) Heterogeneity: Pre-register subgroup analyses (e.g., by time-of-day, cuisine, weather severity index). Show how you will control the false discovery rate. What minimum subgroup sample size do you require?
6) Operational rollout: Describe a safe ramp plan and “kill switches.” If a mid-test storm hits 3 cities, explain how you’ll use difference-in-differences or synthetic control on impacted cells without biasing the ITT.
7) Decision rule: Write the exact thresholding rule (effect size, confidence, and guardrail constraints) for shipping, with an example calculation using hypothetical results.
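For part 1, with 100 cells across 10 cities (about ten per city), fine-grained strata would mostly be singletons. One option is matched-pair randomization within each city: score every cell on standardized baseline covariates, pair cells that are adjacent in score, and flip a coin within each pair. This is a minimal sketch, assuming hypothetical per-cell fields `orders`, `dashers` and `fast_food_share` (a stand-in for cuisine mix). It covers assignment and a balance check, not spillover detection.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical baseline covariates measured per cell in a pre-period.
COVARIATES = ("orders", "dashers", "fast_food_share")


def zscores(values):
    """Standardize values within one city (0 if the city has no spread)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def pair_randomize(cells, seed=2025):
    """Matched-pair randomization of geo-cells within each city.

    cells: list of dicts with 'cell_id', 'city' and the COVARIATES keys.
    Returns {cell_id: 1 for treatment, 0 for control}.
    """
    rng = random.Random(seed)
    by_city = defaultdict(list)
    for cell in cells:
        by_city[cell["city"]].append(cell)

    assignment = {}
    for city in sorted(by_city):
        group = by_city[city]
        # Composite baseline score: sum of within-city z-scores.
        z_cols = [zscores([c[k] for c in group]) for k in COVARIATES]
        scores = [sum(col[i] for col in z_cols) for i in range(len(group))]
        ranked = sorted(range(len(group)), key=lambda i: scores[i])
        # Pair cells that are adjacent in score and flip a coin within each pair.
        for j in range(0, len(ranked) - 1, 2):
            a = group[ranked[j]]["cell_id"]
            b = group[ranked[j + 1]]["cell_id"]
            first = rng.randint(0, 1)
            assignment[a], assignment[b] = first, 1 - first
        if len(ranked) % 2 == 1:  # an unpaired leftover cell gets a fair coin
            assignment[group[ranked[-1]]["cell_id"]] = rng.randint(0, 1)
    return assignment


def standardized_mean_diff(cells, assignment, key):
    """(treatment mean - control mean) / pooled SD for one covariate."""
    t = [c[key] for c in cells if assignment[c["cell_id"]] == 1]
    ctl = [c[key] for c in cells if assignment[c["cell_id"]] == 0]
    pooled = ((pstdev(t) ** 2 + pstdev(ctl) ** 2) / 2) ** 0.5
    return (mean(t) - mean(ctl)) / pooled if pooled > 0 else 0.0


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic stand-in: 10 cities x 10 cells (not real data).
    cells = [
        {"cell_id": f"city{ci}-cell{k}", "city": f"city{ci}",
         "orders": rng.uniform(2000, 14000), "dashers": rng.uniform(50, 400),
         "fast_food_share": rng.uniform(0.1, 0.6)}
        for ci in range(10) for k in range(10)
    ]
    arms = pair_randomize(cells)
    for key in COVARIATES:
        print(key, round(standardized_mean_diff(cells, arms, key), 3))
```

Pairing within city forces an exact 50/50 split per city whenever the city has an even number of cells. A common rule of thumb flags covariates with |SMD| above 0.1 as imbalanced.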
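For part 2, the sketch below computes the primary metric and two guardrails for one cell with explicit exclusion and winsorization rules. Everything specific here is an assumption for illustration: the field names, defining delivery time as placed-to-delivered minutes over delivered orders, the nearest-rank quantile, and the cap at the cell's 99th percentile.

```python
import math


def nearest_rank_quantile(values, q):
    """Nearest-rank quantile: smallest value with at least a share q of the data at or below it."""
    s = sorted(values)
    k = max(1, math.ceil(q * len(s)))
    return s[k - 1]


def winsorize_upper(values, upper_q=0.99):
    """Cap values at their upper_q nearest-rank quantile."""
    cap = nearest_rank_quantile(values, upper_q)
    return [min(v, cap) for v in values]


def cell_metrics(orders):
    """Primary metric and two guardrails for one geo-cell.

    orders: list of dicts with hypothetical fields 'status' ('delivered' or
    'cancelled'), 'is_test', 'delivery_min' and 'wait_at_store_min'.
    """
    real = [o for o in orders if not o["is_test"]]  # exclude internal test orders
    delivered = [o for o in real
                 if o["status"] == "delivered" and o["delivery_min"] is not None]
    cancelled = [o for o in real if o["status"] == "cancelled"]
    waits = winsorize_upper([o["wait_at_store_min"] for o in delivered])
    return {
        # Primary: P90 of delivery minutes over delivered, non-test orders.
        "p90_delivery_min": nearest_rank_quantile(
            [o["delivery_min"] for o in delivered], 0.90),
        # Guardrail: cancellations as a share of all non-test orders placed.
        "cancel_rate": len(cancelled) / len(real),
        # Guardrail: mean courier wait at the store, winsorized at the cell's P99.
        "mean_wait_at_store_min": sum(waits) / len(waits),
    }
```

Capping at P99 cannot move a P90, so winsorization matters for mean-based guardrails such as wait-at-store rather than for the primary quantile.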
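For part 3, a standard approximation inflates the variance by the design effect DEFF = 1 + (m − 1) × ICC, where m is orders per cell; equivalently, the variance of a cell mean is SD² × (ICC + (1 − ICC)/m). The sketch assumes that 7.0 is the order-level SD of delivery time, equal cell sizes, a two-sided α = 0.05 and a normal reference distribution, and it treats the P90 comparison as if it were a comparison of means.

```python
import math
from statistics import NormalDist


def cluster_power(delta, sd, icc, m, k_per_arm, alpha=0.05):
    """Power of a two-arm cluster-randomized comparison of means (normal approx.).

    delta: effect to detect (minutes); sd: order-level SD; icc: intracluster
    correlation; m: orders per cell (equal sizes assumed); k_per_arm: cells per arm.
    """
    deff = 1 + (m - 1) * icc                 # design effect
    n_eff = k_per_arm * m / deff             # effective independent orders per arm
    se = sd * math.sqrt(2 / n_eff)           # SE of the treatment - control difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    power = NormalDist().cdf(abs(delta) / se - z_crit)  # ignores the negligible far tail
    return deff, n_eff, se, power


def cells_per_arm_needed(delta, sd, icc, m, alpha=0.05, power=0.80):
    """Cells per arm for the target power, using Var(cell mean) = sd^2 * (icc + (1 - icc) / m)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    var_cell_mean = sd ** 2 * (icc + (1 - icc) / m)
    return math.ceil(2 * z ** 2 * var_cell_mean / delta ** 2)


deff, n_eff, se, power = cluster_power(delta=1.2, sd=7.0, icc=0.20, m=8000, k_per_arm=50)
print(f"DEFF={deff:.1f}  n_eff/arm={n_eff:.0f}  SE={se:.3f}  power={power:.2f}")
print("cells per arm for 80% power:", cells_per_arm_needed(1.2, 7.0, 0.20, 8000))
```

With these inputs DEFF = 1600.8, the effective sample is about 250 orders per arm, the SE of the difference is about 0.63 minutes, and achieved power is only about 0.48; roughly 107 cells per arm would be needed for 80% power. A t reference with 98 degrees of freedom would lower power slightly. If 7.0 is instead the between-cell SD of cell-level P90s, the SE becomes 7.0 × √(2/50) = 1.4 minutes and power falls to about 0.135.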
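For part 4, one specification consistent with the prompt is a cell-level regression y_c = α + β·T_c + γ·X_c + stratum fixed effects + ε_c, where y_c is the test-period cell P90, T_c is the randomized assignment, X_c is the same cell's pre-period P90 (the CUPED covariate), and the SE is clustered by cell. The sketch below is a simplified CUPED version of that idea without stratum fixed effects; the list-based data layout is an assumption.

```python
import math
from statistics import mean, variance


def cuped_itt(y, x, treat):
    """Cell-level intention-to-treat estimate with a CUPED adjustment.

    y: test-period metric per cell (e.g. cell P90 delivery minutes)
    x: the same metric for the same cell in a pre-period (the CUPED covariate)
    treat: 1 if the cell was *assigned* to treatment, else 0 (ITT: ignore compliance)
    One row per cell, so the cell is the cluster and the SE is cluster-level by construction.
    """
    mx, my = mean(x), mean(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)             # CUPED coefficient pooled across arms
    y_adj = [b - theta * (a - mx) for a, b in zip(x, y)]
    t = [v for v, d in zip(y_adj, treat) if d == 1]
    c = [v for v, d in zip(y_adj, treat) if d == 0]
    effect = mean(t) - mean(c)
    se = math.sqrt(variance(t) / len(t) + variance(c) / len(c))  # Welch-style SE over cells
    return theta, effect, se
```

Because each cell contributes one P90, cell quantiles are aggregated by averaging across cells (equal weights here), which estimates the mean of cell P90s rather than a pooled P90 over all orders. For right skew, y and x can be replaced by their logs, in which case the effect is approximately a proportional change for small effects.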
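For part 5, the Benjamini-Hochberg step-up procedure is one standard way to control the false discovery rate across pre-registered subgroups; it is valid when the tests are independent or positively dependent. The subgroup names and p-values in the usage example are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans aligned with pvalues: True = reject at FDR level q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max  # reject every hypothesis up to that rank
    return rejected


if __name__ == "__main__":
    # Hypothetical subgroups and p-values, for illustration only.
    subgroups = ["lunch", "dinner", "late_night", "fast_food",
                 "fine_dining", "storm", "clear", "rain"]
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print([s for s, r in zip(subgroups, benjamini_hochberg(pvals)) if r])
```

The example prints `['lunch', 'dinner']`: only the two smallest p-values fall under their thresholds k × 0.05 / 8.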
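For part 6, one way to keep the ITT intact is to leave the primary analysis on every randomized cell over the full window and treat the storm as a pre-registered secondary analysis. The sketch estimates, within the impacted cities only, how the treatment-minus-control gap changed between the pre-storm and storm windows, which is a two-period difference-in-differences at the cell level. The tuple layout is an assumption, and the synthetic-control alternative is not shown.

```python
import math
from statistics import mean, variance


def storm_did(cells):
    """Difference-in-differences on storm-impacted cells only.

    cells: list of (treat, y_pre, y_storm) tuples, where treat is the *randomized*
    arm, y_pre is the cell metric before the storm and y_storm the metric during it.
    Returns (did, se): the change in the treatment-control gap during the storm and
    a cell-level SE computed from per-cell changes.
    """
    d_t = [post - pre for treat, pre, post in cells if treat == 1]
    d_c = [post - pre for treat, pre, post in cells if treat == 0]
    did = mean(d_t) - mean(d_c)
    se = math.sqrt(variance(d_t) / len(d_t) + variance(d_c) / len(d_c))
    return did, se
```

Because assignment was fixed before the storm, both arms are exposed to it; the DiD describes how the storm modifies the treatment effect rather than replacing the ITT estimate.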
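For part 7, one way to write the shipping rule is as conjunctive checks: the upper bound of the two-sided 95% CI for ΔP90 (treatment − control) is below 0, the point estimate is at most −0.5 minutes, and for every guardrail the upper 95% bound of its degradation is below a pre-registered non-inferiority margin. The −0.5-minute threshold, the guardrail names, the margins and the results in the example are all hypothetical.

```python
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value


def ship_decision(primary, guardrails, min_effect=-0.5):
    """primary: (estimate, se) for treatment - control P90 minutes (negative = faster).
    guardrails: {name: (estimate, se, margin)}, where a positive estimate is worse and
    margin is the largest degradation tolerated. All thresholds are hypothetical."""
    est, se = primary
    checks = {
        "primary_significant": est + Z95 * se < 0,
        "primary_meaningful": est <= min_effect,
    }
    for name, (g_est, g_se, margin) in guardrails.items():
        checks[f"guardrail_{name}"] = g_est + Z95 * g_se < margin
    return all(checks.values()), checks


if __name__ == "__main__":
    # Hypothetical results (treatment - control), not real experiment output.
    ok, checks = ship_decision(
        primary=(-1.3, 0.55),
        guardrails={"cancel_rate_pp": (0.05, 0.04, 0.20),
                    "wait_at_store_min": (0.3, 0.2, 0.5)},
    )
    print(ok, checks)
```

Here the primary checks pass (upper bound ≈ −1.3 + 1.96 × 0.55 = −0.22 minutes) and cancellations stay within margin (0.05 + 1.96 × 0.04 ≈ 0.13 pp < 0.20 pp), but the wait-at-store bound (0.3 + 1.96 × 0.2 ≈ 0.69 minutes) exceeds its 0.5-minute margin, so the rule says do not ship.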
Quick Answer: This question evaluates experiment design and causal inference competencies in the Analytics & Experimentation domain: geo-randomization and spillover control, precise metric specification with guardrails, cluster-based power and sample-size calculations, intention-to-treat analysis with cluster-robust inference, heterogeneity analysis, and operational rollout and decision-rule planning. It is commonly asked because interviewers need to assess both conceptual understanding and practical application: designing robust cluster-randomized geo-experiments that limit interference, defining and pre-registering metrics and analyses, computing cluster-adjusted power, and specifying operational safeguards and clear shipping criteria.