Design an experiment for order batching
Company: DoorDash
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Onsite
DoorDash wants to test a new batching policy that lets a dasher pick up two nearby orders in one trip during peak hours. Design an experiment to estimate causal impact on: (a) order conversion rate, (b) actual vs. quoted ETA accuracy, (c) dasher hourly earnings, and (d) restaurant prep-time congestion. Answer precisely:
1) Unit of randomization and why (e.g., zone-hour, store, or dasher) given interference/spillovers; how you’ll mitigate cross-treatment contamination.
2) Stratification/covariate adjustment plan (e.g., city, cuisine, distance bands, forecasted demand) and how you’ll pre-register guardrail metrics (late deliveries, cancellations, CSAT).
3) Sample size and duration: outline the MDE, baseline rates, variance assumptions, and a power calc at 80% power; justify a sequential or group sequential design if you choose to stop early.
4) Novelty and learning effects: how you will detect and discount the first N days and measure the longer-run steady state.
5) Heterogeneity: how you’ll estimate city-level and distance-band treatment effects without p-hacking (e.g., hierarchical modeling, shrinkage).
6) Decision rule: provide a precise promotion rule for when guardrails worsen but primary metrics improve; include acceptable deltas and confidence thresholds.
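For part 3, here is a minimal sketch of the power calculation, assuming zone-hour switchback randomization (one of the units the question lists). The two-proportion formula gives the sessions needed per arm if sessions were independent. The design effect 1 + (m - 1) * ICC then inflates that figure for clustering, where m is the average number of sessions per zone-hour and ICC is the intra-cluster correlation. The baseline conversion (40%), MDE (+0.5 pp), m = 200 and ICC = 0.01 are hypothetical placeholders, not DoorDash figures, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def sessions_per_arm(p_control: float, mde_abs: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Sessions per arm for a two-sided two-proportion z-test, treating sessions as independent."""
    p_treat = p_control + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def zone_hours_per_arm(n_iid: int, sessions_per_zone_hour: float, icc: float) -> int:
    """Inflate by the design effect 1 + (m - 1) * ICC, then convert sessions to zone-hour clusters."""
    design_effect = 1 + (sessions_per_zone_hour - 1) * icc
    return math.ceil(n_iid * design_effect / sessions_per_zone_hour)


# Hypothetical inputs (not real DoorDash figures): 40% baseline conversion,
# +0.5 pp absolute MDE, 200 sessions per peak zone-hour, ICC = 0.01.
n_iid = sessions_per_arm(p_control=0.40, mde_abs=0.005)
clusters = zone_hours_per_arm(n_iid, sessions_per_zone_hour=200, icc=0.01)
print(f"{n_iid} sessions per arm if independent; {clusters} zone-hours per arm after clustering")
```

Dividing the total zone-hours by the peak zone-hours available per day gives a minimum duration. The design-effect formula assumes roughly equal cluster sizes. In practice the ICC would be estimated from historical zone-hour data, and covariate adjustment (e.g., on forecasted demand) would lower the required sample.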
Quick Answer: This question evaluates competence in experiment design and causal inference for marketplace interventions: choosing a randomization unit and mitigating interference/spillovers, stratification and covariate adjustment, power and sample-size calculations, detecting novelty versus steady-state effects, estimating heterogeneous effects, and pre-registering guardrail-based decision rules. It is common in Analytics & Experimentation interviews because it tests both conceptual understanding of causal inference and the practical ability to run randomized designs and operational measurement that evaluate policy changes robustly despite contamination.
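For the guardrail decision rule, one way to make promotion precise is sketched below. This is an assumed rule, not a prescribed method. It ships only if two conditions hold. First, the one-sided 95% lower bound of the primary lift must exceed a minimum practical lift. Second, every guardrail must pass a non-inferiority check, meaning its one-sided 95% upper bound on degradation stays below a pre-registered margin. Guardrail deltas are signed so that positive means worse, so a CSAT drop would be entered as a positive number. The `Estimate` class, the `promote` function, the metric names, margins and estimates are all hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class Estimate:
    name: str
    delta: float  # treatment minus control; for guardrails, positive = worse
    se: float     # standard error, assumed clustered at the randomization unit


def promote(primary: Estimate, guardrails: list[Estimate], margins: dict[str, float],
            min_lift: float = 0.0, alpha: float = 0.05) -> tuple[bool, list[str]]:
    """Return (ship?, reasons) under a superiority + non-inferiority rule."""
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value (about 1.645)
    reasons = []
    lower = primary.delta - z * primary.se
    if lower <= min_lift:
        reasons.append(f"{primary.name}: lower bound {lower:.4f} <= min lift {min_lift}")
    for g in guardrails:
        upper = g.delta + z * g.se
        if upper >= margins[g.name]:
            reasons.append(f"{g.name}: upper bound {upper:.4f} >= margin {margins[g.name]}")
    return (len(reasons) == 0, reasons)


# Hypothetical estimates and pre-registered margins (absolute rate deltas).
primary = Estimate("order_conversion", delta=0.006, se=0.002)
guardrails = [
    Estimate("late_delivery_rate", delta=0.004, se=0.0015),
    Estimate("cancellation_rate", delta=0.001, se=0.0008),
]
margins = {"late_delivery_rate": 0.010, "cancellation_rate": 0.002}

ship, reasons = promote(primary, guardrails, margins)
print(ship, reasons)
```

Promotion requires every check to pass, so the combined rule is an intersection-union test. The probability of promoting when any single condition truly fails stays at or below alpha, with no multiplicity correction needed across guardrails. With these made-up numbers, the conversion lift passes. The late-delivery increase also passes because it is worse but within its margin. The cancellation upper bound exceeds its 0.2 pp margin, however, so the rule blocks promotion.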