Forecast and Analyze DoorDash Menu Price Inflation Gap
Company: DoorDash
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Onsite
##### Question
DoorDash wants to understand and forecast the difference between on-platform menu prices and the same items' in-store prices (the "inflation gap"). Design an end-to-end analysis that:
1. **Measures the current gap.** Collect and clean matched item-level prices, then construct a robust index that summarizes how much higher (or lower) platform prices are versus in-store prices, both as a level (markup) and as an inflation differential (the change in the gap over time).
2. **Matches identical items.** Describe how you would link a platform item at a given store and time to the same item in-store (normalization, blocking, similarity scoring, one-to-one resolution, and how you would validate match quality).
3. **Quantifies uncertainty.** Produce confidence intervals for the gap when no A/B test is possible, accounting for items clustering within stores and prices persisting over time.
4. **Forecasts the gap.** Build a time-series model (with exogenous drivers such as CPI, fuel, wages, and policy changes) to forecast the gap, and produce prediction intervals.
5. **Sizes the study / computes MDE.** Without randomization, derive the sample size and minimum detectable effect for an observational design (e.g., pre/post or difference-in-differences), incorporating clustering and autocorrelation design effects.
Cover data sources, matching identical items, the choice of price index, bootstrapped vs. model-based confidence intervals, the forecasting model class, and the power-analysis assumptions.
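As a starting point for the power-analysis part, the sketch below computes an MDE and required sample size for a simple two-group comparison of mean log price gaps (for example pre vs. post). It rests on several assumptions that are not stated in the question: equal group sizes, a normal approximation, a Kish-style clustering design effect `1 + (m - 1) * icc`, an AR(1) variance inflation `(1 + ar1) / (1 - ar1)`, and the simplification that the two inflate variance multiplicatively. The function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def design_effect(items_per_store: float, icc: float, ar1: float = 0.0) -> float:
    """Variance inflation relative to i.i.d. sampling.

    Assumption: the clustering term (Kish) and the AR(1) serial-correlation
    term combine multiplicatively. This is a simplification, not a derived result.
    """
    cluster_term = 1.0 + (items_per_store - 1.0) * icc
    serial_term = (1.0 + ar1) / (1.0 - ar1)
    return cluster_term * serial_term


def _z_sum(alpha: float, power: float) -> float:
    # Two-sided test at level alpha, plus the quantile for the target power.
    nd = NormalDist()
    return nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)


def mde(n_per_group: int, sd: float, deff: float,
        alpha: float = 0.05, power: float = 0.80) -> float:
    """Minimum detectable difference in mean log gap between two equal groups."""
    return _z_sum(alpha, power) * math.sqrt(2.0 * sd ** 2 * deff / n_per_group)


def n_required(target_mde: float, sd: float, deff: float,
               alpha: float = 0.05, power: float = 0.80) -> int:
    """Observations per group needed to detect target_mde."""
    z = _z_sum(alpha, power)
    return math.ceil(2.0 * (z * sd / target_mde) ** 2 * deff)


if __name__ == "__main__":
    # Hypothetical inputs: 20 matched items per store, ICC 0.2, AR(1) of 0.5.
    deff = design_effect(items_per_store=20, icc=0.2, ar1=0.5)
    print("design effect:", round(deff, 2))
    print("MDE (log points):", round(mde(20_000, sd=0.30, deff=deff), 4))
    print("n per group for a 1-log-point MDE:", n_required(0.01, sd=0.30, deff=deff))
```

A difference-in-differences design would need the variance of a double difference rather than a single one, so this sketch understates the required sample in that case unless `sd` is replaced by the standard deviation of within-unit changes.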
Quick Answer: This DoorDash data scientist onsite question tests the statistical skills needed to measure the platform-versus-in-store price inflation gap: item-level matching, price-index construction, uncertainty quantification, and time-series forecasting. Candidates should build matched-item log price gaps, quantify uncertainty with cluster- and time-aware methods, forecast with a SARIMAX or state-space model, and derive sample-size and MDE requirements for an observational (pre/post or difference-in-differences) design, since no A/B test is possible.
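The matched-item log price gap and a cluster-aware confidence interval can be illustrated with a minimal sketch. It assumes items are already matched and arrive as hypothetical `(store_id, platform_price, in_store_price)` tuples, uses an unweighted geometric-mean (Jevons-style) index as one possible index choice, and resamples whole stores so that items within a store stay together. The function names and sample prices are illustrative, not taken from any DoorDash data.

```python
import math
import random
from collections import defaultdict


def log_gaps_by_store(rows):
    """Group log(platform / in-store) price gaps by store.

    rows: iterable of (store_id, platform_price, in_store_price) for items
    that have already been matched. Non-positive prices are dropped.
    """
    gaps = defaultdict(list)
    for store_id, p_platform, p_store in rows:
        if p_platform > 0 and p_store > 0:
            gaps[store_id].append(math.log(p_platform / p_store))
    return dict(gaps)


def markup_index(gaps_by_store):
    """Unweighted geometric-mean price ratio minus 1 (Jevons-style markup)."""
    pooled = [g for gaps in gaps_by_store.values() for g in gaps]
    return math.exp(sum(pooled) / len(pooled)) - 1.0


def cluster_bootstrap_ci(gaps_by_store, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the markup, resampling stores with replacement."""
    rng = random.Random(seed)
    stores = list(gaps_by_store)
    stats = []
    for _ in range(n_boot):
        sampled = rng.choices(stores, k=len(stores))
        resample = {i: gaps_by_store[s] for i, s in enumerate(sampled)}
        stats.append(markup_index(resample))
    stats.sort()
    lo_idx = max(0, int(alpha / 2.0 * n_boot))
    hi_idx = min(n_boot - 1, int((1.0 - alpha / 2.0) * n_boot))
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Hypothetical matched prices for three stores.
    rows = [
        ("A", 11.50, 10.00), ("A", 5.75, 5.00),
        ("B", 13.00, 10.00), ("B", 6.00, 5.00),
        ("C", 10.00, 10.00),
    ]
    gaps = log_gaps_by_store(rows)
    print("markup:", markup_index(gaps))
    print("95% CI:", cluster_bootstrap_ci(gaps))
```

If each store contributes a price history rather than a single snapshot, resampling whole stores also keeps each store's time series intact, which is one simple way to respect price persistence. A block bootstrap over time would be needed if the stores themselves share common shocks. Sales-weighted indices such as Törnqvist or Fisher are alternatives when quantity data are available.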