Build ETA prediction and simulate impact
Company: DoorDash
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
You received a take-home dataset with order-, store-, and dasher-level features to predict delivery ETA (minutes from created_at to delivered_at). Deliverables:
A) Problem framing: Define the target precisely and list at least 10 features across demand, supply, and network (e.g., historical prep time by merchant-hour, driver density within 3 km in last 10 minutes, rain indicator, queue depth at store, distance via road graph, time-of-day, promo active, cuisine, orders-in-batch).
B) Leakage and splitting: Identify all likely leakage sources (e.g., features derived after pickup) and propose a time-based CV (e.g., rolling-origin) with train=[Aug 1–24, 2025], valid=[Aug 25–31], test=[Sep 1–7]. Justify any domain adaptation if training on other cities.
C) Modeling: Compare a point-prediction GBM (e.g., XGBoost/LightGBM) with a gradient-boosted quantile model for P50/P90; justify loss choices (MAE, Huber, pinball). Include the feature interactions you would engineer and the regularization you’d tune.
D) Evaluation: Report MAE, median AE, P90 AE, coverage of 80% prediction intervals, and calibration plots. Describe how you’d compute calibration error and reliability curves.
E) Decisioning: Show how ETA error impacts dispatch decisions (late-delivery penalties vs courier idle cost). Propose a cost-sensitive objective or post-hoc thresholding that minimizes expected cost under asymmetric penalties.
F) Explainability and fairness: Use SHAP or permutation importance to audit features; outline checks for bias across neighborhoods or vehicle types and how you’d mitigate (e.g., monotonic constraints, group calibration).
G) Production: Outline a feature store, streaming inference latency budget, model retraining cadence, drift detection (PSI/KS on key features), and an online A/B plan to validate offline gains while watching guardrails.
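As a rough illustration of the split in deliverable B, the sketch below assigns each order to train, valid, or test by its created_at date and flags features that could leak. It assumes the ETA is predicted at order creation and that all timestamps share one time zone. The names SPLITS, assign_split, and is_leak_free are hypothetical and not part of the take-home dataset.

```python
from datetime import date, datetime

# Date windows copied from deliverable B (inclusive on both ends).
SPLITS = {
    "train": (date(2025, 8, 1), date(2025, 8, 24)),
    "valid": (date(2025, 8, 25), date(2025, 8, 31)),
    "test": (date(2025, 9, 1), date(2025, 9, 7)),
}


def assign_split(created_at):
    """Return the split name for an order, or None if it falls outside every window."""
    day = created_at.date()
    for name, (start, end) in SPLITS.items():
        if start <= day <= end:
            return name
    return None


def is_leak_free(feature_time, created_at):
    """Assumption: a feature is safe only if it was observable at or before order creation."""
    return feature_time <= created_at


# Example: an order created late on Aug 24 still belongs to train.
print(assign_split(datetime(2025, 8, 24, 23, 59)))
```

Splitting on created_at rather than delivered_at keeps orders that were still in flight at a window boundary from landing in an earlier split than the one they were placed in. A feature stamped at pickup time, for instance, would fail the is_leak_free check for any order.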
Quick Answer: This question evaluates machine learning competencies for ETA prediction, including feature engineering, leakage detection, time-based validation, point and quantile modeling, evaluation and calibration metrics, cost-sensitive decisioning, explainability and fairness auditing, and production deployment considerations.
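One way to connect the quantile modeling and cost-sensitive decisioning parts is sketched below. It assumes costs are linear in minutes: cost_late per minute that the order arrives after the quoted ETA, and cost_early per minute that the quote overshoots (for example, courier idle time). Under that assumption, the quote that minimizes expected cost is the quantile at level cost_late / (cost_late + cost_early), which is the standard newsvendor critical ratio. All function and parameter names here are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at level q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) is weighted by q, over-prediction by 1 - q.
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(y_true)


def cost_optimal_quantile(cost_late, cost_early):
    """Quantile level minimizing expected cost, assuming linear per-minute costs."""
    return cost_late / (cost_late + cost_early)


def interval_coverage(y_true, lower, upper):
    """Fraction of actual delivery times that fall inside [lower, upper]."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


# Example: if a late minute costs four times an early minute, quote the P80.
print(cost_optimal_quantile(4.0, 1.0))
```

For the 80% prediction interval in the evaluation part, interval_coverage would be called with the P10 and P90 predictions as lower and upper; a well-calibrated model should return a value close to 0.8 on held-out data.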