
Build ETA prediction and simulate impact

Last updated: Mar 29, 2026

Quick Overview

This question evaluates machine learning competencies for ETA prediction, including feature engineering, leakage detection, time-based validation, point and quantile modeling, evaluation and calibration metrics, cost-sensitive decisioning, explainability and fairness auditing, and production deployment considerations.

Company: DoorDash

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

Oct 13, 2025, 9:49 PM
Predicting Delivery ETA (Minutes)

Context

You are given a take-home dataset with order-, store-, and dasher-level features. The goal is to predict delivery ETA defined as minutes from order created_at to delivered_at. Assume you must generate predictions using only information available at the prediction timestamp t0 (e.g., at order creation or at dispatch assignment).
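
One way to read the "only information available at t0" constraint is as a point-in-time filter on every raw event that feeds a feature. The sketch below is illustrative only: the helper name `count_recent_events`, the 10-minute window, and the sample timestamps are assumptions, and it presumes the events were already filtered by distance.

```python
from datetime import datetime, timedelta

def count_recent_events(event_times, t0, window_minutes=10):
    """Count events inside [t0 - window, t0]; anything after t0 is ignored."""
    window_start = t0 - timedelta(minutes=window_minutes)
    return sum(1 for ts in event_times if window_start <= ts <= t0)

# Hypothetical dasher location pings near one store.
t0 = datetime(2025, 8, 1, 12, 0)
pings = [
    datetime(2025, 8, 1, 11, 45),  # older than the window, excluded
    datetime(2025, 8, 1, 11, 52),
    datetime(2025, 8, 1, 11, 58),
    datetime(2025, 8, 1, 12, 3),   # after t0, would be leakage
]
driver_density = count_recent_events(pings, t0)
```

Applying the same filter when building training rows and when serving keeps the offline feature consistent with what would be known in production.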

Deliverables

A) Problem framing

  • Define the target precisely (unit, timestamp of prediction, censoring/exclusions).
  • Propose at least 10 features spanning demand, supply, and network (e.g., historical prep time by merchant-hour, driver density within 3 km in the last 10 minutes, rain indicator, queue depth at store, distance via road graph, time-of-day, promo active, cuisine, orders-in-batch).
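
As a possible starting point for the target definition, the sketch below computes minutes from created_at to delivered_at and returns None for rows that should be excluded. The exclusion rules shown (missing delivered_at, non-positive duration) are assumptions for illustration, not rules stated in the prompt.

```python
from datetime import datetime

def eta_minutes(created_at, delivered_at):
    """Target in minutes; None marks an order to exclude from training."""
    if delivered_at is None:      # never delivered (cancelled or censored)
        return None
    minutes = (delivered_at - created_at).total_seconds() / 60.0
    if minutes <= 0:              # inconsistent timestamps
        return None
    return minutes

# Hypothetical order rows.
orders = [
    {"created_at": datetime(2025, 8, 1, 12, 0),
     "delivered_at": datetime(2025, 8, 1, 12, 35)},
    {"created_at": datetime(2025, 8, 1, 12, 5), "delivered_at": None},
]
targets = [eta_minutes(o["created_at"], o["delivered_at"]) for o in orders]
```

Dropping undelivered orders silently can bias the target downward if cancellations correlate with long waits, so the exclusion count is worth reporting alongside the definition.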

B) Leakage and splitting

  • Identify likely leakage sources (e.g., features derived after pickup or after t0) and how to prevent them.
  • Propose a time-based cross-validation scheme (e.g., rolling-origin) with an example split: train=[Aug 1–24, 2025], valid=[Aug 25–31], test=[Sep 1–7].
  • Justify any domain adaptation if training on other cities.
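
A rolling-origin scheme can be sketched as an expanding training window followed by a fixed-length validation window. The function name and the fold sizes below (10-day initial window, 7-day validation, 7-day step) are assumptions chosen so that the last fold matches the example split; the Sep 1–7 test week stays outside all folds.

```python
from datetime import date, timedelta

def rolling_origin_folds(start, end, initial_train_days, valid_days, step_days):
    """Return (train_start, train_end, valid_start, valid_end) tuples, inclusive dates."""
    folds = []
    train_end = start + timedelta(days=initial_train_days - 1)
    while True:
        valid_start = train_end + timedelta(days=1)
        valid_end = valid_start + timedelta(days=valid_days - 1)
        if valid_end > end:       # validation window would run past the data
            break
        folds.append((start, train_end, valid_start, valid_end))
        train_end += timedelta(days=step_days)  # expand the training window
    return folds

folds = rolling_origin_folds(date(2025, 8, 1), date(2025, 8, 31), 10, 7, 7)
for train_start, train_end, valid_start, valid_end in folds:
    print(train_start, train_end, "->", valid_start, valid_end)
```

Every validation window starts strictly after its training window ends, which is the property that a random K-fold split would violate.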

C) Modeling

  • Compare gradient-boosted trees (e.g., XGBoost/LightGBM) for point prediction vs gradient-boosted quantile models for P50/P90.
  • Justify loss choices (MAE, Huber, pinball). List key feature interactions and regularization to tune.
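
To make the loss comparison concrete, here is a minimal sketch of the pinball (quantile) loss in plain Python. The sample values are made up. At q = 0.5 the loss equals half the MAE, which is why a P50 model and an MAE-trained model target the same quantity.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction (diff > 0) is weighted by q, over-prediction by 1 - q.
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

y_true = [30, 40]   # actual delivery minutes (hypothetical)
y_pred = [35, 35]   # predicted minutes (hypothetical)

loss_p50 = pinball_loss(y_true, y_pred, 0.5)
loss_p90 = pinball_loss(y_true, y_pred, 0.9)
mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)
```

At q = 0.9 the under-predicted order dominates the loss, which pushes a P90 model toward later, more conservative estimates.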

D) Evaluation

  • Report MAE, median absolute error, P90 absolute error, coverage of 80% prediction intervals, and calibration plots.
  • Describe how to compute calibration error and reliability curves.
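
The metrics above could be computed along the following lines. This is a sketch under assumptions: P90 uses the nearest-rank definition, the 80% interval is taken as [P10, P90], and calibration error is defined here as the mean absolute gap between each nominal quantile level and the observed share of actuals at or below the predicted quantile. All function names and data are hypothetical.

```python
import statistics

def abs_error_summary(y_true, y_pred):
    """MAE, median AE, and nearest-rank P90 of absolute errors."""
    errs = sorted(abs(y - p) for y, p in zip(y_true, y_pred))
    p90_index = (9 * len(errs) + 9) // 10 - 1   # ceil(0.9 * n) - 1 in integer math
    return {
        "mae": sum(errs) / len(errs),
        "median_ae": statistics.median(errs),
        "p90_ae": errs[p90_index],
    }

def interval_coverage(y_true, lower, upper):
    """Share of actuals that fall inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)

def quantile_calibration(y_true, preds_by_q):
    """Reliability points (nominal q, observed share) and their mean absolute gap."""
    points = []
    for q, preds in sorted(preds_by_q.items()):
        observed = sum(1 for y, p in zip(y_true, preds) if y <= p) / len(y_true)
        points.append((q, observed))
    error = sum(abs(obs - q) for q, obs in points) / len(points)
    return points, error

# Hypothetical actuals and constant quantile predictions.
y_true = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
n = len(y_true)
preds_by_q = {0.1: [22] * n, 0.5: [42] * n, 0.9: [62] * n}

summary = abs_error_summary(y_true, preds_by_q[0.5])
coverage = interval_coverage(y_true, preds_by_q[0.1], preds_by_q[0.9])
points, calib_error = quantile_calibration(y_true, preds_by_q)
```

Plotting `points` against the diagonal gives the reliability curve; computing the same points per city or per hour-of-day shows where calibration breaks down.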

E) Decisioning

  • Explain how ETA error impacts dispatch decisions (late-delivery penalties vs courier idle cost).
  • Propose a cost-sensitive objective or post-hoc thresholding that minimizes expected cost under asymmetric penalties.
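
If late and early costs are assumed to be linear per minute (a simplifying assumption, not something the prompt specifies), the expected-cost minimizer is the quantile at level cost_late / (cost_late + cost_early), the same result as in the newsvendor problem. The sketch below checks this empirically on made-up outcomes with made-up costs of 3.0 per late minute and 1.0 per idle minute.

```python
def critical_quantile(cost_late, cost_early):
    """Quantile level that minimizes expected cost under linear asymmetric costs."""
    return cost_late / (cost_late + cost_early)

def expected_cost(decision, outcomes, cost_late, cost_early):
    """Average cost of committing to `decision` minutes across observed outcomes."""
    total = 0.0
    for y in outcomes:
        total += cost_late * max(y - decision, 0)    # delivery later than planned
        total += cost_early * max(decision - y, 0)   # courier idle / over-quoted
    return total / len(outcomes)

outcomes = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]  # hypothetical actual minutes
cost_late, cost_early = 3.0, 1.0                      # hypothetical per-minute costs

q_star = critical_quantile(cost_late, cost_early)
best = min(outcomes, key=lambda d: expected_cost(d, outcomes, cost_late, cost_early))
median_cost = expected_cost(45, outcomes, cost_late, cost_early)
best_cost = expected_cost(best, outcomes, cost_late, cost_early)
```

With these numbers the best decision sits near the 75th percentile of outcomes rather than the median, so a quantile model trained at q_star is one way to fold the asymmetry into the objective instead of thresholding afterwards.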

F) Explainability and fairness

  • Use SHAP or permutation importance to audit features.
  • Outline checks for bias across neighborhoods or vehicle types and how to mitigate (e.g., monotonic constraints, group calibration).
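
A simple bias check is the mean signed error per group, followed by an additive per-group offset as a crude form of group calibration. The sketch below uses hypothetical vehicle-type groups and made-up values; in practice the offsets would be fit on the validation window and evaluated on the test window, not fit and checked on the same rows as done here for brevity.

```python
from collections import defaultdict

def signed_error_by_group(y_true, y_pred, groups):
    """Mean of (actual - predicted) per group; positive means under-prediction."""
    sums, counts = defaultdict(float), defaultdict(int)
    for y, p, g in zip(y_true, y_pred, groups):
        sums[g] += y - p
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}

def apply_group_offsets(y_pred, groups, offsets):
    """Shift each prediction by its group's offset (0 for unseen groups)."""
    return [p + offsets.get(g, 0.0) for p, g in zip(y_pred, groups)]

y_true = [30, 40, 50, 32, 36]
y_pred = [28, 36, 44, 33, 37]
groups = ["bike", "bike", "bike", "car", "car"]

bias = signed_error_by_group(y_true, y_pred, groups)
corrected = apply_group_offsets(y_pred, groups, bias)
bias_after = signed_error_by_group(y_true, corrected, groups)
```

A large gap between groups is a prompt to look for missing features (for example, travel speed by vehicle type) before relying on offsets alone.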

G) Production

  • Outline a feature store, streaming inference latency budget, model retraining cadence, drift detection (PSI/KS on key features), and an online A/B plan to validate offline gains while monitoring guardrails.
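
For the drift-detection item, the Population Stability Index compares binned shares of a feature between a reference window and a live window. The sketch below assumes bin edges are fixed from the reference data and uses a small floor `eps` to avoid taking the log of zero; the function name, edges, and sample values are hypothetical.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """PSI = sum over bins of (actual_share - expected_share) * ln(actual / expected)."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(1 for e in edges if v >= e)  # bin index given ascending edges
            counts[idx] += 1
        # Floor empty bins at eps so the log ratio stays finite.
        return [max(c / len(values), eps) for c in counts]

    e_shares, a_shares = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_shares, e_shares))

reference = [1, 2, 3, 4, 5, 6, 7, 8]   # e.g. training-window feature values
live = [5, 6, 7, 8, 7, 8, 7, 8]        # e.g. last day's serving values
edges = [3, 5, 7]

psi_same = psi(reference, reference, edges)
psi_shifted = psi(reference, live, edges)
```

The alert threshold is a judgment call and is best set from the PSI values observed between historical weeks that are known to be stable.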
