Build ETA prediction and simulate impact

This is a Machine Learning interview question from DoorDash for Data Scientist roles.

Question

Predicting Delivery ETA (Minutes)

Context

You are given a take-home dataset with order-, store-, and dasher-level features. The goal is to predict delivery ETA, defined as the minutes elapsed from an order's created_at to its delivered_at. Assume you must generate predictions using only information available at the prediction timestamp t0 (e.g., at order creation or at dispatch assignment).

Deliverables

A) Problem framing

Define the target precisely (unit, timestamp of prediction, censoring/exclusions).
Propose at least 10 features spanning demand, supply, and network (e.g., historical prep time by merchant-hour, driver density within 3 km in the last 10 minutes, rain indicator, queue depth at store, distance via road graph, time-of-day, promo active, cuisine, orders-in-batch).
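
A minimal sketch of the target definition and one point-in-time feature, assuming timestamps arrive as Python datetimes. The 180-minute outlier cap, the function names, and the (merchant_id, created_at, ready_at) history tuple are all hypothetical; the point it illustrates is that the merchant-hour prep-time feature only uses orders whose food was ready before t0.

```python
from datetime import datetime
from statistics import median

MAX_ETA_MIN = 180.0  # hypothetical outlier cap; choose it from the real data


def eta_target(created_at, delivered_at, cancelled=False):
    """Target in minutes from created_at to delivered_at, or None if the row is excluded."""
    if cancelled or delivered_at is None:
        return None  # cancelled or not-yet-delivered (censored) orders are excluded
    minutes = (delivered_at - created_at).total_seconds() / 60.0
    if minutes <= 0 or minutes > MAX_ETA_MIN:
        return None  # clock errors and extreme outliers
    return minutes


def merchant_hour_prep_median(history, merchant_id, t0):
    """Median prep time (minutes) for merchant_id at t0's hour of day.

    history: iterable of (merchant_id, created_at, ready_at) tuples (hypothetical schema).
    Only orders whose food was ready strictly before t0 are used, so the
    feature never sees information from after the prediction timestamp.
    """
    prep = [
        (ready_at - created_at).total_seconds() / 60.0
        for m, created_at, ready_at in history
        if m == merchant_id and created_at.hour == t0.hour and ready_at < t0
    ]
    return median(prep) if prep else None


history = [
    ("m1", datetime(2025, 8, 1, 12, 0), datetime(2025, 8, 1, 12, 15)),
    ("m1", datetime(2025, 8, 2, 12, 5), datetime(2025, 8, 2, 12, 25)),
    ("m1", datetime(2025, 8, 3, 12, 0), datetime(2025, 8, 3, 12, 30)),  # still cooking at t0
]
t0 = datetime(2025, 8, 3, 12, 10)
print(eta_target(datetime(2025, 8, 3, 12, 10), datetime(2025, 8, 3, 12, 52)))  # 42.0
print(merchant_hour_prep_median(history, "m1", t0))  # 17.5
```

A production pipeline would compute this through an as-of join or a feature store rather than a list scan, but the `ready_at < t0` filter is the part that prevents leakage.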

B) Leakage and splitting

Identify likely leakage sources (e.g., features derived after pickup or after t0) and how to prevent them.
Propose a time-based cross-validation scheme (e.g., rolling-origin) with an example split: train=[Aug 1–24, 2025], valid=[Aug 25–31], test=[Sep 1–7].
If you train on data from other cities, justify any domain adaptation.
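
A minimal sketch of the rolling-origin scheme plus a point-in-time guard, using only the standard library. The helper names and the feature_times mapping (feature name to the timestamp its value was observed) are hypothetical. With the example dates, the first fold reproduces train=[Aug 1–24], valid=[Aug 25–31], and the second fold's validation window is the Sep 1–7 test week, which should be scored once with the final model rather than used for tuning.

```python
from datetime import date, timedelta


def rolling_origin_folds(start, first_train_end, horizon_days, n_folds):
    """Expanding-window folds: train on [start, train_end], validate on the next
    horizon_days days, then move the origin forward by horizon_days."""
    folds = []
    train_end = first_train_end
    for _ in range(n_folds):
        valid = (train_end + timedelta(days=1), train_end + timedelta(days=horizon_days))
        folds.append(((start, train_end), valid))
        train_end = valid[1]  # the next fold also trains on this validation window
    return folds


def check_point_in_time(feature_times, t0):
    """Raise if any feature value was observed after the prediction timestamp t0."""
    late = sorted(name for name, observed_at in feature_times.items() if observed_at > t0)
    if late:
        raise ValueError(f"features observed after t0: {late}")


for train, valid in rolling_origin_folds(date(2025, 8, 1), date(2025, 8, 24), 7, 2):
    print("train", train, "valid", valid)
```

Each fold is evaluated only on days strictly after its training data. One further guard worth considering is dropping training rows whose delivered_at falls after the fold's train_end, because those labels would not have been known on that date.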

C) Modeling

Compare gradient-boosted trees (e.g., XGBoost/LightGBM) for point prediction vs gradient-boosted quantile models for P50/P90.
Justify loss choices (MAE, Huber, pinball). List key feature interactions and regularization to tune.
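
The three losses named above can be written out directly. This sketch, with hypothetical toy ETAs and a hypothetical Huber delta of 5 minutes, also shows which constant each loss prefers: MAE lands on a median, while pinball loss at q=0.9 lands between the 9th and 10th of 10 sorted ETAs, i.e. at a P90. In the libraries themselves, LightGBM exposes these as the `l1`, `huber`, and `quantile` objectives (the last two controlled by `alpha`), and XGBoost 2.0+ offers `reg:quantileerror` with `quantile_alpha`.

```python
def mae_loss(y, yhat):
    """Absolute error: robust to heavy right tails; its optimal constant is a median."""
    return abs(y - yhat)


def huber_loss(y, yhat, delta=5.0):
    """Quadratic for |error| <= delta minutes, linear beyond (delta is a hypothetical default)."""
    r = abs(y - yhat)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


def pinball_loss(y, yhat, q):
    """Quantile loss: each minute of under-prediction costs q, each minute of over-prediction costs 1 - q."""
    r = y - yhat
    return q * r if r >= 0 else (q - 1) * r


def mean_loss(loss, ys, yhat, **kw):
    """Average a per-example loss for a constant prediction yhat."""
    return sum(loss(y, yhat, **kw) for y in ys) / len(ys)


# Toy ETAs (minutes) with a long right tail.
etas = [18, 20, 22, 25, 27, 30, 33, 38, 45, 70]
grid = [g / 2 for g in range(20, 161)]  # candidate constants 10.0 ... 80.0
best_mae = min(grid, key=lambda c: mean_loss(mae_loss, etas, c))
best_p90 = min(grid, key=lambda c: mean_loss(pinball_loss, etas, c, q=0.9))
print(best_mae, best_p90)
```

Huber sits between the two: it behaves like squared error for small residuals and like MAE for the long late tail, which limits the pull of extreme delays on a point model.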

D) Evaluation

Report MAE, median absolute error, P90 absolute error, coverage of 80% prediction intervals, and calibration plots.
Describe how to compute calibration error and reliability curves.
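
These metrics are short enough to compute without a library. A sketch under these assumptions: errors are in minutes, the 80% interval is formed from the P10 and P90 predictions, and quantile calibration is summarized as the mean absolute gap between each nominal level and the observed share of actuals at or below that predicted quantile. All function names are illustrative.

```python
from statistics import mean, median


def empirical_quantile(xs, q):
    """Linearly interpolated quantile (the same convention as numpy.quantile's default)."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def point_metrics(y, yhat):
    """MAE, median absolute error, and P90 absolute error of point predictions."""
    err = [abs(a - p) for a, p in zip(y, yhat)]
    return {"MAE": mean(err), "MedAE": median(err), "P90AE": empirical_quantile(err, 0.9)}


def interval_coverage(y, lower, upper):
    """Share of actuals inside [lower, upper]; an 80% interval (P10 to P90) should score about 0.80."""
    return mean(1.0 if lo <= a <= hi else 0.0 for a, lo, hi in zip(y, lower, upper))


def quantile_reliability(y, preds_by_level):
    """For each nominal level q, the observed share of actuals at or below the predicted q-quantile.

    Plotting observed vs nominal gives the reliability curve (the diagonal is perfect);
    the mean absolute gap is one simple calibration-error summary.
    """
    levels = sorted(preds_by_level)
    observed = [mean(1.0 if a <= p else 0.0 for a, p in zip(y, preds_by_level[q])) for q in levels]
    cal_error = mean(abs(o - q) for o, q in zip(observed, levels))
    return list(zip(levels, observed)), cal_error
```

For a point model, the analogous reliability curve bins orders by predicted ETA and plots mean predicted against mean actual minutes per bin.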

E) Decisioning

Explain how ETA error impacts dispatch decisions (late-delivery penalties vs courier idle cost).
Propose a cost-sensitive objective or post-hoc thresholding that minimizes expected cost under asymmetric penalties.
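
Under linear penalties, if each minute the actual delivery runs past the ETA used for a decision costs c_under and each minute it finishes early costs c_over, the expected cost E[c_under (Y - t)^+ + c_over (t - Y)^+] is minimized at the tau-quantile of the predictive distribution with tau = c_under / (c_under + c_over), the classic newsvendor result. How late-delivery penalties and courier idle cost map onto c_under and c_over depends on how the ETA feeds dispatch; the costs and samples below are hypothetical.

```python
import math


def expected_cost(samples, t, c_under, c_over):
    """Average cost of acting on ETA t: c_under per minute the actual time exceeds t,
    c_over per minute t exceeds it."""
    return sum(c_under * max(y - t, 0) + c_over * max(t - y, 0) for y in samples) / len(samples)


def cost_optimal_eta(samples, c_under, c_over):
    """Return the empirical tau-quantile with tau = c_under / (c_under + c_over),
    which minimizes expected_cost under these linear penalties."""
    tau = c_under / (c_under + c_over)
    s = sorted(samples)
    k = max(0, math.ceil(tau * len(s)) - 1)  # smallest order statistic whose empirical CDF >= tau
    return s[k]


# Hypothetical predictive samples for one order (e.g., draws or quantiles from the model).
samples = [24, 26, 27, 29, 30, 31, 33, 36, 40, 52]
eta = cost_optimal_eta(samples, c_under=4.0, c_over=1.0)  # tau = 0.8
print(eta, expected_cost(samples, eta, 4.0, 1.0))
```

With a late minute priced at four times an early one, tau = 0.8, so the decision should use the P80 rather than the P50; this is one reason to train quantile models instead of only a point model.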

F) Explainability and fairness

Use SHAP or permutation importance to audit features.
Outline checks for bias across neighborhoods or vehicle types and how to mitigate it (e.g., monotonic constraints, group calibration).
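
A library-free sketch of permutation importance and a per-group residual check. It assumes `predict` is any callable mapping a feature dict to minutes and that each row carries a neighborhood or vehicle-type label; the model, feature names, and groups are hypothetical. In practice `sklearn.inspection.permutation_importance` or the `shap` package would handle the first part.

```python
import random
from statistics import mean


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MAE when one feature column is shuffled; X is a list of dicts."""
    rng = random.Random(seed)

    def mae(rows):
        return mean(abs(predict(r) - t) for r, t in zip(rows, y))

    base = mae(X)
    scores = {}
    for f in X[0]:
        deltas = []
        for _ in range(n_repeats):
            col = [r[f] for r in X]
            rng.shuffle(col)
            deltas.append(mae([{**r, f: v} for r, v in zip(X, col)]) - base)
        scores[f] = mean(deltas)
    return scores


def group_bias(y, yhat, groups):
    """Mean signed error (actual - predicted) per group; > 0 means ETAs are too optimistic there."""
    residuals = {}
    for a, p, g in zip(y, yhat, groups):
        residuals.setdefault(g, []).append(a - p)
    return {g: mean(r) for g, r in residuals.items()}


# Hypothetical model that uses distance but ignores the promo flag.
def predict(row):
    return 10.0 + 2.0 * row["distance_km"]


X = [{"distance_km": d, "promo_active": d % 2} for d in range(1, 9)]
y = [predict(r) for r in X]
print(permutation_importance(predict, X, y))
print(group_bias([30, 40, 20, 25], [25, 35, 22, 25], ["A", "A", "B", "B"]))
```

A group whose residual stays consistently positive is a candidate for group-level recalibration (for example, adding that group's validation median residual) or for monotonic constraints such as forcing predicted ETA to be non-decreasing in road distance, which both XGBoost and LightGBM support through a `monotone_constraints` parameter.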

G) Production

Outline a feature store, a streaming-inference latency budget, a model retraining cadence, drift detection (PSI/KS on key features), and an online A/B plan that validates offline gains while monitoring guardrails.
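
Both drift statistics named above fit in a few lines. A sketch assuming decile bins fixed on the training (reference) window and hypothetical road-distance values; `scipy.stats.ks_2samp` is the usual library route for KS, and the PSI cutoffs often quoted (below 0.1 stable, above 0.25 a significant shift) are rules of thumb rather than standards.

```python
import bisect
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index with bin edges at the reference window's deciles."""
    ref = sorted(reference)
    edges = [ref[len(ref) * i // n_bins] for i in range(1, n_bins)]

    def shares(xs):
        counts = [0] * n_bins
        for x in xs:
            counts[bisect.bisect_right(edges, x)] += 1
        return [max(c / len(xs), eps) for c in counts]  # eps avoids log(0) for empty bins

    p, q = shares(reference), shares(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect.bisect_right(a, x) / len(a) - bisect.bisect_right(b, x) / len(b))
        for x in a + b
    )


train_distance = [0.5 + 0.01 * i for i in range(1000)]  # hypothetical road distances (km)
live_distance = [d + 2.0 for d in train_distance]        # same shape, shifted by 2 km
print(psi(train_distance, live_distance), ks_statistic(train_distance, live_distance))
```

Running these per feature on each day's live traffic against the training window gives a simple alarm that can trigger retraining, or a closer look before trusting an A/B readout.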
