Predicting Delivery ETA (Minutes)
Context
You are given a take-home dataset with order-, store-, and dasher-level features. The goal is to predict delivery ETA, defined as the number of minutes from order created_at to delivered_at. Assume you must generate predictions using only information available at the prediction timestamp t0 (e.g., at order creation or at dispatch assignment).
Deliverables
A) Problem framing
- Define the target precisely (unit, timestamp of prediction, censoring/exclusions).
- Propose at least 10 features spanning demand, supply, and network (e.g., historical prep time by merchant-hour, driver density within 3 km in the last 10 minutes, rain indicator, queue depth at store, distance via road graph, time-of-day, promo active, cuisine, orders-in-batch).
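
To make the framing concrete, here is a minimal sketch of the target and of one point-in-time feature. The field names (store_id, picked_up_at, prep_minutes), the 28-day lookback, and the exclusion rules are illustrative assumptions, not part of the dataset specification.

```python
from datetime import datetime, timedelta
from statistics import mean

def eta_minutes(created_at, delivered_at):
    """Target: minutes from created_at to delivered_at, or None if unusable."""
    if delivered_at is None:  # cancelled or still in flight: exclude (assumed rule)
        return None
    minutes = (delivered_at - created_at).total_seconds() / 60.0
    return minutes if minutes > 0 else None  # drop rows with clock errors

def hist_prep_minutes(history, store_id, hour, t0, lookback_days=28):
    """Mean prep time for a store-hour, using only orders picked up before t0."""
    window_start = t0 - timedelta(days=lookback_days)
    values = [
        h["prep_minutes"]
        for h in history
        if h["store_id"] == store_id
        and h["created_at"].hour == hour
        and window_start <= h["picked_up_at"] < t0  # outcome already known at t0
    ]
    return mean(values) if values else None

# Hypothetical history rows for one store.
history = [
    {"store_id": "s1", "created_at": datetime(2025, 8, 1, 18, 5),
     "picked_up_at": datetime(2025, 8, 1, 18, 25), "prep_minutes": 14.0},
    {"store_id": "s1", "created_at": datetime(2025, 8, 2, 18, 10),
     "picked_up_at": datetime(2025, 8, 2, 18, 40), "prep_minutes": 22.0},
    # Picked up after t0 below, so it must not contribute to the feature.
    {"store_id": "s1", "created_at": datetime(2025, 8, 3, 18, 0),
     "picked_up_at": datetime(2025, 8, 3, 18, 30), "prep_minutes": 40.0},
]

t0 = datetime(2025, 8, 3, 18, 15)
feature = hist_prep_minutes(history, "s1", 18, t0)
target = eta_minutes(datetime(2025, 8, 3, 18, 15), datetime(2025, 8, 3, 18, 57))
print(feature, target)
```

The filter on picked_up_at rather than created_at is the important design choice: a historical prep time only becomes usable once its outcome has been observed.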
B) Leakage and splitting
- Identify likely leakage sources (e.g., features derived after pickup or after t0) and how to prevent them.
- Propose a time-based cross-validation scheme (e.g., rolling-origin) with an example split: train=[Aug 1–24, 2025], valid=[Aug 25–31], test=[Sep 1–7].
- Justify any domain adaptation if the model is trained on data from other cities.
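
One possible way to generate rolling-origin folds is sketched below. It assumes an expanding training window and assigns orders by creation date; the function names and the 7-day step are hypothetical choices. The first fold reproduces the example split given above.

```python
from datetime import date, timedelta

def rolling_origin_folds(first_day, train_days, valid_days, test_days,
                         step_days, n_folds):
    """Build expanding-window folds as inclusive (start, end) date pairs."""
    folds = []
    for k in range(n_folds):
        train_end = first_day + timedelta(days=train_days + k * step_days - 1)
        valid_start = train_end + timedelta(days=1)
        valid_end = valid_start + timedelta(days=valid_days - 1)
        test_start = valid_end + timedelta(days=1)
        test_end = test_start + timedelta(days=test_days - 1)
        folds.append({
            "train": (first_day, train_end),
            "valid": (valid_start, valid_end),
            "test": (test_start, test_end),
        })
    return folds

def assign(created_day, fold):
    """Return the split an order falls into, based on its creation date."""
    for name, (start, end) in fold.items():
        if start <= created_day <= end:
            return name
    return None  # outside every window of this fold

folds = rolling_origin_folds(date(2025, 8, 1), train_days=24, valid_days=7,
                             test_days=7, step_days=7, n_folds=2)
for fold in folds:
    print(fold)
```

Any aggregate feature (for example, a merchant-level historical mean) would also need to be recomputed per fold from data before the training cutoff, otherwise the split is time-based in name only.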
C) Modeling
- Compare gradient-boosted trees (e.g., XGBoost/LightGBM) for point prediction vs gradient-boosted quantile models for P50/P90.
- Justify loss choices (MAE, Huber, pinball). List key feature interactions and regularization to tune.
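
To ground the loss discussion, the sketch below implements the three losses in plain Python and shows, on a toy sample, that the constant minimizing pinball loss at level q lands on the empirical q-quantile. The data, the Huber delta of 5 minutes, and the search grid are made up for illustration; a boosted model would minimize the same objectives per leaf rather than globally.

```python
def mae(y, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, pred)) / len(y)

def huber(y, pred, delta=5.0):
    """Quadratic for residuals up to delta, linear beyond it."""
    total = 0.0
    for a, p in zip(y, pred):
        r = abs(a - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y)

def pinball(y, pred, q):
    """Quantile loss: under-prediction costs q, over-prediction costs 1 - q."""
    total = 0.0
    for a, p in zip(y, pred):
        diff = a - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y)

def best_constant(y, loss, candidates):
    """Constant prediction with the lowest loss among the candidates."""
    return min(candidates, key=lambda c: loss(y, [c] * len(y)))

# Toy delivery times in minutes, with one long-tail order.
etas = [18, 20, 22, 24, 25, 27, 30, 33, 41, 52, 75]
grid = [x / 2 for x in range(20, 181)]  # 10.0 to 90.0 in half-minute steps

p50 = best_constant(etas, lambda y, p: pinball(y, p, 0.5), grid)
p90 = best_constant(etas, lambda y, p: pinball(y, p, 0.9), grid)
print(p50, p90)
```

The gap between the P50 and P90 constants on a right-skewed sample like this one is the practical argument for fitting separate quantile models instead of scaling a single point estimate.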
D) Evaluation
- Report MAE, median absolute error, P90 absolute error, coverage of 80% prediction intervals, and calibration plots.
- Describe how to compute calibration error and reliability curves.
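
One common way to compute these quantities for quantile predictions is sketched below: for each nominal level q, measure the fraction of actuals at or below the predicted q-quantile, then average the absolute gaps. The sample values and the choice of an unweighted mean are assumptions for illustration.

```python
def interval_coverage(y, lower, upper):
    """Share of actuals that fall inside [lower, upper]."""
    inside = sum(1 for a, lo, hi in zip(y, lower, upper) if lo <= a <= hi)
    return inside / len(y)

def reliability_curve(y, quantile_preds):
    """For each nominal level q, the observed share of actuals <= prediction.

    quantile_preds maps q to a list of predicted q-quantiles, one per order.
    """
    curve = []
    for q in sorted(quantile_preds):
        preds = quantile_preds[q]
        observed = sum(1 for a, p in zip(y, preds) if a <= p) / len(y)
        curve.append((q, observed))
    return curve

def quantile_calibration_error(curve):
    """Mean absolute gap between nominal and observed levels."""
    return sum(abs(obs - q) for q, obs in curve) / len(curve)

# Made-up actual ETAs (minutes) and predicted quantiles for ten orders.
y = [22, 35, 28, 41, 30, 26, 55, 33, 29, 38]
preds = {
    0.1: [23, 25, 24, 30, 27, 22, 35, 28, 25, 30],
    0.5: [25, 32, 29, 38, 31, 27, 45, 34, 30, 36],
    0.9: [33, 42, 37, 50, 40, 35, 52, 44, 38, 47],
}

coverage_80 = interval_coverage(y, preds[0.1], preds[0.9])
curve = reliability_curve(y, preds)
cal_error = quantile_calibration_error(curve)
print(coverage_80, curve, cal_error)
```

Plotting the (nominal, observed) pairs against the diagonal gives the reliability curve; points above the diagonal mean the predicted quantiles are too high.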
E) Decisioning
- Explain how ETA error impacts dispatch decisions (late-delivery penalties vs courier idle cost).
- Propose a cost-sensitive objective or post-hoc thresholding that minimizes expected cost under asymmetric penalties.
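
As a sketch of the post-hoc approach, assume a linear cost of c_late per minute when the actual time exceeds the quoted ETA and c_early per minute when it falls short (a stand-in for idle cost). Under that assumed cost model, expected cost is minimized by quoting the c_late / (c_late + c_early) quantile of the predictive distribution. The costs and samples below are hypothetical.

```python
import math

def expected_cost(quote, samples, c_late, c_early):
    """Average linear cost of a quoted ETA over sampled actual times."""
    total = 0.0
    for a in samples:
        total += c_late * (a - quote) if a > quote else c_early * (quote - a)
    return total / len(samples)

def critical_quantile(c_late, c_early):
    """Quantile level that minimizes expected cost under linear penalties."""
    return c_late / (c_late + c_early)

def empirical_quantile(samples, tau):
    """Smallest sample value whose empirical CDF is at least tau."""
    ordered = sorted(samples)
    k = max(1, math.ceil(tau * len(ordered)))
    return ordered[k - 1]

# Hypothetical predictive samples (minutes) for one order.
samples = [18, 20, 22, 24, 25, 27, 30, 33, 41, 52, 75]

tau = critical_quantile(c_late=3.0, c_early=1.0)
quote = empirical_quantile(samples, tau)

# Brute-force check over whole-minute quotes.
brute = min(range(10, 91), key=lambda q: expected_cost(q, samples, 3.0, 1.0))
print(tau, quote, brute)
```

The same ratio can be used as the quantile level of a pinball-loss model, which turns the post-hoc rule into a cost-sensitive training objective.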
F) Explainability and fairness
- Use SHAP or permutation importance to audit features.
- Outline checks for bias across neighborhoods or vehicle types and how to mitigate (e.g., monotonic constraints, group calibration).
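
The two audits can be prototyped without any modeling library. The sketch below uses a hypothetical stand-in model (toy_predict) and made-up rows; with a real model, predict would wrap its prediction call. Permutation importance is measured as the mean increase in MAE after shuffling one feature, and the group report shows per-group MAE and signed bias.

```python
import random

def mae(y, pred):
    return sum(abs(a - p) for a, p in zip(y, pred)) / len(y)

def permutation_importance(predict, rows, y, feature, n_repeats=20, seed=0):
    """Mean increase in MAE after shuffling one feature column."""
    rng = random.Random(seed)
    base = mae(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        increases.append(mae(y, [predict(r) for r in shuffled]) - base)
    return sum(increases) / n_repeats

def group_report(y, pred, groups):
    """Per-group count, MAE, and signed bias (prediction minus actual)."""
    report = {}
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        ys = [y[i] for i in idx]
        ps = [pred[i] for i in idx]
        report[g] = {
            "n": len(idx),
            "mae": mae(ys, ps),
            "bias": sum(p - a for a, p in zip(ys, ps)) / len(idx),
        }
    return report

# Hypothetical stand-in model: uses distance, ignores the promo flag.
def toy_predict(row):
    return 10.0 + 4.0 * row["distance_km"]

rows = [{"distance_km": d, "promo": d % 2} for d in range(1, 9)]
vehicles = ["bike", "car"] * 4
# Made-up actuals: bike deliveries run 3 minutes longer than predicted.
y = [toy_predict(r) + (3.0 if v == "bike" else 0.0)
     for r, v in zip(rows, vehicles)]
pred = [toy_predict(r) for r in rows]

imp_distance = permutation_importance(toy_predict, rows, y, "distance_km")
imp_promo = permutation_importance(toy_predict, rows, y, "promo")
report = group_report(y, pred, vehicles)
print(imp_distance, imp_promo, report)
```

A systematic negative bias for one group, as for bikes here, is the kind of signal that would motivate group-level calibration.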
G) Production
- Outline a feature store, streaming inference latency budget, model retraining cadence, drift detection (PSI/KS on key features), and an online A/B plan to validate offline gains while monitoring guardrails.
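
For the drift-detection piece, a minimal stdlib sketch of PSI and the two-sample KS statistic follows. The bin count, the floor eps for empty bins, and the synthetic distance feature are assumptions; PSI values depend noticeably on both the binning and eps, so any alert threshold should be tuned on historical data rather than taken from this example.

```python
import math
from bisect import bisect_right

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index with bin edges at reference quantiles."""
    ref = sorted(reference)
    edges = [ref[int(len(ref) * k / n_bins)] for k in range(1, n_bins)]

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the log ratio stays finite.
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(reference), shares(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Synthetic delivery distances (km): training window vs. two serving windows.
train_dist = [0.5 + 0.05 * i for i in range(200)]
same = list(train_dist)
shifted = [d + 2.0 for d in train_dist]

print(psi(train_dist, same), ks_statistic(train_dist, same))
print(psi(train_dist, shifted), ks_statistic(train_dist, shifted))
```

Fixing the bin edges on the training window and reusing them for every serving window keeps PSI values comparable over time.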