Design ETA prediction for Uber rides
Company: Uber
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Design an end-to-end system to predict pickup and drop-off ETAs at request time. Specify:
(1) Data sources and features: road graph and segment metadata, GPS traces, historical segment speeds, live traffic incidents/speeds, weather, events, supply-demand density, driver behavioral features, device/network latency, and road closures.
(2) Modeling approaches: compare a map-matching + dynamic shortest-path baseline against gradient-boosted trees and a spatiotemporal deep model (sequence/graph), and explain how you'll represent routes and capture heteroskedasticity.
(3) Uncertainty outputs and calibration (e.g., quantile regression, conformal prediction) with 50% and 90% intervals.
(4) Training/serving architecture: feature store, streaming updates, drift detection, online refresh cadence, canary models, and latency/availability SLAs.
(5) Leakage/censoring controls (e.g., excluding post-dispatch signals, handling reroutes and cancellations) and strategies for cold-start regions and rare events.
(6) Offline evaluation metrics (MAE/MAPE, P50/P90 error, coverage of 90% intervals, CRPS) with stratified slices by city, time of day, weather, and surge.
(7) Online evaluation via an A/B test with guardrails (cancellation rate, pickup wait, driver idle time, reliability of intervals) and rollout criteria.
Conclude with trade-offs and select an MVP model for initial deployment.
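To make the offline metrics named in item (6) concrete, here is a minimal sketch in plain Python. The function names and the toy trip durations (in seconds) are illustrative assumptions, not part of the question; a real evaluation would compute these per slice (city, time of day, weather, surge) over held-out trips.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error, in the same unit as the inputs (here seconds)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; assumes actual > 0."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pinball_loss(actual: Sequence[float], quantile_pred: Sequence[float], q: float) -> float:
    """Average pinball (quantile) loss for predictions of the q-th quantile."""
    total = 0.0
    for a, p in zip(actual, quantile_pred):
        diff = a - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)


def interval_coverage(actual: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Fraction of actual values that fall inside [lower, upper]."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)


# Toy example (hypothetical numbers): four trips, durations in seconds.
actual_eta = [300.0, 600.0, 900.0, 1200.0]
point_pred = [330.0, 540.0, 900.0, 1320.0]
lower_pred = [240.0, 500.0, 800.0, 1000.0]
upper_pred = [360.0, 590.0, 1000.0, 1400.0]

print("MAE (s):", mae(actual_eta, point_pred))
print("MAPE:", mape(actual_eta, point_pred))
print("Pinball loss, q=0.9:", pinball_loss(actual_eta, upper_pred, 0.9))
print("Interval coverage:", interval_coverage(actual_eta, lower_pred, upper_pred))
```

On a well-calibrated model, the empirical coverage of a nominal 90% interval should sit close to 0.90 in every slice, not just in aggregate; a slice with noticeably lower coverage is a candidate for recalibration.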
Quick Answer: This question evaluates competency in end-to-end machine learning system design for real-time spatiotemporal ETA prediction, encompassing feature engineering from heterogeneous data sources, uncertainty quantification, model selection, and production serving and monitoring.
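As one possible illustration of the calibration step the question mentions, the sketch below applies split conformal prediction to absolute residuals from a held-out calibration set. The function names and the residual values are hypothetical. Note that this simple variant yields intervals of constant width, so it does not by itself capture heteroskedasticity; a candidate would likely pair it with quantile regression or normalize residuals by a predicted scale.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(abs_residuals: Sequence[float], alpha: float) -> float:
    """Return the split-conformal half-width for a (1 - alpha) interval.

    abs_residuals are |actual - predicted| on a calibration set that the
    model was not trained on. Uses the finite-sample rank ceil((n + 1)(1 - alpha)).
    """
    n = len(abs_residuals)
    # Round before ceil to guard against floating-point noise such as 18.000000001.
    rank = math.ceil(round((n + 1) * (1 - alpha), 9))
    if rank > n:
        # Too few calibration points to guarantee this coverage level.
        return math.inf
    return sorted(abs_residuals)[rank - 1]


def conformal_interval(point_eta: float, half_width: float) -> Tuple[float, float]:
    """Symmetric interval around the point ETA, clipped at zero seconds."""
    return max(0.0, point_eta - half_width), point_eta + half_width


# Hypothetical calibration residuals in seconds: 10, 20, ..., 190 (19 trips).
calibration_residuals = [10.0 * i for i in range(1, 20)]

half_width_90 = conformal_quantile(calibration_residuals, alpha=0.1)
print("90% half-width (s):", half_width_90)
print("Interval for a 600 s ETA:", conformal_interval(600.0, half_width_90))
```

Under the usual exchangeability assumption between calibration and serving data, this construction targets at least 90% marginal coverage; drift between the two (for example a new city or an unusual weather event) can break that guarantee, which is one reason the question also asks for drift detection and sliced coverage monitoring.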