System Design: Real‑Time Pickup and Drop‑off ETA Prediction
Context: You’re designing an end‑to‑end system that predicts pickup and drop‑off ETAs at trip‑request time for a large ride‑hailing marketplace. At request time, no driver has been assigned yet. Predictions must be served within tight latency budgets and remain reliable during traffic spikes, incidents, and rare events.
Specify the following:
Data Sources and Features:
- Road graph and segment metadata
- GPS traces and map‑matched trips
- Historical segment speeds and travel times (time‑of‑day/weekday seasonality)
- Live traffic incidents and real‑time speeds
- Weather and major events
- Supply‑demand density and marketplace state
- Driver behavioral features (e.g., acceptance, cruising patterns)
- Device/network latency and dispatch latency
- Road closures, construction, and routing constraints
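The historical seasonality feature listed above can be illustrated concretely. The sketch below groups map‑matched segment traversals by weekday and hour and keeps the median travel time for each bucket. It is a minimal sketch under stated assumptions: the record layout, the function names `build_travel_time_profile` and `seasonal_travel_time`, and the free‑flow fallback are all hypothetical. A production pipeline would also smooth sparse buckets and blend in live speeds.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical traversal record from map-matched trips:
# (segment_id, traversal_start, travel_time_seconds).
Traversal = tuple[str, datetime, float]
BucketKey = tuple[str, int, int]  # (segment_id, weekday 0=Mon, hour 0-23)


def build_travel_time_profile(traversals: list[Traversal]) -> dict[BucketKey, float]:
    """Median historical travel time per (segment, weekday, hour-of-day) bucket."""
    buckets: dict[BucketKey, list[float]] = defaultdict(list)
    for segment_id, start, travel_time_s in traversals:
        buckets[(segment_id, start.weekday(), start.hour)].append(travel_time_s)
    return {key: median(times) for key, times in buckets.items()}


def seasonal_travel_time(
    profile: dict[BucketKey, float], segment_id: str, when: datetime, free_flow_s: float
) -> float:
    """Seasonal estimate for a segment, falling back to free-flow time for empty buckets."""
    return profile.get((segment_id, when.weekday(), when.hour), free_flow_s)


if __name__ == "__main__":
    history = [
        ("seg-1", datetime(2024, 5, 6, 8, 10), 60.0),  # Monday 08:10
        ("seg-1", datetime(2024, 5, 6, 8, 40), 80.0),  # Monday 08:40
        ("seg-1", datetime(2024, 5, 6, 8, 50), 70.0),  # Monday 08:50
    ]
    profile = build_travel_time_profile(history)
    print(seasonal_travel_time(profile, "seg-1", datetime(2024, 5, 13, 8, 5), 45.0))  # 70.0
    print(seasonal_travel_time(profile, "seg-1", datetime(2024, 5, 13, 9, 5), 45.0))  # 45.0
```

Using the median rather than the mean keeps a few anomalous traversals, such as a driver stopping mid‑segment, from skewing the bucket.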
Modeling Approaches:
- Baseline: map‑matching + dynamic shortest‑path
- Gradient‑boosted trees
- Spatiotemporal deep model (sequence/graph)
- How you represent routes for each model
- How you capture heteroskedasticity (variance changing with context)
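The baseline's dynamic shortest‑path step can be illustrated with a time‑dependent Dijkstra search. In this search, each edge's travel time is evaluated at the moment the route reaches that edge. This is a sketch, not a production router. The graph encoding and the `time_dependent_eta` name are assumptions. The label‑setting search is exact only when every edge is FIFO, meaning that leaving later never results in arriving earlier.

```python
import heapq
from typing import Callable

# Travel-time function for one edge: departure time (s) -> traversal time (s).
# In production this would read live and historical segment speeds; here it is
# an arbitrary callable supplied by the caller (an assumption of this sketch).
CostFn = Callable[[float], float]
Graph = dict[str, list[tuple[str, CostFn]]]


def time_dependent_eta(graph: Graph, source: str, target: str, depart_s: float) -> float:
    """ETA in seconds via time-dependent Dijkstra; inf if target is unreachable.

    Exact when every edge is FIFO (departing later never arrives earlier).
    """
    best = {source: depart_s}  # earliest known arrival time per node
    heap = [(depart_s, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if node == target:
            return t - depart_s
        if t > best[node]:
            continue  # stale heap entry superseded by a faster arrival
        for nxt, cost in graph.get(node, []):
            arrival = t + cost(t)  # edge cost evaluated at the time we reach it
            if arrival < best.get(nxt, float("inf")):
                best[nxt] = arrival
                heapq.heappush(heap, (arrival, nxt))
    return float("inf")


if __name__ == "__main__":
    # B->C is congested early and eases over time; the max(...) form keeps it FIFO.
    graph: Graph = {
        "A": [("B", lambda t: 100.0), ("C", lambda t: 360.0)],
        "B": [("C", lambda t: max(120.0, 300.0 - 0.5 * t))],
    }
    print(time_dependent_eta(graph, "A", "C", depart_s=0.0))    # 350.0
    print(time_dependent_eta(graph, "A", "C", depart_s=600.0))  # 220.0
```

In the example, the A→B→C route becomes much faster once congestion on B→C clears. A search using static edge weights would miss this change.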
Uncertainty and Calibration:
- Produce 50% and 90% prediction intervals
- Methods: quantile regression, conformal prediction
- Calibration strategy
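One common way to combine the two listed methods is conformalized quantile regression (CQR). First, train lower and upper quantile models with the pinball loss. Then use a held‑out calibration set to compute a margin that corrects their coverage. A minimal sketch follows, with hypothetical function names. The finite‑sample coverage guarantee assumes that calibration data and serving data are exchangeable. Drifting traffic violates that assumption, so recalibrating on a recent window is a common mitigation.

```python
import math


def pinball_loss(y: float, q_pred: float, tau: float) -> float:
    """Quantile (pinball) loss; its expectation is minimized by the true tau-quantile."""
    diff = y - q_pred
    return max(tau * diff, (tau - 1.0) * diff)


def cqr_margin(lo: list[float], hi: list[float], y: list[float], alpha: float) -> float:
    """Split-conformal margin for conformalized quantile regression (CQR).

    lo/hi are a model's alpha/2 and 1 - alpha/2 quantile predictions on a held-out
    calibration set; y are the realized ETAs. The margin m widens (or, if negative,
    shrinks) every served interval to [lo - m, hi + m].
    """
    # Conformity score: how far the outcome falls outside the predicted interval.
    scores = sorted(max(lq - t, t - hq) for lq, hq, t in zip(lo, hi, y))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if k > n:
        return math.inf  # calibration set too small for this alpha
    return scores[k - 1]
```

For the 90% interval, `lo` and `hi` would come from 5th‑ and 95th‑percentile models with `alpha=0.1`. For the 50% interval, they would come from 25th‑ and 75th‑percentile models with `alpha=0.5`. The quantile models condition on context, so interval width can vary with it. This is one way to capture heteroskedasticity.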
Training and Serving Architecture:
- Feature store (offline/online parity)
- Streaming updates for traffic and incidents
- Drift detection and monitoring
- Online refresh cadence and deployment (canary, blue/green)
- Latency and availability SLAs
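For drift detection, one widely used statistic is the population stability index (PSI). It compares the binned distribution of a feature, or of prediction residuals, between a reference window and live traffic. The sketch below is illustrative only. The bin edges, the `eps` floor for empty bins, and the function name are assumptions.

```python
import bisect
import math


def population_stability_index(
    reference: list[float], live: list[float], edges: list[float], eps: float = 1e-6
) -> float:
    """PSI between a reference sample (e.g. training window) and a live sample.

    `edges` are sorted interior bin boundaries, giving len(edges) + 1 bins.
    Empty bins are floored at `eps` so the log term stays finite.
    """

    def bin_shares(xs: list[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for x in xs:
            counts[bisect.bisect_right(edges, x)] += 1
        return [max(c / len(xs), eps) for c in counts]

    p, q = bin_shares(reference), bin_shares(live)
    # Symmetric KL-style sum: each bin contributes (q - p) * ln(q / p) >= 0.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a major shift. Thresholds still need tuning per feature, especially for speed features that legitimately swing during incidents.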
Leakage and Censoring Controls:
- Exclude post‑dispatch signals
- Handle reroutes and cancellations
- Strategies for cold‑start regions and rare events
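Excluding post‑dispatch signals means building every training example from a point‑in‑time snapshot. The snapshot contains only values that were available before the request timestamp, and it omits any feature that exists only after a driver is assigned. The sketch assumes a hypothetical `FeatureEvent` record and prefix‑based naming for post‑dispatch features. In practice, `event_time` should be the time a value became available to the serving system, not the time the underlying event happened. Otherwise, late‑arriving data can still leak into training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureEvent:
    name: str          # e.g. "segment_speed:123" (hypothetical naming scheme)
    value: float
    event_time: float  # when the value became available, epoch seconds


# Hypothetical prefixes for signals that only exist after a driver is assigned.
POST_DISPATCH_PREFIXES = ("assigned_driver_", "dispatch_")


def features_as_of(events: list[FeatureEvent], request_time: float) -> dict[str, float]:
    """Point-in-time snapshot: latest value of each feature available strictly
    before request_time, with post-dispatch feature families dropped."""
    latest: dict[str, FeatureEvent] = {}
    for ev in events:
        if ev.event_time >= request_time:
            continue  # not yet observable at request time, so it would leak
        if ev.name.startswith(POST_DISPATCH_PREFIXES):
            continue  # no driver is assigned at request time
        prev = latest.get(ev.name)
        if prev is None or ev.event_time > prev.event_time:
            latest[ev.name] = ev
    return {name: ev.value for name, ev in latest.items()}
```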
Offline Evaluation:
- Metrics: MAE/MAPE, P50/P90 error, coverage of 90% intervals, CRPS
- Stratified slices by city, time‑of‑day, weather, and surge
- Proper temporal/geographical holdout to avoid leakage
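The offline metrics and holdout above are simple to compute, and the sketch below gives one possible set of definitions. Several assumptions apply. P50/P90 error is read here as the 50th and 90th percentiles of absolute error. CRPS is approximated from a grid of predicted quantiles using the midpoint rule, not computed from a full predictive distribution. The row fields used by the hypothetical `temporal_geo_split` are illustrative.

```python
import math
from statistics import mean


def mae(y: list[float], yhat: list[float]) -> float:
    return mean(abs(a - b) for a, b in zip(y, yhat))


def mape(y: list[float], yhat: list[float]) -> float:
    # Skip zero-duration labels, where percentage error is undefined.
    return mean(abs(a - b) / a for a, b in zip(y, yhat) if a > 0)


def abs_error_percentile(y: list[float], yhat: list[float], pct: float) -> float:
    """Nearest-rank percentile of absolute error (pct=50 -> P50, pct=90 -> P90)."""
    errs = sorted(abs(a - b) for a, b in zip(y, yhat))
    rank = max(1, math.ceil(pct / 100 * len(errs)))
    return errs[rank - 1]


def interval_coverage(y: list[float], lo: list[float], hi: list[float]) -> float:
    """Fraction of outcomes inside [lo, hi]; should be near 0.90 for 90% intervals."""
    return mean(1.0 if lq <= t <= hq else 0.0 for t, lq, hq in zip(y, lo, hi))


def crps_from_quantiles(y: float, taus: list[float], qs: list[float]) -> float:
    """CRPS ~= 2 * mean pinball loss over evenly spaced quantile levels
    (midpoint-rule approximation of CRPS = 2 * integral of pinball loss over tau)."""
    losses = [max(t * (y - q), (t - 1.0) * (y - q)) for t, q in zip(taus, qs)]
    return 2.0 * mean(losses)


def temporal_geo_split(rows: list[dict], cutoff: float, holdout_regions: set[str]):
    """Train on trips completed before `cutoff` outside held-out regions; test on
    trips requested at or after `cutoff` (held-out regions form a cold-start slice)."""
    train = [r for r in rows
             if r["dropoff_time"] < cutoff and r["region"] not in holdout_regions]
    test = [r for r in rows if r["request_time"] >= cutoff]
    return train, test
```

The split requires training trips to be completed before the cutoff, not merely requested before it. This keeps labels observed after the cutoff out of training. Trips that straddle the cutoff are dropped from both sides.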
Online Evaluation and Rollout:
- A/B test with guardrails: cancellation rate, pickup wait, driver idle time, reliability of intervals
- Rollout criteria and monitoring plan
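For a binary guardrail such as cancellation rate, a one‑sided two‑proportion z‑test is one simple way to encode "treatment must not be significantly worse." The sketch below is illustrative only. The function names, the 1.645 critical value (about a 5% one‑sided level), and the ±2‑point coverage tolerance are assumptions. Continuous guardrails such as pickup wait or driver idle time would need a different test.

```python
import math


def two_proportion_z(x_ctrl: int, n_ctrl: int, x_trt: int, n_trt: int) -> float:
    """z statistic for (treatment rate - control rate) using the pooled standard error."""
    p_ctrl, p_trt = x_ctrl / n_ctrl, x_trt / n_trt
    pooled = (x_ctrl + x_trt) / (n_ctrl + n_trt)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_ctrl + 1.0 / n_trt))
    if se == 0.0:
        return 0.0  # both arms all-0 or all-1: no detectable difference
    return (p_trt - p_ctrl) / se


def guardrail_passes(x_ctrl: int, n_ctrl: int, x_trt: int, n_trt: int,
                     z_crit: float = 1.645) -> bool:
    """One-sided check: fail if treatment's rate of a bad outcome (e.g. cancellations)
    is significantly higher than control's."""
    return two_proportion_z(x_ctrl, n_ctrl, x_trt, n_trt) < z_crit


def coverage_within_tolerance(hits: int, n: int, target: float = 0.90,
                              tol: float = 0.02) -> bool:
    """Interval-reliability guardrail: observed coverage of the 90% interval stays near target."""
    return abs(hits / n - target) <= tol
```

Checking several guardrails at once increases the overall false‑alarm rate. The critical values would therefore need a multiple‑comparison adjustment in a real rollout plan.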
Conclude with trade‑offs and select an MVP model for initial deployment.