How would you design an ETA prediction system?
Company: Uber
Role: Data Scientist
Category: Machine Learning
Difficulty: easy
Interview Round: Technical Screen
Design an end-to-end **ETA (Estimated Time of Arrival)** system for a maps / ride-hailing / delivery product.
Assume users request an ETA for a trip from an origin to a destination (possibly with waypoints). The system must return an ETA in real time.
Cover the following:
1. **Product definition & requirements**
- Who are the users (rider/driver/courier/customer)?
- Latency/throughput targets and how frequently the ETA should update.
- What does a “good” ETA mean (accuracy vs stability vs calibration)?
2. **Data and labeling**
- Which raw data sources would you use (GPS pings, road graph, traffic, weather, historical trips, incidents, etc.)?
- How to define the training label (actual travel time) and handle censoring (canceled trips, detours, pauses).
3. **Modeling approach**
- Baselines and incremental modeling (rules → regression/GBDT → sequence models).
- Feature design (time-of-day, road segments, traffic states, driver behavior, route choice).
- How to represent a route (segment-level vs whole-trip).
4. **Evaluation**
- Offline metrics (e.g., MAE/MAPE, quantiles, calibration, tail errors).
- Online metrics and guardrails (user trust, cancellation rate, conversion).
- Slice analysis (rush hour, city centers, long trips, sparse areas).
5. **Serving & system design**
- Real-time feature computation, caching, and fallbacks.
- Model updates, monitoring, drift detection, and alerting.
6. **Key pitfalls**
- Data leakage, feedback loops (ETA affects route choice), selection bias (only completed trips), and non-stationarity.
Provide a concrete proposal and justify tradeoffs.
Quick Answer: This question evaluates a data scientist's ability to design an end-to-end ETA prediction system. It tests machine learning modeling, feature engineering, data labeling, evaluation metrics, and production concerns such as real-time serving and monitoring (Machine Learning domain).
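As an illustration of the offline metrics named under Evaluation (MAE/MAPE, quantiles, calibration), here is a minimal sketch in plain Python. The function names, the toy trip durations, and the choice of a 90th-percentile prediction are assumptions made for this example only; they are not part of the original question or of any Uber system.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error, in the same unit as the inputs (seconds here)."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error; assumes every actual duration is > 0."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def coverage(actual, predicted_quantile):
    """Share of trips whose actual duration fell at or below the predicted
    quantile. For a calibrated P90 prediction this should be close to 0.9
    on a large sample."""
    return mean(1.0 if a <= q else 0.0
                for a, q in zip(actual, predicted_quantile))


def pinball_loss(actual, predicted_quantile, q):
    """Quantile (pinball) loss for a prediction of the q-th quantile."""
    return mean(max(q * (a - p), (q - 1) * (a - p))
                for a, p in zip(actual, predicted_quantile))


# Hypothetical toy data: trip durations in seconds.
actual_s = [300, 600, 900, 1200]
point_eta_s = [330, 540, 900, 1500]   # point (e.g. median) predictions
p90_eta_s = [400, 650, 880, 1600]     # 90th-percentile predictions

if __name__ == "__main__":
    print("MAE (s):", mae(actual_s, point_eta_s))
    print("MAPE:", mape(actual_s, point_eta_s))
    print("P90 coverage:", coverage(actual_s, p90_eta_s))
    print("P90 pinball loss (s):", pinball_loss(actual_s, p90_eta_s, 0.9))
```

In practice these would be computed per slice (rush hour, long trips, sparse areas) rather than only in aggregate, since a good overall MAE can hide large tail errors in specific segments.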