Design a ride-hailing ETA system
Company: Uber
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
During a first-round Data Scientist interview, you are asked a product and machine learning case question:
Design an estimated time of arrival (ETA) prediction system for a ride-hailing product.
Your discussion should cover:
- what the product should predict: driver-to-pickup ETA, trip-to-destination ETA, or both
- how to define the prediction target and timestamp conventions
- what data and features you would use, such as route information, live traffic, driver and rider location, time of day, weather, and supply-demand conditions
- how to construct training labels, including how to handle cancellations, missing labels, and delayed ground truth
- model choices and serving constraints, especially low-latency online prediction
- offline evaluation metrics and their tradeoffs, such as MAE, RMSE, P50 or P90 absolute error, and calibration
- online experimentation and business guardrails
- failure modes such as concept drift, cold start regions, sparse areas, extreme traffic events, and fairness across locations or user segments
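For the label-construction bullet, here is a minimal sketch of one simple labeling policy. It assumes a hypothetical trip-event record (`TripEvent`, `build_labels`, and the 4-hour `max_wait_s` cutoff are illustrative names and values, not from any real Uber system). The pickup label is measured from driver assignment to pickup, and the trip label from pickup to dropoff. Cancelled legs are dropped rather than imputed. Trips still in progress are held back as "pending" so that delayed ground truth is picked up by a later labeling run instead of being mislabeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TripEvent:
    # Hypothetical schema; timestamps are seconds since epoch.
    trip_id: str
    request_ts: float                  # when the rider requested (and first saw an ETA)
    driver_assigned_ts: Optional[float]
    pickup_ts: Optional[float]         # None if pickup never happened (yet)
    dropoff_ts: Optional[float]        # None if dropoff never happened (yet)
    cancelled: bool


def build_labels(events, now_ts, max_wait_s=4 * 3600):
    """Return (pickup_labels, trip_labels, pending_ids).

    Assumed conventions (illustrative only):
      pickup label = pickup_ts - driver_assigned_ts
      trip label   = dropoff_ts - pickup_ts
    """
    pickup, trip, pending = {}, {}, set()
    for e in events:
        # Pickup leg: label only when both endpoints were observed.
        if e.driver_assigned_ts is not None and e.pickup_ts is not None:
            pickup[e.trip_id] = e.pickup_ts - e.driver_assigned_ts
        # Trip leg: label only when both pickup and dropoff were observed.
        if e.pickup_ts is not None and e.dropoff_ts is not None:
            trip[e.trip_id] = e.dropoff_ts - e.pickup_ts
        elif not e.cancelled and now_ts - e.request_ts < max_wait_s:
            # Ground truth not available yet: retry in a later labeling run.
            pending.add(e.trip_id)
        # Otherwise (cancelled, or stale without dropoff): no trip label.
    return pickup, trip, pending
```

Dropping cancelled legs keeps the labels clean but can bias training toward trips that completed; a survival-style model that treats cancellations as censored observations is one alternative worth discussing.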
State any assumptions clearly and explain what you would prioritize first.
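For the offline-metrics bullet, a minimal sketch of how the listed metrics might be computed side by side. The function names (`percentile`, `eta_metrics`) and the optional P90-quantile input are illustrative assumptions. "Calibration" is approximated here in two ways: signed bias and late rate for point predictions, and empirical coverage for a model that also emits a P90 quantile.

```python
import math


def percentile(values, q):
    """Linear-interpolated percentile (same convention as NumPy's default)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def eta_metrics(actual, predicted, predicted_p90=None):
    """Offline ETA metrics in seconds; assumes non-empty, equal-length inputs."""
    errors = [p - a for a, p in zip(actual, predicted)]  # > 0: ETA overestimated
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    m = {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),  # penalizes big misses
        "p50_abs_err": percentile(abs_errors, 50),
        "p90_abs_err": percentile(abs_errors, 90),           # tail experience
        "bias": sum(errors) / n,
        # Share of trips that arrived later than promised (riders notice these most).
        "late_rate": sum(a > p for a, p in zip(actual, predicted)) / n,
    }
    if predicted_p90 is not None:
        # A well-calibrated P90 quantile should cover about 90% of actuals.
        m["p90_coverage"] = sum(a <= q for a, q in zip(actual, predicted_p90)) / n
    return m


metrics = eta_metrics(
    actual=[300, 600, 900, 1200],
    predicted=[360, 540, 900, 1500],
    predicted_p90=[400, 700, 1000, 1100],
)
```

In this toy example a single 300-second miss pushes RMSE (about 156 s) well above MAE (105 s) and drives P90 absolute error to 228 s, which is why tail metrics and MAE are usually reported together rather than optimizing one number.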
Quick Answer: This question evaluates competency in applied machine learning and data science, including ETA system and product design, feature and label engineering, model selection and evaluation, low-latency serving, and operational challenges such as delayed ground truth, concept drift, cold starts, and fairness.