This question evaluates a data scientist's competency in end-to-end machine learning system design for spatiotemporal prediction, covering temporal and geospatial feature engineering, model family selection, metric tradeoffs and calibration, selection bias and noise handling, latency and deployment constraints, monitoring, and experiment design.
You are a Data Scientist at a ride-hailing company. Design an ETA system used in the rider and driver apps to estimate both pickup ETA and trip ETA. Describe an end-to-end approach: define the prediction targets and labels, choose features, select model families, and explain how you would evaluate the system offline and online. Your answer should address metric tradeoffs such as MAE vs. RMSE vs. MAPE vs. quantile loss, calibration of predicted ETAs, selection bias from canceled trips, traffic shocks, GPS and map-matching noise, geospatial cold start, latency constraints, retraining, monitoring, and experiment design after launch.
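To make the metric tradeoffs concrete, here is a minimal sketch in plain Python. It computes MAE, RMSE, MAPE, quantile (pinball) loss, and a simple calibration check (empirical coverage of a predicted P90 ETA) on a handful of trips. The helper names and the toy durations in minutes are hypothetical and purely illustrative. They are not real data, and this is not a prescribed evaluation pipeline. One trip is given a large "traffic shock" miss so you can see how each metric reacts to it.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], pred: Sequence[float]) -> float:
    # Mean absolute error: linear penalty, robust-ish to a single large miss.
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual: Sequence[float], pred: Sequence[float]) -> float:
    # Root mean squared error: squaring makes large misses (e.g. traffic shocks) dominate.
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mape(actual: Sequence[float], pred: Sequence[float]) -> float:
    # Mean absolute percentage error, in percent.
    # Assumes strictly positive actual durations; it is undefined for zero-length trips
    # and weights errors on short pickups far more heavily than on long trips.
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)


def pinball_loss(actual: Sequence[float], q_pred: Sequence[float], tau: float) -> float:
    # Quantile (pinball) loss at level tau: under-prediction costs tau per minute,
    # over-prediction costs (1 - tau) per minute.
    total = 0.0
    for a, q in zip(actual, q_pred):
        diff = a - q
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(actual)


def coverage(actual: Sequence[float], q_pred: Sequence[float]) -> float:
    # Fraction of trips finishing at or before the predicted quantile.
    # A well-calibrated P90 should cover roughly 90% of trips.
    return sum(1 for a, q in zip(actual, q_pred) if a <= q) / len(actual)


# Hypothetical toy trips (minutes); the 4th trip hits a traffic shock.
actual = [10.0, 12.0, 8.0, 30.0, 15.0]
point_pred = [11.0, 12.0, 9.0, 15.0, 14.0]   # point ETA shown to users
p90_pred = [13.0, 14.0, 11.0, 20.0, 18.0]    # hypothetical P90 ETA

if __name__ == "__main__":
    print(f"MAE      {mae(actual, point_pred):.2f}")
    print(f"RMSE     {rmse(actual, point_pred):.2f}")
    print(f"MAPE %   {mape(actual, point_pred):.2f}")
    print(f"P90 loss {pinball_loss(actual, p90_pred, 0.9):.2f}")
    print(f"P90 cov  {coverage(actual, p90_pred):.2f}")
```

On these toy values, the single 15-minute miss pulls RMSE well above MAE. MAPE dampens that miss because it divides by the long trip's duration. The P90 coverage of 0.8 falls short of 0.9, which is the kind of miscalibration an offline calibration check is meant to surface. Which loss to train on depends on whether the product cost of an underestimate (a rider waiting longer than promised) exceeds that of an overestimate, and that asymmetry is exactly what the quantile level tau encodes.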