Design a robust traffic forecasting pipeline
Company: Amazon
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
You have 5 years of daily Amazon retail site traffic counts. Design an end-to-end forecasting pipeline to produce 1-, 7-, and 28-day-ahead forecasts and 10th/50th/90th percentile prediction intervals. Specify:
(a) data cleaning and missing-value strategies;
(b) anomaly detection and treatment;
(c) feature engineering (holidays, promotions, price indices, day-of-week, moving averages);
(d) model choice focusing on an Unobserved Components Model (state-space with trend/seasonality/regressors), how you would estimate it via Kalman filtering and smoothing, and key hyperparameters;
(e) a rolling-origin backtesting scheme and how you would pick window lengths and forecast horizons;
(f) how you would compare UCM to SARIMA in assumptions, interpretability, exogenous regressors, handling missing data, multi-seasonality, and computational cost;
(g) how you would scale to hundreds of related series and when you would switch models.
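To make part (d) concrete, here is a minimal sketch of Kalman filtering for the simplest Unobserved Components Model, a local level (random-walk trend plus noise). It is an illustration under stated assumptions, not a full solution: the function names `local_level_filter` and `forecast_quantiles`, the diffuse prior `p0`, and the variance values are all hypothetical, seasonality and regressors are omitted, and the two variances would in practice be estimated by maximum likelihood rather than fixed by hand. It also shows why missing days are cheap in a state-space model: the update step is skipped and the state variance simply grows.

```python
import math

Z_90 = 1.2815515655446004  # standard normal quantile for the 90th percentile


def local_level_filter(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    y may contain None or NaN for missing days. Returns the filtered
    (mean, variance) per day and the one-step-ahead predicted state.
    """
    a, p = a0, p0  # predicted state mean and variance (large p0: vague prior)
    filtered = []
    for obs in y:
        if obs is None or math.isnan(obs):
            # Missing observation: no update, carry the prediction forward.
            a_f, p_f = a, p
        else:
            f = p + sigma2_eps          # innovation variance
            k = p / f                   # Kalman gain
            a_f = a + k * (obs - a)     # filtered mean
            p_f = p * (1.0 - k)         # filtered variance
        filtered.append((a_f, p_f))
        # Prediction step: the level follows a random walk.
        a, p = a_f, p_f + sigma2_eta
    return filtered, (a, p)


def forecast_quantiles(a, p, sigma2_eps, sigma2_eta, h):
    """Gaussian 10th/50th/90th percentiles for the h-step-ahead observation."""
    var = p + (h - 1) * sigma2_eta + sigma2_eps
    sd = math.sqrt(var)
    return a - Z_90 * sd, a, a + Z_90 * sd


# Toy series with one missing day (values are made up for illustration).
y = [100.0, 102.0, float("nan"), 101.0, 103.0]
filtered, (a_next, p_next) = local_level_filter(y, sigma2_eps=4.0, sigma2_eta=1.0)
for h in (1, 7, 28):
    print(h, forecast_quantiles(a_next, p_next, 4.0, 1.0, h))
```

The interval widens with the horizon because each extra day adds one more level-disturbance variance, which is the behavior an interviewer would expect you to explain for the 28-day forecast.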
Quick Answer: This question evaluates a candidate's ability to design an end-to-end time-series forecasting pipeline: data cleaning and missing-value handling, anomaly detection and treatment, feature engineering, probabilistic modeling with Unobserved Components Models, rolling-origin backtesting, model comparison, and scaling to multiple related series. It tests both conceptual understanding and practical application, including model assumptions, uncertainty quantification, evaluation metrics for quantiles, and productionization considerations.
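Two of the items above, rolling-origin backtesting and quantile evaluation, can be sketched in a few lines. The following is one possible illustration, not a prescribed method: the helper names `rolling_origin_splits` and `pinball_loss` are hypothetical, the expanding-window design is one choice among several (a fixed-length sliding window is a common alternative), and the window sizes in the usage lines are assumptions chosen only to match 5 years of daily data.

```python
def rolling_origin_splits(n_obs, initial_train, horizon, step):
    """Yield (train_indices, test_indices) for an expanding-window backtest.

    Each fold trains on days [0, origin) and tests on the next `horizon`
    days, then the origin moves forward by `step` days.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss of forecasts y_pred at quantile q."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-forecasts cost q per unit, over-forecasts cost (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


# Assumed setup: ~5 years of daily data, 3 years of initial training,
# a 28-day test horizon, and a new forecast origin every 28 days.
folds = list(rolling_origin_splits(n_obs=1826, initial_train=1095,
                                   horizon=28, step=28))
print(len(folds), "folds; first test window starts at day", folds[0][1][0])

# The 90th-percentile forecast is penalized more for falling below the actual.
print(pinball_loss([100.0], [90.0], q=0.9), pinball_loss([100.0], [110.0], q=0.9))
```

Averaging the pinball loss per quantile and per horizon across folds gives a single table on which UCM and SARIMA can be compared on equal terms.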