Design a robust traffic forecasting pipeline

Q: Design a robust traffic forecasting pipeline

This question evaluates a candidate's ability to design an end-to-end time-series forecasting pipeline. It covers data cleaning and missing-value handling, anomaly detection and intervention strategies, feature engineering, probabilistic modeling with Unobserved Components Models, rolling-origin backtesting, model comparison, and scaling to multiple related series. It is commonly asked in Machine Learning interviews to assess both conceptual understanding and practical application of time-series forecasting, including model assumptions, uncertainty quantification, evaluation metrics for quantiles, and productionization considerations.

Question

Forecasting Daily Amazon Retail Traffic: End-to-End Design

You are given 5 years of daily Amazon retail site traffic counts. Design an end-to-end forecasting pipeline that produces 1-, 7-, and 28-day-ahead forecasts at the 10th, 50th, and 90th percentiles (a median forecast with an 80% prediction interval).

Specify and justify the following:

(a) Data cleaning and missing-value strategies

How you would standardize the time index, handle bots/outages/duplicates, apply transformations, and impute or carry missingness into the model.
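One way to picture the index-standardization step is the minimal sketch below. It is an illustration only, not a prescribed answer: the function names `standardize_daily` and `log1p_or_none` are hypothetical, the "last record wins" rule for duplicate dates is an assumed policy, and the log(1 + x) transform is just one common choice for count data. Missing days are kept as `None` rather than imputed, so a downstream state-space model could treat them as unobserved.

```python
from datetime import date, timedelta
import math


def standardize_daily(records, start, end):
    """Return (dates, values) on a complete daily grid from start to end inclusive.

    records: iterable of (date, count) pairs, possibly duplicated or with gaps.
    Duplicate dates keep the last record seen (an assumed policy); missing
    days become None so the model can treat them as unobserved.
    """
    by_day = {}
    for day, count in records:
        by_day[day] = count  # last record wins for duplicate dates
    n_days = (end - start).days + 1
    dates = [start + timedelta(days=i) for i in range(n_days)]
    values = [by_day.get(d) for d in dates]
    return dates, values


def log1p_or_none(values):
    """Apply log(1 + x) to observed counts, leaving gaps as None."""
    return [None if v is None else math.log1p(v) for v in values]
```

For example, records for 1 January (twice) and 3 January yield a three-day grid whose middle value is `None`.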

(b) Anomaly detection and treatment

Methods to detect point outliers and regime shifts, and how you would downweight, cap, or model them (e.g., interventions).
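For point outliers, one commonly used detector is a Hampel-style filter, which compares each value with the median of a surrounding window and measures the gap in units of the scaled median absolute deviation (MAD). The sketch below is a hedged illustration under assumed settings: the name `hampel_flags`, the 7-day half-window, and the 3-sigma threshold are all hypothetical defaults, and the window is centered, which suits cleaning historical data but would leak future values if used inside a backtest.

```python
from statistics import median


def hampel_flags(values, half_window=7, n_sigmas=3.0):
    """Flag points far from the local median, measured in scaled-MAD units."""
    flags = []
    for i, x in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]  # centered window, truncated at the edges
        med = median(window)
        mad = median(abs(v - med) for v in window)
        scale = 1.4826 * mad  # makes MAD comparable to a Gaussian std dev
        # A zero scale (constant window) never flags, to avoid dividing by zero.
        flags.append(scale > 0 and abs(x - med) > n_sigmas * scale)
    return flags
```

Flagged points could then be capped, downweighted, or set to missing. Regime shifts need a different tool (for example, a level-shift intervention regressor), since a sustained change moves the local median itself.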

(c) Feature engineering

Calendar and event features (holidays, Prime Day, Black Friday/Cyber Monday), promotions/price indices, day-of-week/weekend effects, moving averages and lags, and any external drivers.
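A few of the features named above can be sketched as follows. This is a minimal illustration with assumed names: `build_features` and `event_days` are hypothetical, and the event set would have to be filled from a real holiday or promotion calendar. The lag and moving-average features use only values strictly before the target day, and the sketch assumes `values` contains no gaps.

```python
from datetime import date, timedelta


def build_features(dates, values, event_days=frozenset()):
    """Build one feature dict per day from calendar info and past values only."""
    rows = []
    for i, d in enumerate(dates):
        past_week = values[max(0, i - 7):i]  # strictly before day i
        rows.append({
            "dow": d.weekday(),              # 0 = Monday ... 6 = Sunday
            "is_weekend": d.weekday() >= 5,
            "is_event": d in event_days,     # hypothetical event calendar
            "lag_7": values[i - 7] if i >= 7 else None,
            "ma_7": sum(past_week) / 7 if len(past_week) == 7 else None,
        })
    return rows
```

For multi-step forecasts the lags must be at least as old as the horizon: a 28-day-ahead forecast cannot use `lag_7`, because that value is not yet known at the forecast origin.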

(d) Model choice: Unobserved Components Model (UCM)

Define the UCM structure (trend, seasonality, regressors), describe estimation via Kalman filtering and smoothing, and list key hyperparameters (e.g., state variances, seasonal complexity).
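As one possible concrete form, a UCM with a local linear trend, a weekly dummy seasonal component, and regressors can be written as below. The symbols are assumptions introduced for illustration: $y_t$ is the (possibly log-transformed) traffic, $\mu_t$ the level, $\nu_t$ the slope, $\gamma_t$ the seasonal effect with period $s = 7$, and $x_t$ the vector of calendar and event regressors. All disturbances are taken to be mutually independent Gaussian noise.

```latex
\begin{aligned}
y_t          &= \mu_t + \gamma_t + x_t^\top \beta + \varepsilon_t,
               & \varepsilon_t &\sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
\mu_{t+1}    &= \mu_t + \nu_t + \xi_t,
               & \xi_t &\sim \mathcal{N}(0, \sigma_\xi^2) \\
\nu_{t+1}    &= \nu_t + \zeta_t,
               & \zeta_t &\sim \mathcal{N}(0, \sigma_\zeta^2) \\
\gamma_{t+1} &= -\sum_{j=0}^{s-2} \gamma_{t-j} + \omega_t,
               & \omega_t &\sim \mathcal{N}(0, \sigma_\omega^2)
\end{aligned}
```

The four variances are the state-variance hyperparameters referred to above. Setting $\sigma_\zeta^2 = 0$ fixes the slope, and setting $\sigma_\omega^2 = 0$ makes the weekly pattern deterministic. An annual cycle would typically be added as a second seasonal component, often in trigonometric form with a small number of harmonics.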

(e) Rolling-origin backtesting

Your scheme to pick training window lengths, forecast horizons, refit frequency, metrics (including for quantiles), and guardrails against leakage.
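The backtesting scheme and the quantile metric can be sketched in a few lines. This is a hedged illustration with hypothetical names (`rolling_origin_splits`, `pinball_loss`) and an expanding training window, which is only one of the possible choices; a fixed-length sliding window would drop the oldest indices instead.

```python
def rolling_origin_splits(n_obs, min_train, horizon, step):
    """Yield (train_end, test_indices) for an expanding-window backtest.

    Training uses indices [0, train_end); the test block is the next
    `horizon` indices, so no test point precedes the forecast origin.
    """
    train_end = min_train
    while train_end + horizon <= n_obs:
        yield train_end, list(range(train_end, train_end + horizon))
        train_end += step


def pinball_loss(actuals, quantile_forecasts, q):
    """Mean pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, f in zip(actuals, quantile_forecasts):
        diff = y - f
        # Under-forecasts are weighted by q, over-forecasts by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actuals)
```

With 10 observations, a minimum training length of 4, a horizon of 3, and a step of 2, the generator yields origins at indices 4 and 6. Averaging the pinball loss over the 10th, 50th, and 90th percentile forecasts gives a single score per horizon, and the empirical coverage of the 10th to 90th interval (ideally near 80%) is a useful companion check.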

(f) UCM vs. SARIMA comparison

Contrast assumptions, interpretability, support for exogenous regressors, handling missing data, multi-seasonality, and computational cost.

(g) Scaling to hundreds of related series

How you would productionize and parallelize, share information across series, and when you would switch to alternative models (e.g., global probabilistic or deep-learning approaches).
