
Design a robust traffic forecasting pipeline

Last updated: Mar 29, 2026

Quick Overview

This question tests end-to-end design of a time-series forecasting pipeline: data cleaning and missing-value handling, anomaly detection and intervention strategies, feature engineering, probabilistic modeling with Unobserved Components Models, rolling-origin backtesting, model comparison, and scaling to multiple related series. Interviewers use it to assess both conceptual understanding and practical application, including model assumptions, uncertainty quantification, evaluation metrics for quantiles, and productionization considerations.


Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

You have 5 years of daily Amazon retail site traffic counts. Design an end-to-end forecasting pipeline to produce 1-, 7-, and 28-day-ahead forecasts and 10th/50th/90th percentile prediction intervals. Specify: (a) data cleaning and missing-value strategies; (b) anomaly detection and treatment; (c) feature engineering (holidays, promotions, price indices, day-of-week, moving averages); (d) model choice focusing on an Unobserved Components Model (state-space with trend/seasonality/regressors), how you would estimate it via Kalman filtering and smoothing, and key hyperparameters; (e) a rolling-origin backtesting scheme and how you would pick window lengths and forecast horizons; (f) how you would compare UCM to SARIMA in assumptions, interpretability, exogenous regressors, handling missing data, multi-seasonality, and computational cost; (g) how you would scale to hundreds of related series and when you would switch models.


Oct 13, 2025, 9:49 PM

Forecasting Daily Amazon Retail Traffic: End-to-End Design

You are given 5 years of daily Amazon retail site traffic counts. Design an end-to-end forecasting pipeline that produces 1-, 7-, and 28-day-ahead forecasts at the 10th, 50th, and 90th percentiles (a median forecast bracketed by an 80% prediction interval).

Specify and justify the following:

(a) Data cleaning and missing-value strategies

  • How you would standardize the time index, handle bots/outages/duplicates, apply transformations, and impute or carry missingness into the model.
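One way to approach the cleaning step is sketched below, using only the standard library. It rebuilds a gap-free daily index, collapses duplicate rows, applies a log1p transform, and leaves missing days as None so that a downstream state-space model can skip them instead of relying on imputed values. The function name, the input layout, and the "keep the largest count per day" deduplication rule are assumptions made for illustration, not part of the question.

```python
import math
from datetime import date, timedelta

def clean_daily_counts(records, start, end):
    """Return [(day, log1p(count) or None)] for every day in [start, end].

    records: iterable of (date, count) pairs that may contain duplicates,
    invalid values, or gaps. Hypothetical input layout.
    """
    by_day = {}
    for day, count in records:
        if count is None or count < 0:
            continue  # drop invalid rows; the day stays missing
        # Assumed dedup rule: keep the largest count reported for a day.
        by_day[day] = max(count, by_day.get(day, 0))

    cleaned = []
    day = start
    while day <= end:
        count = by_day.get(day)
        # Missing days are carried forward as None rather than imputed.
        cleaned.append((day, math.log1p(count) if count is not None else None))
        day += timedelta(days=1)
    return cleaned

records = [
    (date(2024, 1, 1), 100),
    (date(2024, 1, 1), 100),  # duplicate row
    (date(2024, 1, 2), 120),
    (date(2024, 1, 4), 90),   # Jan 3 is absent
]
cleaned = clean_daily_counts(records, date(2024, 1, 1), date(2024, 1, 4))
```

Carrying missingness into the model, instead of filling it here, keeps the uncertainty on those days honest. Bot filtering and outage flags would need extra inputs that this sketch does not assume.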

(b) Anomaly detection and treatment

  • Methods to detect point outliers and regime shifts, and how you would downweight, cap, or model them (e.g., interventions).
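For point outliers, a Hampel-style filter (rolling median plus median absolute deviation) is one common choice. The sketch below is a minimal version under assumed settings: a centered window of 7 days and a 3-sigma threshold, both hypothetical defaults. Flagged points are set to None so they can be handled as missing observations. Regime shifts are not covered here; they are usually better modeled with an intervention regressor than removed.

```python
from statistics import median

def hampel_flags(values, half_window=3, n_sigmas=3.0):
    """Flag points far from the local median, measured in robust (MAD) units."""
    flags = []
    for i, x in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        neighbours = [v for v in values[lo:hi] if v is not None]
        if x is None or len(neighbours) < 3:
            flags.append(False)
            continue
        med = median(neighbours)
        mad = median([abs(v - med) for v in neighbours])
        scale = 1.4826 * mad  # makes MAD comparable to a Gaussian std dev
        flags.append(scale > 0 and abs(x - med) > n_sigmas * scale)
    return flags

values = [100, 102, 98, 101, 500, 99, 103, 97, 100, 102]
flags = hampel_flags(values)
# Treat flagged points as missing instead of deleting the day.
treated = [None if is_outlier else v for v, is_outlier in zip(values, flags)]
```

A centered window suits offline cleaning of history. For live scoring, a trailing window would be needed so the detector never looks at future values.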

(c) Feature engineering

  • Calendar and event features (holidays, Prime Day, Black Friday/Cyber Monday), promotions/price indices, day-of-week/weekend effects, moving averages and lags, and any external drivers.
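The calendar and lag features listed above could be built along the following lines. This is a sketch with hypothetical feature names and a hand-supplied holiday set. The moving average uses only values strictly before the current day, so it does not leak the target. For a 7- or 28-day horizon, lags would need to be shifted by at least the horizon.

```python
from datetime import date, timedelta

def build_features(days, values, holidays, ma_window=7):
    """Build one feature dict per day from calendar info and past values only."""
    rows = []
    for i, day in enumerate(days):
        # Trailing window ends at i - 1, so the current value is excluded.
        past = [v for v in values[max(0, i - ma_window):i] if v is not None]
        rows.append({
            "date": day,
            "dow": day.weekday(),                  # 0 = Monday
            "is_weekend": int(day.weekday() >= 5),
            "is_holiday": int(day in holidays),
            "lag_7": values[i - 7] if i >= 7 else None,
            "ma_7_lagged": sum(past) / len(past) if past else None,
        })
    return rows

start = date(2024, 11, 25)  # a Monday
days = [start + timedelta(days=k) for k in range(14)]
values = [float(v) for v in range(100, 114)]  # made-up illustrative values
holidays = {date(2024, 11, 29)}               # Black Friday 2024
rows = build_features(days, values, holidays)
```

Promotion flags and price indices would enter the same way as extra keys, provided their future values are known or forecast at prediction time.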

(d) Model choice: Unobserved Components Model (UCM)

  • Define the UCM structure (trend, seasonality, regressors), describe estimation via Kalman filtering and smoothing, and list key hyperparameters (e.g., state variances, seasonal complexity).
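To make the Kalman recursion concrete, here is a minimal sketch for the simplest UCM, the local level model: y_t = mu_t + eps_t and mu_{t+1} = mu_t + eta_t, with variances sigma2_eps and sigma2_eta. A full answer would add seasonal and regression states and estimate the variances by maximum likelihood. Here the variances are fixed, assumed values, and the function names are hypothetical. Missing observations (None) skip the update step, which is how the filter handles gaps.

```python
from statistics import NormalDist

def local_level_filter(y, sigma2_eps, sigma2_eta, a0=0.0, p0=1e7):
    """Kalman filter for a local level model; returns [(level, variance)]."""
    a, p = a0, p0  # large p0 approximates a diffuse prior
    filtered = []
    for obs in y:
        if obs is not None:
            f = p + sigma2_eps      # innovation variance
            k = p / f               # Kalman gain
            a = a + k * (obs - a)   # measurement update of the level
            p = p * (1.0 - k)
        filtered.append((a, p))
        p = p + sigma2_eta          # time update: level is a random walk
    return filtered

def forecast_quantiles(a, p, h, sigma2_eps, sigma2_eta, probs=(0.1, 0.5, 0.9)):
    """Gaussian h-step-ahead quantiles of y, given filtered level a, variance p."""
    var = p + h * sigma2_eta + sigma2_eps
    dist = NormalDist(mu=a, sigma=var ** 0.5)
    return {q: dist.inv_cdf(q) for q in probs}

y = [10.0, 10.2, None, 9.9, 10.1]  # e.g. log traffic with one missing day
filtered = local_level_filter(y, sigma2_eps=0.04, sigma2_eta=0.01)
level, var = filtered[-1]
q1 = forecast_quantiles(level, var, h=1, sigma2_eps=0.04, sigma2_eta=0.01)
q28 = forecast_quantiles(level, var, h=28, sigma2_eps=0.04, sigma2_eta=0.01)
```

The ratio sigma2_eta / sigma2_eps acts as the key smoothness hyperparameter: a larger ratio lets the level track recent data more closely. If the model is fit on a log scale, the quantiles should be back-transformed before reporting.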

(e) Rolling-origin backtesting

  • Your scheme to pick training window lengths, forecast horizons, refit frequency, metrics (including for quantiles), and guardrails against leakage.
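A rolling-origin scheme with a quantile metric could be sketched as follows. The split generator uses an expanding training window, and the pinball (quantile) loss scores each percentile forecast. The seasonal-naive forecaster is only a placeholder baseline, and the window sizes in the example are assumed values, not recommendations.

```python
def rolling_origin_splits(n_obs, min_train, horizon, step):
    """Yield (train_end, target_index): train on [0, train_end), score one target."""
    origin = min_train
    while origin + horizon <= n_obs:
        yield origin, origin + horizon - 1
        origin += step

def pinball_loss(y_true, y_pred, q):
    """Quantile loss: penalizes under-prediction by q, over-prediction by 1 - q."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)

def seasonal_naive(train, horizon, period=7):
    """Placeholder baseline: latest training value at the target's weekday."""
    cycles = -(-horizon // period)  # ceil(horizon / period)
    return train[len(train) + horizon - 1 - period * cycles]

series = [float(100 + (t % 7)) for t in range(60)]  # synthetic weekly pattern
losses = []
for train_end, target in rolling_origin_splits(len(series), 28, 7, 7):
    train = series[:train_end]  # only data before the origin is visible
    pred = seasonal_naive(train, horizon=7)
    losses.append(pinball_loss(series[target], pred, q=0.5))
```

Slicing the series at train_end before any feature or parameter computation is the main guardrail against leakage. Running the loop once per horizon (1, 7, 28) and per quantile gives the full evaluation grid.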

(f) UCM vs. SARIMA comparison

  • Contrast assumptions, interpretability, support for exogenous regressors, handling missing data, multi-seasonality, and computational cost.

(g) Scaling to hundreds of related series

  • How you would productionize and parallelize, share information across series, and when you would switch to alternative models (e.g., global probabilistic or deep-learning approaches).
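One simple way to share information across series is partial pooling: shrink each series' day-of-week effect toward the average across all series, with more shrinkage where a series has little data. The sketch below assumes the inputs are already scale-free (for example, log deviations from each series' own level). The fixed prior_strength is a hypothetical tuning value, not an estimated one.

```python
from collections import defaultdict

def dow_means(observations):
    """observations: list of (weekday, value). Returns (means, counts) by weekday."""
    sums, counts = defaultdict(float), defaultdict(int)
    for weekday, value in observations:
        sums[weekday] += value
        counts[weekday] += 1
    return {wd: sums[wd] / counts[wd] for wd in sums}, dict(counts)

def shrink_dow_effects(series_by_name, prior_strength=8.0):
    """Blend each series' weekday means with the pooled means across all series."""
    pooled = [obs for series in series_by_name.values() for obs in series]
    global_means, _ = dow_means(pooled)
    shrunk = {}
    for name, series in series_by_name.items():
        local_means, counts = dow_means(series)
        shrunk[name] = {}
        for wd, global_mean in global_means.items():
            n = counts.get(wd, 0)
            weight = n / (n + prior_strength)  # more data -> trust local more
            local = local_means.get(wd, global_mean)
            shrunk[name][wd] = weight * local + (1.0 - weight) * global_mean
    return shrunk

series_by_name = {
    # 10 weeks of history with a weekend uplift of 0.2
    "long": [(t % 7, 0.2 if t % 7 >= 5 else 0.0) for t in range(70)],
    # a new series with a single noisy Saturday observation
    "short": [(5, 1.0)],
}
shrunk = shrink_dow_effects(series_by_name)
```

When per-series fits stop scaling or cross-series structure dominates, this kind of hand-built pooling is usually the point at which a global model trained on all series becomes the more natural choice.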
