PracHub

Design end-to-end regression for energy demand

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design and justify an end-to-end regression system for next-day site-level energy demand forecasting, testing competencies in time-series modeling, feature engineering, regularization, evaluation metrics and SLAs, production ML pipelines, monitoring, explainability, and data privacy.

  • hard
  • Amazon
  • Machine Learning
  • Data Scientist


Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

You must build an end-to-end regression system to predict next-day, site-level daily electricity consumption (kWh) for a portfolio of commercial buildings. Available data: hourly smart-meter reads, weather (temperature, humidity, wind), calendar/holiday flags, building metadata (area, vintage, HVAC type), and optional external features (day-ahead price, outage alerts). Requirements: (1) Formulate the problem and baseline(s); (2) Engineer features to capture seasonality, interactions (e.g., temperature×HVAC), and occupancy proxies; (3) Choose and justify regularization, address multicollinearity, and detect/mitigate heteroscedasticity; (4) Use time-series-aware cross-validation and avoid leakage (be explicit about any lag/rolling constructs); (5) Specify metrics (RMSE, MAPE) and business-facing SLAs (e.g., billing tolerance bands); (6) Handle missing/corrupted sensors and concept drift across years (2019–2025); (7) Productionize: outline training/inference pipelines, model versioning, and monitoring (data/feature drift, residual distribution shifts, retraining triggers); (8) Provide explainability for non-ML stakeholders (global vs. local) and safe failure modes; (9) Address security/privacy constraints for tenant data. Finally, propose an ablation plan to quantify the incremental value of external features, and describe how you would backtest the full pipeline on 2023–2024 while reserving 2025 for holdout evaluation.


Related Interview Questions

  • Explain Core ML Interview Concepts - Amazon (hard)
  • Evaluate NLP Classification Models - Amazon (easy)
  • Explain overfitting, regularization, and LLM techniques - Amazon (medium)
  • Explain NLP/RL concepts used in LLM agents - Amazon (hard)
  • Design and evaluate a RAG system - Amazon (easy)
Amazon
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Machine Learning

End-to-End Daily Energy Prediction for Commercial Buildings

Context

You are asked to design and justify an end-to-end regression system that predicts next-day total electricity consumption (kWh) for each building ("site") in a portfolio of commercial buildings, with data spanning multiple years (2019–2025). Forecasts should use:

  • Hourly smart-meter reads
  • Weather: temperature, humidity, wind
  • Calendar and holiday flags
  • Building metadata: floor area, vintage (year built), HVAC type
  • Optional external features: day-ahead electricity price, outage alerts
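Because the target is a daily total while the meters report hourly, the first pipeline step is an aggregation that refuses to silently under-count days with gaps. The sketch below is a minimal illustration, not a prescribed method: it assumes exactly 24 reads per local day (ignoring DST transitions), and the thresholds `MIN_VALID_HOURS` and `MAX_PLAUSIBLE_HOURLY_KWH` are hypothetical values that would need tuning per meter fleet.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical data-quality thresholds (tune per meter fleet).
MIN_VALID_HOURS = 20                 # fewer valid hours -> day is not usable
MAX_PLAUSIBLE_HOURLY_KWH = 5_000.0   # larger reads are treated as corrupted


def is_valid_read(kwh: Optional[float]) -> bool:
    """A read is usable if present, finite, non-negative and under the cap."""
    return (
        kwh is not None
        and math.isfinite(kwh)
        and 0.0 <= kwh <= MAX_PLAUSIBLE_HOURLY_KWH
    )


def daily_kwh(hourly: Sequence[Optional[float]]) -> Tuple[Optional[float], int]:
    """Aggregate 24 hourly reads into (daily_total, n_valid_hours).

    Days with fewer than MIN_VALID_HOURS valid reads return None so they are
    flagged/excluded instead of being under-counted. Otherwise each missing or
    corrupted hour is filled with the mean of the valid hours (a simple
    assumption; a site-specific hourly load profile is usually better).
    """
    if len(hourly) != 24:
        raise ValueError("expected exactly 24 hourly reads")
    valid = [x for x in hourly if is_valid_read(x)]
    if len(valid) < MIN_VALID_HOURS:
        return None, len(valid)
    mean_valid = sum(valid) / len(valid)
    return mean_valid * 24, len(valid)
```

Returning `n_valid_hours` alongside the total lets later stages down-weight or audit imputed days instead of treating them like clean ones.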

Assume you must deliver both accurate forecasts and a robust production pipeline suitable for enterprise operations.

Requirements

  1. Formulate the problem and propose baselines.
  2. Engineer features for seasonality, interactions (e.g., temperature × HVAC), and occupancy proxies.
  3. Choose and justify regularization; address multicollinearity; detect and mitigate heteroscedasticity.
  4. Use time-series-aware cross-validation and avoid leakage; be explicit about any lag/rolling constructs.
  5. Specify metrics (e.g., RMSE, MAPE) and business-facing SLAs (e.g., billing tolerance bands).
  6. Handle missing/corrupted sensors and concept drift across years (2019–2025).
  7. Productionize: outline training and inference pipelines, model versioning, and monitoring (data/feature drift, residual shifts, retraining triggers).
  8. Explainability for non-ML stakeholders (global vs. local) and safe failure modes.
  9. Security and privacy constraints for tenant data.
  10. Propose an ablation plan to quantify incremental value of external features and describe a backtest on 2023–2024 with 2025 held out for final evaluation.
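For requirement 1, two naive baselines set the floor any learned model must beat: persistence (tomorrow equals today) and seasonal naive (tomorrow equals the same weekday last week). The sketch below works on a plain list of daily totals; `rolling_mae` is a hypothetical helper that only ever hands the forecaster data strictly before the day being predicted.

```python
from typing import Callable, Sequence


def persistence_forecast(history: Sequence[float]) -> float:
    """Tomorrow equals today (history[-1] is the most recent complete day)."""
    return history[-1]


def seasonal_naive_forecast(history: Sequence[float], season: int = 7) -> float:
    """Tomorrow equals the same weekday one week earlier.

    If history[-1] is day t, the target is t+1 and its value one season
    earlier is day t+1-season, i.e. history[-season].
    """
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    return history[-season]


def rolling_mae(series: Sequence[float],
                forecaster: Callable[[Sequence[float]], float], start: int) -> float:
    """Mean absolute error of one-step-ahead forecasts from day `start` onward.

    The forecaster only ever sees series[:t], i.e. data strictly before day t.
    """
    errors = [abs(series[t] - forecaster(series[:t])) for t in range(start, len(series))]
    return sum(errors) / len(errors)
```

A weather-aware degree-day linear regression is a natural third baseline between these naive forecasts and the full model.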
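Requirement 2 can be sketched as a per-site-day feature function. Everything here is an illustrative assumption: the 18 °C balance point, the three HVAC categories in `HVAC_TYPES`, and the use of weekends/holidays as the occupancy proxy. Degree days turn temperature into a piecewise-linear load driver, the sine/cosine pair encodes annual seasonality without a jump at New Year, and HVAC × degree-day products let heating and cooling sensitivity differ by system type.

```python
import math
from datetime import date
from typing import Dict

# Hypothetical HVAC categories; the first one is the reference level.
HVAC_TYPES = ("heat_pump", "gas_boiler_chiller", "vav")
BASE_TEMP_C = 18.0  # assumed balance-point temperature; often fitted per site


def make_features(day: date, mean_temp_c: float, hvac_type: str,
                  is_holiday: bool, floor_area_m2: float) -> Dict[str, float]:
    if hvac_type not in HVAC_TYPES:
        raise ValueError(f"unknown HVAC type: {hvac_type}")
    hdd = max(0.0, BASE_TEMP_C - mean_temp_c)  # heating degree days
    cdd = max(0.0, mean_temp_c - BASE_TEMP_C)  # cooling degree days
    angle = 2 * math.pi * day.timetuple().tm_yday / 365.25
    feats = {
        "hdd": hdd,
        "cdd": cdd,
        # Annual seasonality, smooth across Dec 31 -> Jan 1.
        "doy_sin": math.sin(angle),
        "doy_cos": math.cos(angle),
        # Occupancy proxy: commercial sites are mostly unoccupied on weekends/holidays.
        "occupied": 0.0 if (day.weekday() >= 5 or is_holiday) else 1.0,
        "log_area": math.log(floor_area_m2),
    }
    # One-hot HVAC type (reference level dropped to avoid collinearity with the
    # intercept) plus HVAC x degree-day interactions.
    for h in HVAC_TYPES[1:]:
        ind = 1.0 if hvac_type == h else 0.0
        feats[f"hvac_{h}"] = ind
        feats[f"hvac_{h}_x_hdd"] = ind * hdd
        feats[f"hvac_{h}_x_cdd"] = ind * cdd
    return feats
```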
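For requirement 3, ridge (L2) regularization is a reasonable default because degree days, lags, and rolling means are strongly correlated: rather than letting collinear coefficients blow up with opposite signs, the penalty spreads weight across them. The stdlib-only sketch below solves the penalized normal equations (XᵀX + λI)β = Xᵀy on centered data, so the intercept is not penalized. In practice features should also be standardized, and a library solver (e.g., scikit-learn's `Ridge`) would replace the hand-rolled elimination; lasso or elastic net are alternatives when feature selection is wanted.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear features?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ridge_fit(X: Sequence[Sequence[float]], y: Sequence[float],
              lam: float) -> Tuple[float, List[float]]:
    """Return (intercept, coefficients); only the slopes are penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Centered Gram matrix plus lam on the diagonal.
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    beta = solve(A, rhs)
    intercept = y_mean - sum(mu * b for mu, b in zip(x_mean, beta))
    return intercept, beta
```

With two identical columns, `lam=0` makes the system singular, while any `lam > 0` splits the weight evenly between them; that is the multicollinearity behaviour the requirement asks to justify. The penalty strength itself should be tuned with the time-series cross-validation, not on shuffled folds.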
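Also under requirement 3, a quick heteroscedasticity check is whether absolute residuals grow with the fitted value, which is typical for energy data because large sites have larger absolute errors. The sketch below computes a Pearson correlation between fitted values and |residual|, a simplified stand-in for a formal Breusch–Pagan test; a clearly positive value suggests modelling log(kWh) or using weighted least squares with weights proportional to 1/fitted². The helper names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def abs_residual_trend(fitted: Sequence[float], actual: Sequence[float]) -> float:
    """Correlation between fitted values and |actual - fitted|.

    Values well above 0 suggest the error spread grows with load size.
    """
    return pearson(fitted, [abs(a - f) for a, f in zip(actual, fitted)])
```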
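For requirement 4, the key rule is that every lag or rolling feature for target day t must be computable at forecast time. The sketch below makes that explicit with a hard cutoff at `t - latency_days` (1 if the forecast is issued after day t-1 closes, 2 if meter data lands a day late). The rolling mean ends at the cutoff, which is the same idea as shifting before rolling in pandas (`s.shift(1).rolling(7).mean()`). The feature names and 7-day windows are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence


def lag_features(daily: Sequence[Optional[float]], target_idx: int,
                 latency_days: int = 1) -> Dict[str, Optional[float]]:
    """Leakage-safe lag/rolling features for predicting daily[target_idx]."""
    cutoff = target_idx - latency_days  # last index visible at forecast time
    if latency_days < 1 or cutoff < 0:
        raise ValueError("need latency_days >= 1 and some visible history")
    visible = list(daily[: cutoff + 1])  # hard slice: nothing after the cutoff

    def lag(k: int) -> Optional[float]:
        i = target_idx - k
        return visible[i] if 0 <= i <= cutoff else None

    window = [x for x in visible[-7:] if x is not None]  # last 7 visible days
    return {
        "last_obs": visible[-1],
        "lag_7": lag(7),  # same weekday one week before the target
        "roll7_mean": sum(window) / len(window) if window else None,
    }
```

A simple unit test for leakage is to overwrite every value from the target day onward and check that the features do not change.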
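Cross-validation for requirement 4 should use expanding windows that always train on the past and test on the future, never shuffled K-fold. The generator below is a sketch; scikit-learn's `TimeSeriesSplit` (with its `gap` and `test_size` arguments) provides similar behaviour. The `gap` days between training end and test start mimic meter-data latency, and `min_train` is an assumed minimum history length.

```python
from typing import Iterator, Tuple


def expanding_window_splits(n_days: int, n_folds: int, test_size: int,
                            gap: int = 0, min_train: int = 365
                            ) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx); each test block lies strictly after its train block."""
    first_test = n_days - n_folds * test_size
    if first_test - gap < min_train:
        raise ValueError("not enough history for the requested folds")
    for k in range(n_folds):
        test_start = first_test + k * test_size
        train_end = test_start - gap  # exclusive; `gap` days are skipped
        yield range(0, train_end), range(test_start, test_start + test_size)
```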
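For requirement 5, RMSE and MAPE are model-facing, while a tolerance-band hit rate maps directly onto a billing SLA such as "at least X% of site-days within ±5% of metered kWh". The 5% band is an assumption to be agreed with the business. MAPE is undefined for zero actuals, so the sketch skips those days rather than dividing by zero.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mape(actual: Sequence[float], pred: Sequence[float], eps: float = 1e-9) -> float:
    """Mean absolute percentage error (%), skipping days with |actual| <= eps."""
    pairs = [(a, p) for a, p in zip(actual, pred) if abs(a) > eps]
    if not pairs:
        raise ValueError("MAPE undefined: all actuals are ~0")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def band_hit_rate(actual: Sequence[float], pred: Sequence[float],
                  tolerance: float = 0.05) -> float:
    """Share of days with |pred - actual| <= tolerance * |actual| (hypothetical SLA band)."""
    hits = sum(abs(p - a) <= tolerance * abs(a) for a, p in zip(actual, pred))
    return hits / len(actual)
```

Because RMSE in kWh is dominated by the largest sites, it is usually reported per site or normalized by mean load, while the band hit rate is what goes into the SLA.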
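One way to handle concept drift across 2019–2025 (requirement 6) is to keep all years but down-weight old ones, so the model adapts to regime changes (new tenants, retrofits, occupancy shifts) without losing seasonal coverage. The sketch assumes exponential decay with a hypothetical one-year half-life, with the weights passed as sample weights to whatever estimator is used; the half-life is a hyperparameter to tune in the backtest.

```python
from datetime import date
from typing import List, Sequence


def recency_weights(days: Sequence[date], as_of: date,
                    half_life_days: float = 365.0) -> List[float]:
    """Weight 0.5 ** (age / half_life): a row one half-life old counts half as much."""
    weights = []
    for d in days:
        age = (as_of - d).days
        if age < 0:
            # A training row dated after the cutoff would be leakage.
            raise ValueError(f"{d} is after the training cutoff {as_of}")
        weights.append(0.5 ** (age / half_life_days))
    return weights
```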
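For monitoring (requirement 7), the Population Stability Index (PSI) compares a recent window of a feature, or of the model residuals, against its training-time reference. The sketch bins by reference deciles. The thresholds 0.1 (watch) and 0.25 (act) are a common rule of thumb rather than a standard, and `retrain_trigger` is a hypothetical policy that would normally be combined with an error-based check before retraining automatically.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def psi(reference: Sequence[float], recent: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `recent` vs. `reference` (bins = reference quantiles)."""
    ref_sorted = sorted(reference)
    cuts = [ref_sorted[int(len(ref_sorted) * k / n_bins)] for k in range(1, n_bins)]

    def shares(xs: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in xs:
            counts[bisect_right(cuts, x)] += 1
        return [max(c / len(xs), eps) for c in counts]  # eps avoids log(0)

    ref_s, rec_s = shares(reference), shares(recent)
    return sum((r - e) * math.log(r / e) for r, e in zip(rec_s, ref_s))


def retrain_trigger(psi_value: float, act_threshold: float = 0.25) -> bool:
    return psi_value >= act_threshold
```

Running the same function on residuals (recent residuals vs. validation residuals) covers the "residual distribution shift" signal with no extra code.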
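For explainability (requirement 8), a linear model admits an exact additive breakdown that non-ML stakeholders can read as a waterfall: start from the average prediction, then add each feature's push up or down. The sketch assumes a fitted linear model (intercept plus coefficients) and training-set feature means; for a linear model with independent features these per-feature terms coincide with SHAP values. A single row gives the local explanation, and averaging absolute contributions over many site-days gives a global importance ranking.

```python
from typing import Dict, Sequence, Tuple


def local_explanation(intercept: float, beta: Sequence[float], x_mean: Sequence[float],
                      x: Sequence[float], names: Sequence[str]
                      ) -> Tuple[float, Dict[str, float]]:
    """Return (baseline, contributions) with prediction == baseline + sum(contributions)."""
    baseline = intercept + sum(b * m for b, m in zip(beta, x_mean))  # average prediction
    contrib = {n: b * (xi - m) for n, b, xi, m in zip(names, beta, x, x_mean)}
    return baseline, contrib


def global_importance(intercept: float, beta: Sequence[float], x_mean: Sequence[float],
                      rows: Sequence[Sequence[float]], names: Sequence[str]
                      ) -> Dict[str, float]:
    """Mean absolute contribution of each feature over many rows."""
    totals = {n: 0.0 for n in names}
    for x in rows:
        _, c = local_explanation(intercept, beta, x_mean, x, names)
        for n in names:
            totals[n] += abs(c[n])
    return {n: t / len(rows) for n, t in totals.items()}
```

For example, "Tuesday's forecast is 9 MWh: 10 MWh typical, +2 for cold weather, −3 because the building is closed" is the kind of sentence this decomposition supports directly.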
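Safe failure modes (also requirement 8) can be enforced with a guard that keeps a broken model output away from billing. If the prediction is missing, non-finite, negative, or far above recent history, the guard falls back to the same-weekday-last-week (seasonal-naive) value and tags the source so dashboards can count fallbacks. The 3× cap and the 28-day lookback are hypothetical settings.

```python
import math
from typing import Optional, Sequence, Tuple


def safe_forecast(model_pred: Optional[float], history: Sequence[float],
                  max_ratio: float = 3.0) -> Tuple[float, str]:
    """Return (forecast, source); history holds past daily totals, newest last."""
    if len(history) < 7:
        raise ValueError("need at least 7 days of history for the fallback")
    fallback = history[-7]           # same weekday last week for target day t+1
    recent_max = max(history[-28:])  # plausibility reference
    if model_pred is None or not math.isfinite(model_pred) or model_pred < 0:
        return fallback, "fallback:invalid_model_output"
    if model_pred > max_ratio * recent_max:
        return fallback, "fallback:out_of_range"
    return model_pred, "model"
```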
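For requirement 9, tenant identifiers can be pseudonymized before data reaches the feature store, and an allow-list keeps fields such as tenant names out of training data altogether. The sketch uses HMAC-SHA256 from Python's standard library, so pseudonyms are stable (joins still work) but cannot be recomputed without the key. `ALLOWED_FIELDS` and the record layout are hypothetical, and the key is assumed to come from a secrets manager. This complements, rather than replaces, encryption at rest, access control, and retention policies.

```python
import hashlib
import hmac
from typing import Any, Dict

# Hypothetical allow-list: anything not listed (tenant names, contacts, ...) is dropped.
ALLOWED_FIELDS = {"site_id", "date", "kwh", "floor_area_m2", "vintage", "hvac_type"}


def pseudonymize(site_id: str, key: bytes) -> str:
    """Deterministic keyed pseudonym (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, site_id.encode("utf-8"), hashlib.sha256).hexdigest()


def to_training_record(raw: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Keep only allow-listed fields and replace the raw site ID with its pseudonym."""
    rec = {k: v for k, v in raw.items() if k in ALLOWED_FIELDS}
    rec["site_id"] = pseudonymize(str(raw["site_id"]), key)
    return rec
```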
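For requirement 10, the backtest and the ablation can share one harness. The sketch builds expanding-window folds with monthly forecast origins through 2023–2024 and filters out 2025 before anything else, so the holdout cannot leak into training or tuning. `evaluate` is a hypothetical callable that trains on the given days with the given feature set and returns a test error (e.g., RMSE). The ablation reports the base model, base plus each external group (day-ahead price, outage alerts), and base plus all groups; the incremental value of a group is the drop in error relative to the base. Once the configuration is frozen, it is scored exactly once on 2025.

```python
from datetime import date
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Fold = Tuple[List[date], List[date]]


def backtest_folds(all_days: Sequence[date], first_year: int = 2023,
                   last_year: int = 2024, holdout_year: int = 2025) -> List[Fold]:
    """Expanding-window folds with one-month test blocks; holdout days are never returned."""
    usable = [d for d in all_days if d.year < holdout_year]
    folds = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            start = date(year, month, 1)
            end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
            train = [d for d in usable if d < start]
            test = [d for d in usable if start <= d < end]
            if train and test:
                folds.append((train, test))
    return folds


def ablation(evaluate: Callable[[FrozenSet[str], List[date], List[date]], float],
             base: FrozenSet[str], groups: Dict[str, FrozenSet[str]],
             folds: Sequence[Fold]) -> Dict[str, float]:
    """Mean backtest error for base features, base + each group, and base + all groups."""
    def mean_err(feats: FrozenSet[str]) -> float:
        return sum(evaluate(feats, tr, te) for tr, te in folds) / len(folds)

    results = {"base": mean_err(base)}
    for name, feats in groups.items():
        results[f"base+{name}"] = mean_err(base | feats)
    results["base+all"] = mean_err(base.union(*groups.values()))
    return results
```

Keeping the per-fold errors (not just the means) allows a paired comparison across the 24 monthly folds, which shows whether an external feature's gain is consistent or driven by a few months.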


© 2026 PracHub. All rights reserved.