Design end-to-end regression for energy demand
Company: Amazon
Role: Data Scientist
Category: Machine Learning
Difficulty: hard
Interview Round: Onsite
You must build an end-to-end regression system to predict daily site-level electricity consumption (kWh) for a portfolio of commercial buildings. Available data: hourly smart-meter reads; weather (temperature, humidity, wind); calendar/holiday flags; building metadata (area, vintage, HVAC type); and optional external features (day-ahead price, outage alerts). Requirements:
(1) Formulate the problem and baseline(s).
(2) Engineer features to capture seasonality, interactions (e.g., temperature × HVAC type), and occupancy proxies.
(3) Choose and justify regularization, address multicollinearity, and detect/mitigate heteroscedasticity.
(4) Use time-series-aware cross-validation and avoid leakage (be explicit about any lag/rolling constructs).
(5) Specify metrics (RMSE, MAPE) and business-facing SLAs (e.g., billing tolerance bands).
(6) Handle missing/corrupted sensors and concept drift across years (2019–2025).
(7) Productionize: outline training/inference pipelines, model versioning, and monitoring (data/feature drift, residual distribution shifts, retraining triggers).
(8) Provide explainability for non-ML stakeholders (global vs. local) and define safe failure modes.
(9) Respect security/privacy constraints for tenant data.
Finally, propose an ablation plan to quantify the incremental value of external features, and describe how you would backtest the full pipeline on 2023–2024 while reserving 2025 for holdout evaluation.
Quick Answer: This question evaluates a candidate's ability to design and justify an end-to-end regression system for next-day site-level energy demand forecasting, testing competencies in time-series modeling, feature engineering, regularization, evaluation metrics and SLAs, production ML pipelines, monitoring, explainability, and data privacy.
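One way to make requirement (4)'s "be explicit about lag/rolling constructs" concrete is the minimal, stdlib-only Python sketch below. It assumes hourly reads have already been summed into daily kWh per site, and that a day-ahead mean temperature forecast is available when the prediction is issued. The 18 °C balance point, the `"central"` HVAC label, and the name `build_features` are illustrative assumptions, not part of the prompt. Every lag and rolling window reads only days strictly before the target day, so the target can never leak into its own features.

```python
import math
from datetime import date, timedelta

BASE_TEMP_C = 18.0  # assumed balance-point temperature for degree days

def build_features(day, daily_kwh, temp_forecast_c, hvac_type):
    """Features for `day` built only from kWh observed strictly before `day`.

    daily_kwh: dict mapping date -> observed daily kWh (summed hourly reads).
    temp_forecast_c: day-ahead mean temperature forecast for `day` (assumed
        available when the prediction is issued).
    hvac_type: building metadata; "central" is a hypothetical category label.
    """
    def past(k):
        # Value k days before the target; None if that day is missing.
        return daily_kwh.get(day - timedelta(days=k))

    window = [past(k) for k in range(1, 8)]          # t-1 .. t-7, never t itself
    observed = [v for v in window if v is not None]
    cdd = max(0.0, temp_forecast_c - BASE_TEMP_C)    # cooling degree days
    hdd = max(0.0, BASE_TEMP_C - temp_forecast_c)    # heating degree days
    is_central = 1.0 if hvac_type == "central" else 0.0
    dow = day.weekday()                              # 0 = Monday
    return {
        "lag_1": past(1),
        "lag_7": past(7),                            # same weekday last week
        "roll_mean_7": sum(observed) / len(observed) if observed else None,
        "roll_count_7": len(observed),               # completeness of the window
        "cdd": cdd,
        "hdd": hdd,
        "cdd_x_central": cdd * is_central,           # temperature x HVAC interaction
        "dow_sin": math.sin(2 * math.pi * dow / 7),  # cyclic weekly seasonality
        "dow_cos": math.cos(2 * math.pi * dow / 7),
    }

d0 = date(2024, 7, 1)
history = {d0 - timedelta(days=k): 100.0 + k for k in range(1, 15)}  # toy data
print(build_features(d0, history, 28.0, "central")["roll_mean_7"])  # prints 104.0
```

The `roll_count_7` feature can double as a data-quality signal for requirement (6): a site-day whose window has fewer than seven observed days could be down-weighted or routed to a simpler fallback baseline.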
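For the 2023–2024 backtest with a 2025 holdout, a rolling-origin (expanding-window) scheme is one reasonable choice. The stdlib-only sketch below generates one fold per month: training covers everything from the first available day up to a small gap before that month, and validation is the month itself. The one-day `gap_days` default is an assumption standing in for meter-data latency and should match how long reads actually take to arrive; the function names are illustrative. Intervals are half-open `[start, end)`, and no fold reaches into 2025.

```python
from datetime import date, timedelta

def next_month(d):
    """First day of the month after `d`."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)

def backtest_folds(data_start, backtest_start, holdout_start, gap_days=1):
    """Expanding-window folds, one validation month each, stopping before holdout.

    backtest_start and holdout_start are assumed to be first-of-month dates.
    All intervals are half-open [start, end).
    """
    folds = []
    origin = backtest_start
    while origin < holdout_start:
        valid_end = min(next_month(origin), holdout_start)
        folds.append({
            # Training ends gap_days before validation to mimic data latency.
            "train": (data_start, origin - timedelta(days=gap_days)),
            "valid": (origin, valid_end),
        })
        origin = next_month(origin)
    return folds

folds = backtest_folds(date(2019, 1, 1), date(2023, 1, 1), date(2025, 1, 1))
print(len(folds))  # prints 24 (one fold per month of 2023-2024)
```

Any imputer, scaler, or regularization search would need to be refit inside each fold's training window; fitting them once on all of 2019–2024 would leak validation-period statistics. The 2025 holdout is then scored once, with the pipeline frozen after backtesting.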
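Requirement (5) pairs statistical metrics with a business-facing SLA. A minimal sketch, assuming the SLA is expressed as the share of site-days whose forecast falls within a relative tolerance band of actual consumption; the 5% default and the function names are placeholders, not figures from the prompt. MAPE is undefined when actual consumption is zero, so both percentage-based functions skip those site-days here.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error, in kWh."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def _nonzero_pairs(actual, forecast):
    # Relative errors are undefined for zero actuals, so those site-days are dropped.
    return [(a, f) for a, f in zip(actual, forecast) if a != 0]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    pairs = _nonzero_pairs(actual, forecast)
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def within_band_rate(actual, forecast, band=0.05):
    """Share of site-days with |forecast - actual| <= band * |actual| (assumed SLA form)."""
    pairs = _nonzero_pairs(actual, forecast)
    return sum(abs(a - f) <= band * abs(a) for a, f in pairs) / len(pairs)

actual = [100.0, 200.0, 400.0]    # toy daily kWh
forecast = [110.0, 195.0, 400.0]
print(round(rmse(actual, forecast), 3),
      round(mape(actual, forecast), 3),
      round(within_band_rate(actual, forecast), 3))  # prints 6.455 4.167 0.667
```

Because RMSE is in absolute kWh, the largest buildings dominate it; an area-normalized (kWh/m²) or per-site breakdown alongside the portfolio figure keeps smaller sites visible.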
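For the drift monitoring in requirement (7), the Population Stability Index (PSI) is one common, easy-to-explain statistic for comparing a feature's (or the residuals') recent distribution with a reference window. A minimal stdlib sketch, assuming bin edges are fixed from the reference period (for example its deciles); the `eps` floor avoids `log(0)` for empty bins. Thresholds such as 0.1 (watch) and 0.25 (investigate or retrain) are widely used rules of thumb rather than values given in the prompt, and would need tuning against historical drift episodes.

```python
import bisect
import math

def psi(reference, recent, edges, eps=1e-6):
    """Population Stability Index of `recent` vs `reference` over fixed bins.

    edges: sorted interior bin boundaries; values below edges[0] fall in the
    first bin and values >= edges[-1] in the last, giving len(edges) + 1 bins.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(recent)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

reference = list(range(100))             # toy stand-in for last year's residuals
shifted = [x + 50 for x in range(100)]   # toy recent window that drifted upward
edges = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(psi(reference, reference, edges))       # prints 0.0
print(psi(reference, shifted, edges) > 0.25)  # prints True
```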