PracHub

Build a regression model for wind power output

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in feature engineering, physics-informed regression modeling, sensor-data quality handling, model selection and validation, and uncertainty estimation for turbine-level instantaneous power prediction.

Company: Citadel

Role: Data Scientist

Category: ML System Design

Difficulty: hard

Interview Round: Take-home Project

Build a non-time-series regression model to predict turbine-level wind power output for individual snapshots using available weather and turbine features. Describe: candidate features (e.g., wind speed/direction, air density, temperature, humidity, turbulence intensity, turbine specs) and preprocessing; handling missing or noisy sensors and outliers; model choices (regularized linear models, gradient boosting, random forest, shallow MLP) and how you would encode known physics (e.g., approximate power curve features) without sequence modeling; validation strategy across sites and wind-speed regimes to ensure generalization; evaluation metrics (RMSE, MAE, MAPE) and treatment of heteroscedastic errors and caps at rated power; methods for uncertainty estimation and calibration; and safeguards for extrapolation and curtailment scenarios.

Aug 13, 2025, 12:00 AM

Task: Snapshot Regression for Turbine-Level Power Prediction (Non–Time-Series)

You are given turbine-level SCADA snapshots and concurrent weather data. Build a non–time-series regression model that predicts instantaneous (e.g., 1–10 minute averaged) turbine power output using only features available at that same snapshot.

Assume data may include: wind speed and direction (from nacelle and/or met mast), air temperature, pressure, humidity, turbulence intensity (TI), turbine operational signals (e.g., rotor speed, pitch, yaw), turbine metadata (rated power, rotor diameter, hub height, model), and site metadata (elevation, terrain roughness). No sequence modeling is allowed.
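As an illustration of how the signals listed above could feed physics-based features, here is a minimal sketch using only the standard library. The function names, the dry-air approximation (humidity is ignored), and the default cut-in speed, cut-out speed, and power coefficient are all assumptions made for the example, not values given in the task.

```python
import math

R_DRY_AIR = 287.05  # J/(kg*K), specific gas constant for dry air


def air_density(temp_c, pressure_pa):
    """Dry-air density from the ideal gas law: rho = p / (R * T)."""
    return pressure_pa / (R_DRY_AIR * (temp_c + 273.15))


def encode_direction(deg):
    """Map a compass angle to (sin, cos) so that 359 and 1 degrees are neighbours."""
    rad = math.radians(deg % 360.0)
    return math.sin(rad), math.cos(rad)


def power_curve_proxy(wind_ms, rho, rotor_diameter_m, rated_kw,
                      cut_in_ms=3.0, cut_out_ms=25.0, cp=0.40):
    """Idealized power in kW: 0.5 * rho * A * Cp * v^3, clipped to [0, rated].

    cut_in_ms, cut_out_ms and cp are illustrative placeholders; real values
    depend on the turbine model.
    """
    if wind_ms < cut_in_ms or wind_ms > cut_out_ms:
        return 0.0
    swept_area = math.pi * (rotor_diameter_m / 2.0) ** 2
    watts = 0.5 * rho * swept_area * cp * wind_ms ** 3
    return min(watts / 1000.0, rated_kw)
```

A proxy like this can be passed to any of the candidate models as one input feature, so the model only has to learn the residual between the idealized curve and the observed power.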

Describe and justify the following:

  1. Candidate Features and Preprocessing
     • Weather and turbine features, including derived physics-based features (e.g., air density, dynamic pressure, power-curve proxies).
     • Encoding of wind direction, yaw misalignment, and turbulence/shear.
     • Normalization/standardization choices and handling of categorical/site/turbine identifiers.
  2. Handling Data Issues
     • Strategy for missing or noisy sensors; imputation and quality flags.
     • Outlier detection and treatment, including curtailment or abnormal operating modes.
  3. Model Choices and Physics Encoding (no sequence models)
     • Compare: regularized linear models, gradient boosting, random forest, shallow MLP, GAMs.
     • How to encode known physics (e.g., approximate power curve, monotonicity in wind speed below rated power, saturation at rated power) via features, constraints, or loss design.
  4. Validation Strategy for Generalization
     • Cross-validation across sites/turbines and across wind-speed regimes (e.g., below cut-in, near rated, above rated) to ensure robustness and avoid leakage.
  5. Evaluation Metrics and Error Structure
     • Metrics: RMSE, MAE, MAPE and their pitfalls; alternatives for low-power regimes.
     • Treatment of heteroscedastic errors and the cap at rated power.
  6. Uncertainty Estimation and Calibration
     • Methods to produce and calibrate predictive intervals/uncertainty.
  7. Safeguards and Edge Cases
     • Extrapolation detection and fallbacks.
     • Curtailment and availability scenarios: detect, model, or exclude.
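The validation and metric items in the list above can be made concrete with a small sketch. It assumes one site label per snapshot and uses illustrative regime thresholds (3 m/s cut-in, 12 m/s rated); the helper names are hypothetical. MAE normalized by rated power stands in for MAPE here because MAPE is undefined or unstable when true power is at or near zero.

```python
from collections import defaultdict


def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_idx, test_idx); no site is on both sides."""
    for held_out in sorted(set(site_ids)):
        train = [i for i, s in enumerate(site_ids) if s != held_out]
        test = [i for i, s in enumerate(site_ids) if s == held_out]
        yield held_out, train, test


def wind_regime(wind_ms, cut_in_ms=3.0, rated_ms=12.0):
    """Bucket a snapshot by wind speed; the thresholds are assumptions."""
    if wind_ms < cut_in_ms:
        return "below_cut_in"
    if wind_ms < rated_ms:
        return "ramp"
    return "at_or_above_rated"


def nmae_by_regime(y_true_kw, y_pred_kw, wind_ms, rated_kw):
    """MAE divided by rated power, reported separately for each regime."""
    abs_err = defaultdict(list)
    for yt, yp, v in zip(y_true_kw, y_pred_kw, wind_ms):
        abs_err[wind_regime(v)].append(abs(yt - yp))
    return {k: sum(e) / len(e) / rated_kw for k, e in abs_err.items()}
```

Holding out whole sites, rather than random rows, keeps snapshots from the same turbine out of both the training and test folds, which is the leakage the validation item warns about.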

Provide a structured, engineering-ready plan with formulas when relevant, and note key pitfalls and validation guardrails.
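One possible way to address the uncertainty and rated-power items is split conformal prediction, sketched below. This technique is a suggestion and is not named in the task; the function names are hypothetical, and the sketch assumes a held-out calibration set of true and predicted power values that the model never saw during training.

```python
import math


def conformal_halfwidth(cal_true, cal_pred, alpha=0.1):
    """Split-conformal half-width: the ceil((n+1)(1-alpha))-th smallest |residual|."""
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[k - 1]


def predict_interval(point_kw, halfwidth_kw, rated_kw):
    """Symmetric interval around the point forecast, clipped to [0, rated]."""
    lo = max(0.0, point_kw - halfwidth_kw)
    hi = min(rated_kw, point_kw + halfwidth_kw)
    return lo, hi
```

A single global half-width ignores heteroscedasticity. Computing a separate half-width per wind-speed regime is one simple refinement, at the cost of needing enough calibration snapshots in each regime.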
