Build a non-time-series regression model to predict turbine-level wind power output for individual snapshots using available weather and turbine features. Describe: candidate features (e.g., wind speed/direction, air density, temperature, humidity, turbulence intensity, turbine specs) and preprocessing; handling missing or noisy sensors and outliers; model choices (regularized linear models, gradient boosting, random forest, shallow MLP) and how you would encode known physics (e.g., approximate power curve features) without sequence modeling; validation strategy across sites and wind-speed regimes to ensure generalization; evaluation metrics (RMSE, MAE, MAPE) and treatment of heteroscedastic errors and caps at rated power; methods for uncertainty estimation and calibration; and safeguards for extrapolation and curtailment scenarios.

This question evaluates competency in feature engineering, physics-informed regression modeling, sensor-data quality handling, model selection and validation, and uncertainty estimation for turbine-level instantaneous power prediction.

What difficulty level is this interview question?

This is a hard ML System Design question, commonly asked in take-home project rounds at Citadel.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Citadel during technical interviews.

Build a regression model for wind power output

Task: Snapshot Regression for Turbine-Level Power Prediction (Non–Time-Series)

You are given turbine-level SCADA snapshots and concurrent weather data. Build a non–time-series regression model that predicts instantaneous (e.g., 1–10 minute averaged) turbine power output using only features available at that same snapshot.

Assume data may include: wind speed and direction (from nacelle and/or met mast), air temperature, pressure, humidity, turbulence intensity (TI), turbine operational signals (e.g., rotor speed, pitch, yaw), turbine metadata (rated power, rotor diameter, hub height, model), and site metadata (elevation, terrain roughness). No sequence modeling is allowed.

Describe and justify the following:

Candidate Features and Preprocessing

Weather and turbine features, including derived physics-based features (e.g., air density, dynamic pressure, power-curve proxies).
Encoding of wind direction, yaw misalignment, and turbulence/shear.
Normalization/standardization choices and handling of categorical/site/turbine identifiers.
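
As a rough illustration of the derived features listed above, the sketch below computes moist-air density, a density-corrected wind speed, dynamic pressure, a sine/cosine encoding of wind direction, and a wrapped yaw misalignment. The field names (`temp_c`, `pressure_pa`, `rel_humidity`, `wind_dir_deg`, `nacelle_yaw_deg`, `wind_speed`) are hypothetical, the saturation vapour pressure uses a Magnus-type approximation, and the cube-root density correction is the common convention for pitch-regulated turbines; treat all of these as assumptions rather than a prescribed method.

```python
import math

R_DRY = 287.05    # J/(kg K), specific gas constant of dry air
R_VAPOR = 461.5   # J/(kg K), specific gas constant of water vapour
RHO_REF = 1.225   # kg/m^3, sea-level reference density


def air_density(temp_c, pressure_pa, rel_humidity):
    """Moist-air density in kg/m^3; rel_humidity is a fraction in [0, 1]."""
    temp_k = temp_c + 273.15
    # Magnus-type approximation of saturation vapour pressure (Pa).
    p_sat = 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))
    p_vapor = rel_humidity * p_sat
    p_dry = pressure_pa - p_vapor
    return p_dry / (R_DRY * temp_k) + p_vapor / (R_VAPOR * temp_k)


def density_corrected_speed(wind_speed, rho):
    """Scale wind speed so that v^3 * RHO_REF matches v^3 * rho."""
    return wind_speed * (rho / RHO_REF) ** (1.0 / 3.0)


def encode_direction(deg):
    """Map a compass angle to (sin, cos) so 359 and 1 degrees stay close."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)


def wrap_angle(deg):
    """Wrap an angle difference into [-180, 180)."""
    return (deg + 180.0) % 360.0 - 180.0


def snapshot_features(row):
    """Build derived features for one snapshot (hypothetical field names)."""
    rho = air_density(row["temp_c"], row["pressure_pa"], row["rel_humidity"])
    dir_sin, dir_cos = encode_direction(row["wind_dir_deg"])
    v = row["wind_speed"]
    return {
        "air_density": rho,
        "wind_speed_density_corrected": density_corrected_speed(v, rho),
        "dynamic_pressure": 0.5 * rho * v ** 2,
        "wind_dir_sin": dir_sin,
        "wind_dir_cos": dir_cos,
        "yaw_misalignment_deg": wrap_angle(
            row["wind_dir_deg"] - row["nacelle_yaw_deg"]
        ),
    }
```

The sine/cosine pair avoids the artificial discontinuity at 0/360 degrees, and the wrapped misalignment keeps a wind direction of 350 degrees with the nacelle at 10 degrees at -20 rather than 340.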

Handling Data Issues

Strategy for missing or noisy sensors; imputations and quality flags.
Outlier detection and treatment, including curtailment or abnormal operating modes.
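
One simple way to make the data-quality items above concrete is sketched below: median imputation paired with a missing-value flag, plus a binned-median reference power curve used to flag snapshots that fall far below typical output (a possible sign of curtailment or an abnormal operating mode). The function names, the 1 m/s bin width, and the 20% of rated power tolerance are illustrative assumptions, not values from the task statement.

```python
from statistics import median


def impute_with_flag(values):
    """Replace None with the median of observed values and return a 0/1 flag."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return filled, flags


def binned_median_curve(wind, power, bin_width=1.0):
    """Empirical reference curve: median power per wind-speed bin."""
    bins = {}
    for w, p in zip(wind, power):
        bins.setdefault(int(w // bin_width), []).append(p)
    return {b: median(ps) for b, ps in bins.items()}


def flag_underperformance(wind, power, curve, rated_kw, bin_width=1.0, tol=0.2):
    """True where power is more than tol * rated below the bin's median."""
    flags = []
    for w, p in zip(wind, power):
        ref = curve.get(int(w // bin_width))
        flags.append(ref is not None and (ref - p) > tol * rated_kw)
    return flags
```

In a real pipeline the imputation median and the reference curve would be fitted on training data only, and flagged rows could either be excluded or kept with the flag as a model input, depending on whether curtailed operation is in scope.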

Model Choices and Physics Encoding (no sequence models)

Compare: regularized linear models, gradient boosting, random forest, shallow MLP, GAMs.
How to encode known physics (e.g., approximate power curve, monotonicity in wind speed below rated, saturation at rated power) via features, constraints, or loss design.
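
A minimal sketch of a power-curve proxy feature follows, assuming the idealized relation P = 0.5 · ρ · A · Cp · v³ with a constant power coefficient, zero output outside the cut-in/cut-out range, and saturation at rated power. The default `cp`, `cut_in`, and `cut_out` values are placeholders chosen for illustration; real turbines have manufacturer-specific curves.

```python
import math


def power_curve_proxy(wind_speed, rho, rotor_diameter_m, rated_kw,
                      cp=0.40, cut_in=3.0, cut_out=25.0):
    """Idealized power in kW: cubic in wind speed, capped at rated power."""
    if wind_speed < cut_in or wind_speed >= cut_out:
        return 0.0
    swept_area = math.pi * (rotor_diameter_m / 2.0) ** 2
    power_kw = 0.5 * rho * swept_area * cp * wind_speed ** 3 / 1000.0
    return min(power_kw, rated_kw)


def normalized_target(power_kw, rated_kw):
    """Express power as a fraction of rated so turbines of different size share a scale."""
    return power_kw / rated_kw
```

Feeding the proxy (or the residual between measured power and the proxy) to any of the listed models lets a flexible learner correct the idealized curve instead of rediscovering it. Tree-boosting libraries such as scikit-learn's `HistGradientBoostingRegressor` (`monotonic_cst`), XGBoost, and LightGBM (`monotone_constraints`) also accept per-feature monotonicity constraints, though a global constraint on wind speed would conflict with the drop to zero at cut-out unless those snapshots are handled separately.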

Validation Strategy for Generalization

Cross-validation across sites/turbines and across wind-speed regimes (e.g., below cut-in, near rated, above rated) to ensure robustness and avoid leakage.
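
The splitting idea above can be sketched as leave-one-site-out folds combined with per-regime error reporting. The regime boundaries (`cut_in`, `rated_speed`, and a ±1 m/s band around rated) are assumed values for illustration, and `groups` could equally hold turbine identifiers.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def wind_regime(wind_speed, cut_in=3.0, rated_speed=12.0, band=1.0):
    """Assign a coarse operating regime from wind speed alone."""
    if wind_speed < cut_in:
        return "below_cut_in"
    if wind_speed < rated_speed - band:
        return "partial_load"
    if wind_speed <= rated_speed + band:
        return "near_rated"
    return "above_rated"


def mae_by_regime(y_true, y_pred, wind):
    """Mean absolute error reported separately for each wind regime."""
    errors = {}
    for t, p, w in zip(y_true, y_pred, wind):
        errors.setdefault(wind_regime(w), []).append(abs(t - p))
    return {regime: sum(e) / len(e) for regime, e in errors.items()}
```

Holding out whole sites keeps snapshots from the same site out of both train and test, which is the leakage path the task warns about; any imputation statistics or scalers would need to be refitted inside each fold for the same reason.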

Evaluation Metrics and Error Structure

Metrics: RMSE, MAE, MAPE and their pitfalls; alternatives for low-power regimes.
Treatment of heteroscedastic errors and the cap at rated power.
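
To illustrate the MAPE pitfall mentioned above, the sketch below computes RMSE, MAE, an MAE normalized by rated power, and a MAPE that skips snapshots below a minimum power threshold, since dividing by near-zero output inflates the percentage error. The threshold and the choice of rated power as the normalizer are assumptions; other normalizers (e.g., mean power) are also used in practice.

```python
import math


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def nmae(y_true, y_pred, rated_kw):
    """MAE as a fraction of rated power; stays well defined at zero output."""
    return mae(y_true, y_pred) / rated_kw


def masked_mape(y_true, y_pred, min_power):
    """MAPE over snapshots with true power >= min_power; None if none qualify."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t >= min_power]
    if not pairs:
        return None
    return sum(abs(t - p) / t for t, p in pairs) / len(pairs)


def clip_to_rated(prediction, rated_kw):
    """Enforce the physical range [0, rated] on a point prediction."""
    return max(0.0, min(rated_kw, prediction))
```

For example, with true values `[0.0, 100.0, 200.0]` and predictions `[10.0, 90.0, 220.0]`, plain MAPE is undefined because of the zero, while the masked version with `min_power=50` averages the two remaining relative errors of 10% each.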

Uncertainty Estimation and Calibration

Methods to produce and calibrate predictive intervals/uncertainty.
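
One model-agnostic option for calibrated intervals is split conformal prediction, sketched below under the assumption that a held-out calibration set with point predictions is available. This is just one of several possible approaches (quantile regression is another); the function names are hypothetical. Because a single half-width ignores heteroscedasticity, it could be computed separately per wind-speed regime.

```python
import math


def conformal_half_width(cal_true, cal_pred, alpha=0.1):
    """Half-width giving roughly (1 - alpha) coverage from calibration residuals."""
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[k - 1]


def prediction_interval(prediction, half_width, rated_kw):
    """Symmetric interval around the prediction, clipped to [0, rated]."""
    lower = max(0.0, prediction - half_width)
    upper = min(rated_kw, prediction + half_width)
    return lower, upper


def empirical_coverage(y_true, intervals):
    """Fraction of true values that fall inside their intervals."""
    hits = sum(1 for t, (lo, hi) in zip(y_true, intervals) if lo <= t <= hi)
    return hits / len(y_true)
```

Checking `empirical_coverage` on held-out sites, ideally broken down by wind regime, shows whether the nominal coverage actually holds where errors are largest (typically the steep part of the power curve).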

Safeguards and Edge Cases

Extrapolation detection and fallbacks.
Curtailment and availability scenarios: detect, model, or exclude.
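
A very simple extrapolation guard is sketched below: record per-feature minimum and maximum values seen in training, and fall back to a simpler predictor (for instance a physics-based power curve) when a snapshot lies outside that box. The function names and the callables `model_predict` and `fallback_predict` are hypothetical, and a per-feature range check is a crude assumption that misses unusual feature combinations inside the box.

```python
def fit_feature_ranges(rows):
    """Record (min, max) of each feature over the training rows."""
    ranges = {}
    for row in rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def out_of_range_features(row, ranges):
    """Names of features whose value falls outside the training range."""
    return [name for name, (lo, hi) in ranges.items()
            if not lo <= row[name] <= hi]


def guarded_predict(row, model_predict, fallback_predict, ranges):
    """Use the learned model in-range, otherwise the fallback; report violations."""
    violations = out_of_range_features(row, ranges)
    if violations:
        return fallback_predict(row), violations
    return model_predict(row), violations
```

Returning the list of violated features alongside the prediction makes it possible to log how often the fallback fires and on which inputs, which helps decide whether more training data or a wider interval is needed in those conditions.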

Provide a structured, engineering-ready plan with formulas when relevant, and note key pitfalls and validation guardrails.
