Task: Snapshot Regression for Turbine-Level Power Prediction (Non–Time-Series)
You are given turbine-level SCADA snapshots and concurrent weather data. Build a non–time-series regression model that predicts snapshot turbine power output (e.g., a 1–10 minute average) using only features available at that same snapshot.
Assume data may include: wind speed and direction (from nacelle and/or met mast), air temperature, pressure, humidity, turbulence intensity (TI), turbine operational signals (e.g., rotor speed, pitch, yaw), turbine metadata (rated power, rotor diameter, hub height, model), and site metadata (elevation, terrain roughness). No sequence modeling is allowed.
Describe and justify the following:
Candidate Features and Preprocessing
- Weather and turbine features, including derived physics-based features (e.g., air density, dynamic pressure, power-curve proxies).
- Encoding of wind direction, yaw misalignment, and turbulence/shear.
- Normalization/standardization choices and handling of categorical/site/turbine identifiers.
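A minimal sketch of the derived features listed above. It assumes NumPy arrays of snapshot averages, with temperature in °C, pressure in Pa and relative humidity as a fraction. The helper names (`air_density`, `shear_exponent`, `derived_features`) are hypothetical. Air density uses the moist-air ideal-gas form rho = (p - e)/(R_d T) + e/(R_v T), with vapour pressure e taken from a Magnus-type approximation. The density-normalized wind speed follows the IEC 61400-12-1 convention for pitch-regulated turbines, v_n = v (rho/rho_0)^(1/3), with rho_0 = 1.225 kg/m^3 assumed as the reference.

```python
import numpy as np

R_D = 287.05   # specific gas constant of dry air, J/(kg*K)
R_V = 461.5    # specific gas constant of water vapour, J/(kg*K)
RHO_0 = 1.225  # assumed reference density (ISA sea level), kg/m^3


def air_density(temp_c, pressure_pa, rel_humidity):
    """Moist-air density (kg/m^3) from temperature (deg C), pressure (Pa), RH (0..1)."""
    temp_k = temp_c + 273.15
    # Saturation vapour pressure in Pa (Magnus-type approximation over water).
    e_sat = 611.2 * np.exp(17.67 * temp_c / (temp_c + 243.5))
    e = rel_humidity * e_sat
    # Dry-air partial-pressure term plus water-vapour partial-pressure term.
    return (pressure_pa - e) / (R_D * temp_k) + e / (R_V * temp_k)


def shear_exponent(ws_low, ws_high, z_low, z_high):
    """Power-law shear exponent: alpha = ln(v_high / v_low) / ln(z_high / z_low)."""
    return np.log(ws_high / ws_low) / np.log(z_high / z_low)


def derived_features(ws, wd_deg, nacelle_dir_deg, temp_c, pressure_pa, rh, ws_std):
    """Snapshot-only physics features; all inputs are same-length arrays."""
    ws = np.asarray(ws, dtype=float)
    rho = air_density(temp_c, pressure_pa, rh)
    wd = np.deg2rad(wd_deg)
    # Yaw misalignment (wind direction minus nacelle heading) wrapped to [-180, 180).
    yaw_err = (np.asarray(wd_deg) - np.asarray(nacelle_dir_deg) + 180.0) % 360.0 - 180.0
    return {
        "rho": rho,
        "dyn_pressure": 0.5 * rho * ws**2,             # q = 0.5 rho v^2 (Pa)
        "power_density": 0.5 * rho * ws**3,            # available power per m^2 (W/m^2)
        "ws_norm": ws * (rho / RHO_0) ** (1.0 / 3.0),  # density-normalized wind speed
        "wd_sin": np.sin(wd),                          # circular encoding of direction
        "wd_cos": np.cos(wd),
        "yaw_err_deg": yaw_err,
        "cos_yaw": np.cos(np.deg2rad(yaw_err)),
        # Turbulence intensity sigma_u / U; left undefined at very low wind speed.
        "ti": np.where(ws > 0.5, ws_std / np.maximum(ws, 0.5), np.nan),
    }
```

Direction enters as (sin, cos) so that 359° and 1° map to nearby points, whereas raw degrees would place them at opposite ends of the feature range. Heavy-tailed features such as TI and shear may benefit from clipping to physically plausible ranges before standardization, with scaling statistics fit on training folds only.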
Handling Data Issues
- Strategy for missing or noisy sensors; imputations and quality flags.
- Outlier detection and treatment, including curtailment or abnormal operating modes.
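A minimal sketch of three of these steps, assuming NumPy arrays and hypothetical helper names. The first step is median imputation paired with explicit missing-value indicator columns. The second is a binned power-curve outlier filter based on the median absolute deviation (MAD). The third is a rule-of-thumb curtailment flag. The 0.5 m/s bin width, the k = 3.5 cut-off and the 3° pitch threshold are illustrative assumptions to tune per turbine model. Where a SCADA curtailment setpoint or availability signal exists, it is generally more reliable than the pitch heuristic.

```python
import numpy as np


def impute_with_flags(X, train_medians=None):
    """Median-impute NaNs column-wise and return (X_filled, missing_flags, medians).

    Pass medians computed on the training fold when transforming validation or
    test data, so that no statistics leak across the split.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    if train_medians is None:
        train_medians = np.nanmedian(X, axis=0)
    X_filled = np.where(missing, train_medians, X)  # medians broadcast per column
    return X_filled, missing.astype(np.int8), train_medians


def power_curve_outliers(ws, power, bin_width=0.5, k=3.5, min_count=10):
    """Flag points far from the binned median power curve (robust MAD z-score).

    Assumes ws has no NaNs (impute or drop first).
    """
    ws = np.asarray(ws, dtype=float)
    power = np.asarray(power, dtype=float)
    bins = np.floor(ws / bin_width).astype(int)
    flags = np.zeros(ws.shape, dtype=bool)
    for b in np.unique(bins):
        idx = bins == b
        if idx.sum() < min_count:
            continue  # too few samples for a stable median/MAD
        med = np.median(power[idx])
        scale = 1.4826 * np.median(np.abs(power[idx] - med))  # MAD -> sigma for Gaussian data
        if scale == 0:
            continue
        flags[idx] = np.abs(power[idx] - med) > k * scale
    return flags


def curtailment_candidates(ws, pitch_deg, cut_in_ws, rated_ws, pitch_thresh_deg=3.0):
    """Heuristic: pitch well above fine pitch in partial load suggests derating."""
    ws = np.asarray(ws, dtype=float)
    pitch_deg = np.asarray(pitch_deg, dtype=float)
    return (ws >= cut_in_ws) & (ws < rated_ws) & (pitch_deg > pitch_thresh_deg)
```

Flagged rows can be excluded from training, kept with a flag feature, or routed to a separate curtailed-operation model. The right choice depends on whether the prediction target is available (unconstrained) power or actual delivered power.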
Model Choices and Physics Encoding (no sequence models)
- Compare: regularized linear models, gradient boosting, random forest, shallow MLP, GAMs.
- How to encode known physics (e.g., approximate power curve, monotonicity in wind speed below rated, saturation at rated power) via features, constraints, or loss design.
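One way to encode the physics is residual learning on top of an idealized power curve, P_phys(v) = min(0.5 * rho * A * Cp * v^3, P_rated) for cut-in <= v < cut-out and 0 otherwise, with the final prediction clipped to [0, P_rated]. The sketch below assumes a constant Cp of 0.45 and cut-in/cut-out speeds of 3 and 25 m/s purely for illustration. A manufacturer power curve is a better proxy when available. The residual learner is a closed-form ridge regression, and all helper names are hypothetical. With tree ensembles, monotonicity in wind speed can instead be imposed directly, e.g. via `monotone_constraints` in LightGBM and XGBoost or `monotonic_cst` in scikit-learn's `HistGradientBoostingRegressor`.

```python
import numpy as np


def physics_power_proxy(ws, rho, rotor_diameter_m, rated_power_w,
                        cut_in=3.0, cut_out=25.0, cp=0.45):
    """Idealized power curve in W; cp, cut_in and cut_out are illustrative."""
    ws = np.asarray(ws, dtype=float)
    area = np.pi * (rotor_diameter_m / 2.0) ** 2
    p = 0.5 * rho * area * cp * ws**3
    p = np.minimum(p, rated_power_w)             # saturation at rated power
    operating = (ws >= cut_in) & (ws < cut_out)  # zero below cut-in / above cut-out
    return np.where(operating, p, 0.0)


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def predict_bounded(X, w, b, proxy, rated_power_w):
    """Physics proxy plus learned residual, clipped to [0, P_rated]."""
    return np.clip(proxy + np.asarray(X, dtype=float) @ w + b, 0.0, rated_power_w)
```

The residual model is trained on the target P_obs - P_phys. Clipping enforces saturation and non-negativity but does not by itself make the residual term monotone. If wind speed is kept out of the residual features (or constrained), then for fixed values of the other features the prediction stays nondecreasing in wind speed within the operating range.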
Validation Strategy for Generalization
- Cross-validation across sites/turbines and across wind-speed regimes (e.g., below cut-in, near rated, above rated) to ensure robustness and avoid leakage.
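A minimal sketch of leakage-safe splitting, assuming each snapshot carries a site or turbine ID. All snapshots of a group land in the same test fold, so the model is always scored on unseen turbines. scikit-learn's `GroupKFold` and `LeaveOneGroupOut` provide the same guarantee. The regime labels support per-regime error reporting on each test fold rather than only an aggregate score. The 1 m/s band around rated wind speed is an assumed choice, and the helper names are hypothetical.

```python
import numpy as np


def group_kfold_indices(groups, n_splits=5, seed=0):
    """Yield (train_idx, test_idx); each group is in exactly one test fold."""
    groups = np.asarray(groups)
    uniq = np.unique(groups)
    rng = np.random.default_rng(seed)
    rng.shuffle(uniq)  # randomize which groups share a fold
    fold_of_group = {g: i % n_splits for i, g in enumerate(uniq)}
    fold = np.array([fold_of_group[g] for g in groups])
    for k in range(n_splits):
        yield np.flatnonzero(fold != k), np.flatnonzero(fold == k)


def wind_regime(ws, cut_in, rated_ws, cut_out, near_rated_band=1.0):
    """Label each snapshot by operating regime for stratified error reporting."""
    ws = np.asarray(ws, dtype=float)
    labels = np.full(ws.shape, "partial_load", dtype=object)
    labels[ws < cut_in] = "below_cut_in"
    labels[np.abs(ws - rated_ws) <= near_rated_band] = "near_rated"
    labels[(ws > rated_ws + near_rated_band) & (ws < cut_out)] = "above_rated"
    labels[ws >= cut_out] = "above_cut_out"
    return labels
```

Scalers, imputation medians and any target encodings of turbine IDs should be fit inside each training fold. Fitting them on the full dataset leaks test-site information into training.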
Evaluation Metrics and Error Structure
- Metrics: RMSE, MAE, MAPE and their pitfalls; alternatives for low-power regimes.
- Treatment of heteroscedastic errors and the cap at rated power.
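The sketch below, with hypothetical helper names, computes RMSE and MAE both in absolute units and normalized by rated power. It also computes a MAPE restricted to snapshots whose observed power is at least a floor (10% of rated here, an assumed value). Unrestricted MAPE is undefined at zero power and explodes near cut-in, so it over-weights the least important regime. Error binned by wind speed exposes heteroscedasticity. Errors are typically largest on the steep part of the power curve, where the roughly cubic slope amplifies small wind-speed errors, and they shrink above rated, where the output is capped.

```python
import numpy as np


def power_metrics(y_true, y_pred, rated_power, mape_floor_frac=0.1):
    """RMSE/MAE (absolute and capacity-normalized) plus a floored MAPE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err**2)))
    mae = float(np.mean(np.abs(err)))
    # MAPE only where observed power is far enough from zero to be meaningful.
    mask = y_true >= mape_floor_frac * rated_power
    mape = float(np.mean(np.abs(err[mask]) / y_true[mask])) if mask.any() else float("nan")
    return {
        "rmse": rmse,
        "mae": mae,
        "nrmse": rmse / rated_power,  # normalized by installed (rated) capacity
        "nmae": mae / rated_power,
        "mape_floored": mape,
        "n_mape": int(mask.sum()),
    }


def binned_mae(ws, y_true, y_pred, edges):
    """MAE per wind-speed bin; bin i covers edges[i-1] <= ws < edges[i] (bin 0: ws < edges[0])."""
    ws = np.asarray(ws, dtype=float)
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    idx = np.digitize(ws, edges)
    return {int(b): float(abs_err[idx == b].mean()) for b in np.unique(idx)}
```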
Uncertainty Estimation and Calibration
- Methods to produce and calibrate prediction intervals and predictive uncertainty.
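A minimal sketch of split conformal prediction with normalized (locally adaptive) scores. Quantile regression with a pinball loss is an alternative, e.g. `objective="quantile"` in LightGBM or `loss="quantile"` in scikit-learn's `GradientBoostingRegressor`. The sketch assumes a held-out calibration set that was not used for training. It also assumes a positive scale estimate sigma per snapshot, for instance from a second model fit to absolute residuals as a function of wind speed. The names are hypothetical. Clipping the interval to [0, P_rated] loses no coverage as long as observed power always lies in that range.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else float(scores[k - 1])


def normalized_conformal_interval(y_cal, pred_cal, sigma_cal, pred_new, sigma_new,
                                  rated_power, alpha=0.1):
    """Intervals pred +/- q * sigma, where q calibrates the scores |y - pred| / sigma."""
    scores = np.abs(np.asarray(y_cal, dtype=float) - np.asarray(pred_cal, dtype=float))
    scores = scores / np.asarray(sigma_cal, dtype=float)  # sigma must be > 0
    q = conformal_quantile(scores, alpha)
    pred_new = np.asarray(pred_new, dtype=float)
    sigma_new = np.asarray(sigma_new, dtype=float)
    lo = pred_new - q * sigma_new
    hi = pred_new + q * sigma_new
    # Physical bounds: no negative output and no output above rated power.
    return np.clip(lo, 0.0, rated_power), np.clip(hi, 0.0, rated_power)
```

Calibration is best checked through empirical coverage per wind-speed regime and per held-out site, because marginal coverage can hide under-coverage in specific regimes such as near rated.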
Safeguards and Edge Cases
- Extrapolation detection and fallbacks.
- Curtailment and availability scenarios: detect, model, or exclude.
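A minimal sketch of an out-of-domain guard, assuming a numeric feature matrix and hypothetical names. A snapshot is flagged if any feature leaves the training min/max range, or if its Mahalanobis distance exceeds the 99th percentile of training distances (both thresholds are assumptions). Flagged snapshots fall back to a physics-based prediction, such as a manufacturer power curve, instead of the learned model. Mahalanobis distance assumes roughly elliptical data. For strongly multimodal fleets, per-turbine-model guards or a nearest-neighbour distance may be more suitable.

```python
import numpy as np


class ExtrapolationGuard:
    """Flags inputs outside the training envelope and falls back to a physics proxy."""

    def __init__(self, X_train, quantile=0.99, ridge=1e-6):
        X = np.asarray(X_train, dtype=float)
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        self.mean = X.mean(axis=0)
        # Small ridge keeps the covariance invertible for near-collinear features.
        cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
        self.prec = np.linalg.inv(cov)
        self.threshold = np.quantile(self._mahal(X), quantile)

    def _mahal(self, X):
        d = X - self.mean
        # Row-wise quadratic form d_i^T P d_i.
        return np.sqrt(np.einsum("ij,jk,ik->i", d, self.prec, d))

    def is_out_of_domain(self, X):
        X = np.asarray(X, dtype=float)
        out_of_range = np.any((X < self.lo) | (X > self.hi), axis=1)
        return out_of_range | (self._mahal(X) > self.threshold)

    def predict(self, X, model_pred, physics_pred):
        """Use the ML prediction in-domain and the physics proxy otherwise."""
        ood = self.is_out_of_domain(X)
        return np.where(ood, physics_pred, model_pred), ood
```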
Provide a structured, engineering-ready plan with formulas when relevant, and note key pitfalls and validation guardrails.