Scenario
Onsite machine-learning exercise: build a regression model using only numerical features that both fits the training data and keeps error low when test points fall outside the feature ranges seen during training (i.e., under extrapolation).
Task
- Design and implement a regression solution that extrapolates robustly beyond the training feature range.
- Provide code for:
  - Data splitting that explicitly creates an out-of-range (OOR) test subset.
  - A training pipeline with feature engineering, model choice, and regularization.
  - An evaluation protocol that reports performance in-range vs. out-of-range.
- Explain your design decisions: feature engineering, model selection, regularization, and extrapolation evaluation methodology.
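One way to realize the OOR split called for above is to hold out every point with at least one feature outside a quantile band. This is a minimal sketch, not the required implementation; the band bounds (10th/90th percentiles) and the synthetic data used to exercise it are illustrative assumptions.

```python
import numpy as np

def ir_oor_split(X, y, low_q=0.1, high_q=0.9):
    """Split so the test set contains only out-of-range (OOR) points.

    A point is OOR if ANY feature falls outside the [low_q, high_q]
    quantile band computed over the full dataset; the remaining
    in-range (IR) points form the training set.
    """
    lo = np.quantile(X, low_q, axis=0)
    hi = np.quantile(X, high_q, axis=0)
    oor_mask = np.any((X < lo) | (X > hi), axis=1)
    return X[~oor_mask], y[~oor_mask], X[oor_mask], y[oor_mask]

# Illustrative data: 3 uniform features, linear target with small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 0.1, 500)
X_tr, y_tr, X_oor, y_oor = ir_oor_split(X, y)
```

Splitting on quantiles of each feature (rather than a single feature) makes the OOR subset stress extrapolation along every input dimension at once.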
Assumptions
- You are given a tabular dataset with numerical features X (shape: n_samples × n_features) and a continuous target y.
- If no dataset is provided, you may demonstrate with a synthetic dataset and keep the same code path.
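A synthetic dataset that exercises the same code path might look like the following sketch; the functional form (linear plus a mild quadratic term), the feature range, and the noise level are all illustrative assumptions, chosen so that a low-degree polynomial model can both fit and extrapolate the target.

```python
import numpy as np

def make_synthetic(n_samples=1000, n_features=4, noise=0.1, seed=0):
    """Generate a smooth numerical regression problem.

    The target is linear in all features plus a mild quadratic term
    in the first feature, so extrapolation is learnable in principle.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-2.0, 2.0, size=(n_samples, n_features))
    coef = rng.normal(0, 1, n_features)
    y = X @ coef + 0.3 * X[:, 0] ** 2 + rng.normal(0, noise, n_samples)
    return X, y

X, y = make_synthetic()
```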
Requirements
- Use models that can extrapolate (e.g., linear models, low-degree polynomial bases with regularization, or spline bases with linear extrapolation).
- Standardize features appropriately.
- Regularize to control coefficient growth outside the training range.
- Hold out a test split drawn from an expanded feature range and report separate metrics for in-range (IR) and out-of-range (OOR) points.
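The requirements above can be combined into one end-to-end sketch: standardize with training statistics only, expand into a low-degree polynomial basis, fit closed-form ridge regression, and report IR vs. OOR RMSE. This is one possible realization under stated assumptions (degree-2 basis without cross terms, ridge strength alpha=1.0, a synthetic quadratic target), not a reference solution.

```python
import numpy as np

def poly_features(Z, degree=2):
    """Stack [1, Z, Z**2, ..., Z**degree] columnwise (no cross terms)."""
    return np.hstack([np.ones((len(Z), 1))] + [Z ** d for d in range(1, degree + 1)])

def ridge_fit(Phi, y, alpha=1.0):
    """Closed-form ridge: solve (Phi^T Phi + alpha*I) w = Phi^T y."""
    A = Phi.T @ Phi + alpha * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

# Illustrative data drawn from an expanded range; training uses only the
# inner region, so the held-out points are genuinely out-of-range.
rng = np.random.default_rng(1)
n = 1000
X = rng.uniform(-1.0, 1.0, size=(n, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.05, n)

oor = np.any(np.abs(X) > 0.8, axis=1)          # OOR = outside the inner box
X_tr, y_tr, X_te, y_te = X[~oor], y[~oor], X[oor], y[oor]

# Standardize with TRAINING statistics only, then build the basis.
mu, sigma = X_tr.mean(axis=0), X_tr.std(axis=0)
Phi_tr = poly_features((X_tr - mu) / sigma)
Phi_te = poly_features((X_te - mu) / sigma)

w = ridge_fit(Phi_tr, y_tr, alpha=1.0)

def rmse(Phi, t):
    return float(np.sqrt(np.mean((Phi @ w - t) ** 2)))

ir_rmse = rmse(Phi_tr, y_tr)    # in-range error
oor_rmse = rmse(Phi_te, y_te)   # out-of-range error
```

Because the basis contains the true functional form and ridge keeps the coefficients small, OOR error stays close to IR error here; a mismatched or unregularized high-degree basis would show the OOR metric blowing up instead.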
Hints
- Consider linear or monotonic models, a polynomial basis with regularization, data standardization, and a hold-out test split drawn from an expanded feature range.
- Tree ensembles without additional structure typically do not extrapolate: their predictions are piecewise constant, so they flatten out beyond the training range.
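The last hint can be made concrete with a toy 1-D comparison. This is a sketch under an assumption: a nearest-training-point predictor stands in for a fully grown tree (each leaf memorizes one sample), not a real tree ensemble. It predicts a constant beyond the training range, while a linear fit keeps tracking the trend.

```python
import numpy as np

rng = np.random.default_rng(2)
x_tr = np.linspace(0.0, 1.0, 200)
y_tr = 3.0 * x_tr + rng.normal(0, 0.05, 200)

def tree_like(x_new):
    """Predict the target of the NEAREST training point (piecewise constant,
    so all queries beyond the range hit the same boundary leaf)."""
    idx = np.abs(x_tr[None, :] - np.asarray(x_new)[:, None]).argmin(axis=1)
    return y_tr[idx]

# Ordinary least-squares line on the same data.
slope, intercept = np.polyfit(x_tr, y_tr, 1)

x_far = np.array([2.0, 3.0])            # well outside the [0, 1] training range
tree_pred = tree_like(x_far)            # flat: both queries hit the last point
lin_pred = slope * x_far + intercept    # keeps following the trend
```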