Two-Part Machine Learning Take-Home
Part 1 — Binary Classification: "Can Buy" vs "Cannot Buy"
Given applicant and market data, design a binary classifier to predict whether an applicant can buy a house (labels: can buy, cannot buy). Specify the following:
- Data assumptions and label definition.
- Preprocessing steps (missing values, outliers, leakage guards, encoding, scaling).
- Feature design (engineered affordability features, credit/market features).
- Model choice and rationale (including interpretability/calibration if applicable).
- Validation strategy (splits, class imbalance handling, threshold selection).
- Evaluation metrics and decision thresholding (including business-aligned metrics).
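As an illustration of what the feature-design and thresholding items might look like in practice, here is a minimal sketch in plain Python. Everything named in it is an assumption, not part of the brief: the Applicant fields, the 6% rate, 30-year term, and 20% down payment used to derive a mortgage payment, and the 5:1 cost ratio between false positives and false negatives. It shows one way to derive affordability ratios and to pick a decision threshold by minimizing a business cost on validation data.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    # All field names are hypothetical; the brief does not define a schema.
    annual_income: float
    monthly_debt: float
    savings: float
    house_price: float


def monthly_payment(principal, annual_rate, years):
    """Fixed-rate amortized monthly payment."""
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def affordability_features(a, annual_rate=0.06, years=30, down_frac=0.20):
    """Ratios built only from values assumed known at application time."""
    down = down_frac * a.house_price
    payment = monthly_payment(a.house_price - down, annual_rate, years)
    monthly_income = a.annual_income / 12
    return {
        "price_to_income": a.house_price / a.annual_income,
        "dti_with_mortgage": (a.monthly_debt + payment) / monthly_income,
        "down_payment_coverage": a.savings / down,
    }


def pick_threshold(y_true, scores, cost_fp=5.0, cost_fn=1.0):
    """Return (threshold, cost) minimizing total misclassification cost.

    A score >= threshold is predicted as "can buy" (label 1).
    The cost weights are illustrative assumptions.
    """
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)):
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

The threshold search should run on held-out validation scores, not on the training set, and it is most meaningful when the scores are calibrated probabilities. Ties in cost resolve to the lowest candidate threshold in this sketch.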
Part 2 — Regression: Wind-Farm Power Output
You are provided with three files: train.csv, test.csv, and sample_submission.csv. The target column in train.csv is power output. For each record in test.csv, predict the power output and produce a file named submissions.csv with a header row and exactly two columns: id, power output.
Describe your end-to-end approach:
- Data checks and exploratory analysis (schema, leakage risks, target sanity checks).
- Feature engineering (physics-aware, statistical, temporal, and interaction features).
- Time-aware validation to avoid leakage (rolling/sliding windows; lag construction).
- Model selection and tuning approach (baselines through advanced models).
- Metrics (primary/secondary) and error analysis.
- Training and inference pipeline, including how you would generate submissions.csv.
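The time-aware validation and submission items can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not a prescribed solution: it assumes the rows are already sorted by time, the helper names (expanding_window_splits, add_lag, write_submission) are hypothetical, and the header spelling "id,power output" is taken from the task statement and should be checked against sample_submission.csv.

```python
import csv


def expanding_window_splits(n_rows, n_folds=3):
    """Yield (train_idx, valid_idx) pairs for rows sorted by time.

    Each validation block lies strictly after its training block,
    so no future rows are used to predict the past.
    """
    fold = n_rows // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = fold * k
        valid_end = fold * (k + 1) if k < n_folds else n_rows
        yield list(range(train_end)), list(range(train_end, valid_end))


def add_lag(values, lag):
    """Shift a series back by `lag` steps; the first `lag` entries are None."""
    if lag <= 0:
        return list(values)
    return [None] * lag + list(values[:-lag])


def write_submission(ids, predictions, handle):
    """Write a header row and exactly two columns to an open text handle."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "power output"])
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])
```

A typical call would be `with open("submissions.csv", "w", newline="") as f: write_submission(test_ids, preds, f)`, where test_ids and preds are placeholders for the test identifiers and model predictions. Lag features computed this way use only past values; lagging the target itself is only valid at inference time if those past target values are actually available for the test rows.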