Model Review: Month Encoding, Feature Scaling, and Imbalanced Data
Context
You are auditing an existing predictive model for operational performance. The current implementation encodes calendar month as a continuous variable and is trained on imbalanced data. Address the following:
Questions
- Month as a continuous variable
  - What problems can arise if the model treats calendar month (e.g., Jan=1, ..., Dec=12) as a continuous feature?
  - How would you fix this encoding to capture seasonality correctly?
- Feature standardization
  - Why is standardizing predictors important before fitting certain models?
  - What can go wrong if you skip standardization?
- Imbalanced data and recall
  - Your training data are highly imbalanced. Describe two concrete ways to adjust the loss function or the evaluation/thresholding so recall is properly rewarded.
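
For orientation, the three topics above can be illustrated with a minimal, standard-library-only Python sketch. It is not the audited model's implementation, and it is only one of several reasonable answers. All function names (`encode_month_cyclical`, `standardize`, `weighted_log_loss`, `pick_threshold_for_recall`) are hypothetical and introduced here purely for illustration. The sketch assumes binary labels coded as 0/1 and predicted probabilities in [0, 1].

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def encode_month_cyclical(month: int) -> Tuple[float, float]:
    """Map month 1..12 onto the unit circle so December and January are neighbors."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    angle = 2.0 * math.pi * (month - 1) / 12.0
    return math.sin(angle), math.cos(angle)


def standardize(values: Sequence[float]) -> List[float]:
    """Return z-scores (zero mean, unit population variance) for one feature."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0.0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def weighted_log_loss(y_true: Sequence[int], p_pred: Sequence[float],
                      pos_weight: float) -> float:
    """Binary cross-entropy in which each positive example counts pos_weight times."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight_sum += w
    return total / weight_sum


def pick_threshold_for_recall(y_true: Sequence[int], p_pred: Sequence[float],
                              min_recall: float) -> float:
    """Return the highest score threshold whose recall is at least min_recall."""
    if not 0.0 < min_recall <= 1.0:
        raise ValueError("min_recall must be in (0, 1]")
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("recall is undefined without positive examples")
    # Scan candidate thresholds from strictest to most lenient.
    for t in sorted(set(p_pred), reverse=True):
        tp = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p >= t)
        if tp / n_pos >= min_recall:
            return t
    raise RuntimeError("no threshold reached the requested recall")
```

With the cyclical encoding, December and January end up as close together as January and February, which an integer 1..12 encoding cannot express; one-hot encoding the month is another common option. In practice the mean and standard deviation used for standardization would be computed on the training split only and reused for validation and test data, and the threshold would be chosen on held-out data rather than on the training set.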