You are asked several machine learning fundamentals questions:
- You are building a binary classifier with a highly imbalanced target. How would you handle the imbalance during training, and how would you evaluate the model?
- The full dataset is too large to train on directly, so you train using a sample. How would you verify that the sample is representative of the full dataset and that the resulting model generalizes well to the full population?
- You are using a tree-based model. How would you prevent overfitting?
- Why are L1- and L2-regularized estimators biased, and why can they still outperform an unbiased estimator on out-of-sample prediction?