This question evaluates a data scientist's proficiency in machine learning topics including handling class imbalance, selecting and interpreting evaluation metrics, verifying sample representativeness, preventing overfitting in tree-based models, and understanding why L1/L2 regularization introduces biased coefficient estimates.
Answer the following ML engineering/data science questions.
You’re training a classifier where the positive class is rare.
You train a model on a sample drawn from a massive dataset.
For decision trees / random forests / gradient-boosted trees:
Explain why L1 (Lasso) and L2 (Ridge) regularization typically produce biased coefficient estimates, and why we still use them.
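As a starting point for the regularization question, the following is a minimal sketch, not a reference answer. It assumes the simplest possible setting: one feature, no intercept, a fixed design, and Gaussian noise. There the ridge slope has the closed form `sum(x*y) / (sum(x*x) + lam)`, so its expectation is the true slope multiplied by `sxx / (sxx + lam)`, which is below 1 for any `lam > 0`. The function names and the values of `true_beta`, `lam`, `n`, and the noise scale are hypothetical choices made for illustration only.

```python
import random
import statistics


def fit_1d(x, y, lam):
    """Closed-form ridge slope for one feature, no intercept. lam=0 gives OLS."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def simulate(true_beta=2.0, lam=2.0, n=20, noise_sd=3.0, trials=2000, seed=0):
    """Refit OLS and ridge on many noisy redraws of y for one fixed design x."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # design held fixed across trials
    ols, ridge = [], []
    for _ in range(trials):
        y = [true_beta * xi + rng.gauss(0.0, noise_sd) for xi in x]
        ols.append(fit_1d(x, y, 0.0))
        ridge.append(fit_1d(x, y, lam))
    sxx = sum(xi * xi for xi in x)
    return {
        "ols_mean": statistics.fmean(ols),      # should sit near true_beta
        "ridge_mean": statistics.fmean(ridge),  # shrunk toward zero
        "ridge_expected": true_beta * sxx / (sxx + lam),  # theoretical mean
        "ols_var": statistics.pvariance(ols),
        "ridge_var": statistics.pvariance(ridge),
        # Empirical mean squared error around the true slope.
        "ols_mse": statistics.fmean((b - true_beta) ** 2 for b in ols),
        "ridge_mse": statistics.fmean((b - true_beta) ** 2 for b in ridge),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

In this sketch the ridge estimates average below the true slope (the bias) while varying less across redraws than OLS (the variance reduction). Whether that trade lowers mean squared error depends on the penalty strength, the noise level, and the signal size, which is why `lam` is normally chosen by cross-validation. Lasso shrinks through soft-thresholding rather than uniform scaling, so it is biased for a similar reason, but this sketch does not cover it.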