Handle imbalance, sampling, and overfitting
Company: LinkedIn
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
You are asked several machine learning fundamentals questions:
1. You are building a **binary classifier with a highly imbalanced target**. How would you handle the imbalance during training, and how would you evaluate the model?
2. The full dataset is **too large to train on directly**, so you train using a sample. How would you verify that the sample is representative of the full dataset and that the resulting model generalizes well to the full population?
3. You are using a **tree-based model**. How would you prevent overfitting?
4. Why are **L1- and L2-regularized estimators biased**, and why can they still outperform an unbiased estimator on out-of-sample prediction?
Quick Answer: (1) Handle imbalance with class weights, oversampling (e.g. SMOTE), or undersampling, and evaluate with imbalance-robust metrics such as precision/recall, PR-AUC, and ROC-AUC rather than accuracy. (2) Draw a stratified random sample, compare its feature and label distributions against the full dataset (summary statistics, two-sample KS tests), and validate the trained model on a held-out slice of the full population. (3) Constrain tree complexity (max depth, minimum samples per leaf/split), prune, or ensemble via bagging/random forests, tuning with cross-validation. (4) L1/L2 penalties shrink coefficients toward zero, which biases the estimates, but the accompanying reduction in variance can more than offset that bias, yielding lower expected out-of-sample error (the bias-variance trade-off).
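For question 1, a minimal sketch of one common approach, class-weighted training evaluated with threshold-free metrics. The dataset here is synthetic (`make_classification` with a hypothetical 1% positive rate), purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with ~1% positives (illustrative, not real data).
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight='balanced' reweights the loss inversely to class frequency,
# an alternative to over/under-sampling the training set itself.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# Evaluate with imbalance-aware, threshold-free metrics, not accuracy:
# a model predicting all-negative would already score 99% accuracy here.
proba = clf.predict_proba(X_te)[:, 1]
print(f"PR-AUC:  {average_precision_score(y_te, proba):.3f}")
print(f"ROC-AUC: {roc_auc_score(y_te, proba):.3f}")
```

PR-AUC is usually the more informative of the two under heavy imbalance, since its baseline equals the positive rate rather than 0.5.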
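For question 2, one way to check representativeness is to compare each numeric feature's distribution in the sample against the full dataset with a two-sample Kolmogorov-Smirnov test. The data below is simulated, one feature standing in for the full table:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Stand-in for one numeric feature of the full dataset (simulated).
full = rng.normal(loc=0.0, scale=1.0, size=1_000_000)
# A uniform random subsample, as one might draw for training.
sample = rng.choice(full, size=10_000, replace=False)

# Two-sample KS test: a small statistic / large p-value means no evidence
# that the sample's distribution differs from the full data's.
stat, p = ks_2samp(sample, full)
print(f"KS statistic = {stat:.4f}, p-value = {p:.3f}")
```

In practice you would repeat this per feature (chi-square tests for categoricals), and then confirm generalization directly by scoring the sample-trained model on a held-out slice of the full population.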
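For question 3, a sketch contrasting an unconstrained decision tree with one regularized via `max_depth` and `min_samples_leaf` (the specific values are illustrative, normally tuned by cross-validation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# An unconstrained tree grows until it memorizes training noise; capping
# depth and requiring a minimum leaf size regularizes it.
deep = DecisionTreeClassifier(random_state=0)
pruned = DecisionTreeClassifier(max_depth=5, min_samples_leaf=20,
                                random_state=0)

for name, model in [("unconstrained", deep), ("constrained", pruned)]:
    cv = cross_val_score(model, X, y, cv=5).mean()
    model.fit(X, y)
    print(f"{name}: train acc = {model.score(X, y):.3f}, CV acc = {cv:.3f}")
```

The telltale overfitting signature is the unconstrained tree's perfect training accuracy alongside a noticeably lower cross-validated score; ensembling (random forests, gradient boosting with shrinkage and early stopping) is the other standard remedy.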