Machine Learning Fundamentals: Regularization, Losses, PCA, and Random Forests
Assume standard supervised learning with linear models for regression/classification, PCA for dimensionality reduction, and Random Forests for tabular data. Answer the following:
1) L1 vs. L2 Regularization
Compare L1 (Lasso) and L2 (Ridge) regularization in terms of:
- Sparsity of learned coefficients
- Optimization geometry and differentiability
- Robustness to outliers (clarify what kind of outliers and how the penalty interacts with the loss)
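One way to ground the sparsity comparison is to look at the proximal (shrinkage) step each penalty induces on a single coefficient. The following is a minimal sketch, not a full solver: the function names `prox_l1` and `prox_l2`, the example weights, and the value of `lam` are illustrative assumptions. It shows that the L1 step sets small coefficients exactly to zero, while the L2 step only rescales them.

```python
def prox_l1(w, lam):
    """Soft-thresholding: proximal operator of lam * |w| (hypothetical helper)."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    # Coefficients with |w| <= lam are set exactly to zero: the source of sparsity.
    return 0.0


def prox_l2(w, lam):
    """Proximal operator of (lam / 2) * w**2: rescales w toward zero, never to zero."""
    return w / (1.0 + lam)


# Illustrative values only; not taken from any dataset.
weights = [3.0, 0.4, -0.2, -2.5]
lam = 0.5

lasso_like = [prox_l1(w, lam) for w in weights]
ridge_like = [prox_l2(w, lam) for w in weights]

print("L1 step:", lasso_like)
print("L2 step:", ridge_like)
```

The kink of `|w|` at zero is what makes the L1 penalty non-differentiable there and what produces exact zeros; the L2 penalty is smooth everywhere, so its shrinkage is proportional.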
2) Choosing Loss Functions and Gradient Properties
Explain how to choose loss functions for:
- Regression: MSE vs. MAE (and mention Huber if relevant)
- Classification: logistic/cross-entropy (and note hinge/focal if relevant)
Discuss their gradient properties, optimization behavior, and sensitivity to outliers.
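The gradient properties asked about here can be made concrete by writing each loss's derivative with respect to the residual `r = prediction - target` (or, for logistic loss, with respect to the logit). This is a small sketch under assumptions: the function names, the `0.5 * r**2` scaling for MSE, and the default `delta=1.0` are choices made for illustration.

```python
import math


def mse_grad(r):
    """Derivative of 0.5 * r**2: grows linearly, so large residuals dominate."""
    return r


def mae_grad(r):
    """Subgradient of |r|: constant magnitude, with 0 chosen at the kink r == 0."""
    return (r > 0) - (r < 0)


def huber_grad(r, delta=1.0):
    """Huber: behaves like MSE for |r| <= delta and like scaled MAE beyond it."""
    if abs(r) <= delta:
        return r
    return delta * ((r > 0) - (r < 0))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_grad(z, y):
    """Derivative of binary cross-entropy w.r.t. the logit z, for y in {0, 1}."""
    return sigmoid(z) - y


# Compare how each regression loss reacts to a small and a large residual.
for r in (0.5, 10.0):
    print(r, mse_grad(r), mae_grad(r), huber_grad(r))
```

The bounded gradients of MAE and Huber are what limit the influence of a single extreme target value, whereas the MSE gradient scales with the size of the error. The logistic gradient is bounded in magnitude by 1, since `sigmoid(z)` lies between 0 and 1.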
3) PCA
Describe PCA’s objective (variance maximization vs. reconstruction error minimization), the fitting and transform steps, and how to select the number of components.
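The fit, transform, and component-selection steps can be sketched in a few lines for the two-feature case, where the covariance eigen-decomposition has a closed form. This is an illustrative sketch only: the helper names (`fit_pca_2d`, `transform_1d`, `choose_num_components`), the toy data, and the 0.95 variance threshold are assumptions, and a real pipeline would normally use a numerical library's SVD instead.

```python
import math


def fit_pca_2d(points):
    """Fit PCA on 2-D points; return (mean, eigenvalues, leading unit eigenvector)."""
    n = len(points)
    mean = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    centered = [(x - mean[0], y - mean[1]) for x, y in points]
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    eigenvalues = [half_trace + spread, half_trace - spread]
    # Leading eigenvector; fall back to an axis when the features are uncorrelated.
    if sxy != 0.0:
        vx, vy = eigenvalues[0] - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return mean, eigenvalues, (vx / norm, vy / norm)


def transform_1d(points, mean, direction):
    """Project centered points onto the leading principal direction."""
    return [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
            for x, y in points]


def choose_num_components(eigenvalues, threshold=0.95):
    """Smallest k whose cumulative explained-variance ratio reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Toy data, invented for illustration.
data = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.2, 1.0), (4.0, 4.2)]
mean, eigenvalues, direction = fit_pca_2d(data)
scores = transform_1d(data, mean, direction)
print(direction, choose_num_components(eigenvalues))
```

The eigenvalues are the variances captured along each principal direction, which is why the same quantities serve both the variance-maximization view and the explained-variance rule for choosing the number of components.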
4) Random Forests
Explain how Random Forests are trained, their bias–variance trade-off, the limits of impurity-based feature importance, and key hyperparameters (with brief tuning guidance).
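The two sources of randomness in Random Forest training, bootstrap resampling of rows and random feature subsets at each split, can be sketched without any ML library. The helper names below are hypothetical, and the square-root rule for the number of candidate features is a common heuristic for classification rather than a requirement; tree fitting itself is omitted.

```python
import math
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement; also return the out-of-bag rows."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


def candidate_features(n_features, rng):
    """Pick a random feature subset, as would be drawn afresh at each split."""
    k = max(1, int(math.sqrt(n_features)))  # sqrt heuristic (an assumption here)
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Aggregate per-tree class predictions by taking the most frequent label."""
    return max(set(predictions), key=predictions.count)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
in_bag, out_of_bag = bootstrap_indices(1000, rng)
print("out-of-bag fraction:", len(out_of_bag) / 1000)
print("features tried at one split:", candidate_features(16, rng))
print("ensemble vote:", majority_vote(["a", "b", "a"]))
```

Because each tree sees a different bootstrap sample, roughly a third of the rows are left out of any given tree and can serve as an out-of-bag validation set. Averaging or voting over many decorrelated trees is what reduces variance relative to a single deep tree.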