Technical ML/Statistics Exercises (with precise math and small computations)
Assume a standard supervised learning setting with n samples, p features, design matrix X ∈ R^{n×p}, and response vector y; treat all vectors as column vectors. Be precise with the math and include small numeric computations where requested.
1) Ordinary Least Squares (OLS)
Derive OLS for linear regression from first principles:
- Model and assumptions
- Normal equations and closed‑form estimator
- Conditions for existence of (XᵀX)^{-1}
- Ridge (L2) solution
- Bias–variance effects of OLS vs ridge (a numeric sketch follows this list)
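A minimal NumPy sketch for checking the closed‑form answers; the toy data, variable names, and λ value are illustrative assumptions, not part of the exercise:

```python
import numpy as np

# Toy data (illustrative assumption): n = 4 samples, p = 2 features.
X = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [3.0, 2.5],
              [4.0, 3.0]])
y = np.array([1.0, 2.0, 2.5, 4.0])

# OLS: solve the normal equations XᵀX w = Xᵀy.
# This requires XᵀX to be invertible, i.e. X to have full column rank.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: (XᵀX + λI) w = Xᵀy; adding λI makes the system invertible for any λ > 0.
lam = 0.1  # illustrative regularization strength
p = X.shape[1]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("OLS:  ", w_ols)
print("Ridge:", w_ridge)
```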
2) Logistic Regression
- Write the negative log‑likelihood for binary labels.
- Derive the gradient and Hessian.
- Prove convexity of the objective.
- Compute one explicit gradient step (no bias term) with learning rate η = 0.5 for a single example x = (1, 2), y = 1, current weights w = (0.1, −0.2) (a reference sketch follows).
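A minimal sketch of the requested step, using the standard per‑example gradient (σ(wᵀx) − y)·x of the negative log‑likelihood; only the numbers given in the exercise are used, the helper names are mine:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = 1.0
w = np.array([0.1, -0.2])
eta = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the negative log-likelihood for one example: (σ(wᵀx) − y) x
p = sigmoid(w @ x)          # σ(−0.3) ≈ 0.4256
grad = (p - y) * x          # ≈ (−0.5744, −1.1489)
w_new = w - eta * grad      # ≈ (0.3872, 0.3744)

print("p =", p, "grad =", grad, "w_new =", w_new)
```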
3) Overfitting
- List three distinct mitigation techniques (e.g., regularization, early stopping, data augmentation) and explain when each helps or hurts.
- Propose a cross‑validation plan to tune λ for L2 regularization (one possible plan is sketched below).
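One possible K‑fold plan, sketched with scikit‑learn; the synthetic data, λ grid, fold count, and scoring choice are my assumptions, and any equivalent manual loop would do:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

# Log-spaced grid of candidate λ values (called alpha in scikit-learn).
lambdas = np.logspace(-3, 3, 13)

# 5-fold CV: fit on 4 folds, score on the held-out fold, average over folds,
# then select the λ with the best mean validation error.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(Ridge(), {"alpha": lambdas}, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print("best λ:", search.best_params_["alpha"])
```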
4) Bootstrapping vs Boosting
(a) Bootstrapping
- Given sample values [2, 3, 5, 7, 11], describe the percentile‑interval procedure for a confidence interval of the mean.
- Show the first two bootstrap resamples you would draw (with replacement) and compute their means.
- Explain why the bootstrap can estimate uncertainty without parametric assumptions (a resampling sketch follows this list).
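A minimal sketch of the percentile‑interval procedure on the given sample; the number of resamples, the seed, and the 95% level are my choices:

```python
import numpy as np

data = np.array([2, 3, 5, 7, 11])
rng = np.random.default_rng(0)

B = 10_000  # number of bootstrap resamples (assumption)
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(B)
])

# Percentile interval: take the empirical 2.5% and 97.5% quantiles
# of the bootstrap distribution of the mean (95% CI).
lo, hi = np.quantile(boot_means, [0.025, 0.975])
print("first two resample means:", boot_means[:2])
print("95% percentile CI for the mean:", (lo, hi))
```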
(b) Boosting (AdaBoost)
- Explain the core idea: sequentially fitting weak learners to reweighted examples (AdaBoost) or to residuals (gradient boosting).
- Perform one AdaBoost step with three training points having initial weights (1/3 each) where the weak learner misclassifies only the second point: compute ε, α = ½ ln((1−ε)/ε), the unnormalized updated weights, and the normalized distribution for the next round (a numeric sketch follows).
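A minimal numeric sketch of the requested AdaBoost update; the array names are mine, while the weights and the single misclassification come from the exercise:

```python
import numpy as np

w = np.array([1/3, 1/3, 1/3])          # initial weights
miss = np.array([False, True, False])  # only the second point is misclassified

eps = w[miss].sum()                    # ε = 1/3
alpha = 0.5 * np.log((1 - eps) / eps)  # α = ½ ln 2 ≈ 0.3466

# Multiply misclassified weights by e^{α} and correct ones by e^{−α},
# then renormalize so the weights sum to 1.
unnorm = w * np.exp(alpha * np.where(miss, 1.0, -1.0))  # ≈ (0.2357, 0.4714, 0.2357)
dist = unnorm / unnorm.sum()                            # = (0.25, 0.5, 0.25)

print("eps =", eps, "alpha =", alpha)
print("unnormalized:", unnorm)
print("next distribution:", dist)
```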
5) Compare Ensembles
Compare bagging, boosting, and random forests in terms of bias, variance, and robustness to noisy labels; provide one scenario where each is preferable.
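A small empirical sketch that could accompany the comparison, assuming scikit‑learn; the dataset, the 15% label‑noise rate, and the hyperparameters are illustrative assumptions, not a definitive benchmark:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data with a fraction of labels flipped (noisy labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y)) < 0.15       # 15% label noise (assumption)
y_noisy = np.where(flip, 1 - y, y)

models = {
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Cross-validated accuracy on the noisy labels gives a rough read on robustness.
for name, model in models.items():
    scores = cross_val_score(model, X, y_noisy, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```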