Technical ML/Statistics Exercises (with precise math and small computations)
Assume a standard supervised learning setting with n samples, p features, design matrix X ∈ R^{n×p}, and response vector y; treat all vectors as column vectors. Be precise with the math and include small numeric computations where requested.
1) Ordinary Least Squares (OLS)
Derive OLS for linear regression from first principles:
- Model and assumptions
- Normal equations and closed‑form estimator
- Conditions for existence of (XᵀX)^{-1}
- Ridge (L2) solution
- Bias–variance effects of OLS vs ridge (a numeric sketch follows this list)
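A minimal NumPy sketch for checking the closed‑form answers; the toy data, variable names, and λ value are illustrative assumptions, not part of the exercise:

```python
import numpy as np

# Toy data (illustrative assumption): n = 4 samples, p = 2 features.
X = np.array([[1.0, 0.5],
              [2.0, 1.0],
              [3.0, 2.5],
              [4.0, 3.0]])
y = np.array([1.0, 2.0, 2.5, 4.0])

# OLS: solve the normal equations XᵀX w = Xᵀy.
# This requires XᵀX to be invertible, i.e. X to have full column rank.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: (XᵀX + λI) w = Xᵀy; adding λI makes the system invertible for any λ > 0.
lam = 0.1  # illustrative regularization strength
p = X.shape[1]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print("OLS:  ", w_ols)
print("Ridge:", w_ridge)
```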
2) Logistic Regression
- Write the negative log‑likelihood for binary labels.
- Derive the gradient and Hessian.
- Prove convexity of the objective.
- Compute one explicit gradient step (no bias term) with learning rate η = 0.5 for a single example x = (1, 2), y = 1, current weights w = (0.1, −0.2) (a reference sketch follows).
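A minimal sketch of the requested step, using the standard per‑example gradient (σ(wᵀx) − y)·x of the negative log‑likelihood; only the numbers given in the exercise are used, the helper names are mine:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = 1.0
w = np.array([0.1, -0.2])
eta = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the negative log-likelihood for one example: (σ(wᵀx) − y) x
p = sigmoid(w @ x)          # σ(−0.3) ≈ 0.4256
grad = (p - y) * x          # ≈ (−0.5744, −1.1489)
w_new = w - eta * grad      # ≈ (0.3872, 0.3744)

print("p =", p, "grad =", grad, "w_new =", w_new)
```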
3) Overfitting
- List three distinct mitigation techniques (e.g., regularization, early stopping, data augmentation) and explain when each helps or hurts.
- Propose a cross‑validation plan to tune λ for L2 regularization (one possible plan is sketched below).
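One possible K‑fold plan, sketched with scikit‑learn; the synthetic data, λ grid, fold count, and scoring choice are my assumptions, and any equivalent manual loop would do:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

# Log-spaced grid of candidate λ values (called alpha in scikit-learn).
lambdas = np.logspace(-3, 3, 13)

# 5-fold CV: fit on 4 folds, score on the held-out fold, average over folds,
# then select the λ with the best mean validation error.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(Ridge(), {"alpha": lambdas}, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print("best λ:", search.best_params_["alpha"])
```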
4) Bootstrapping vs Boosting
(a) Bootstrapping
- Given sample values [2, 3, 5, 7, 11], describe the percentile‑interval procedure for a confidence interval of the mean.
- Show the first two bootstrap resamples you would draw (with replacement) and compute their means.
- Explain why the bootstrap can estimate uncertainty without parametric assumptions (a resampling sketch follows this list).
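A minimal sketch of the percentile‑interval procedure on the given sample; the number of resamples, the seed, and the 95% level are my choices:

```python
import numpy as np

data = np.array([2, 3, 5, 7, 11])
rng = np.random.default_rng(0)

B = 10_000  # number of bootstrap resamples (assumption)
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(B)
])

# Percentile interval: take the empirical 2.5% and 97.5% quantiles
# of the bootstrap distribution of the mean (95% CI).
lo, hi = np.quantile(boot_means, [0.025, 0.975])
print("first two resample means:", boot_means[:2])
print("95% percentile CI for the mean:", (lo, hi))
```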
(b) Boosting (AdaBoost)
- Explain the core idea: sequentially fitting weak learners to reweighted examples (AdaBoost) or to residuals (gradient boosting).
- Perform one AdaBoost step with three training points having initial weights (1/3 each) where the weak learner misclassifies only the second point: compute ε, α = ½ ln((1−ε)/ε), the unnormalized updated weights, and the normalized distribution for the next round (a numeric sketch follows).
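A minimal numeric sketch of the requested AdaBoost update; the array names are mine, while the weights and the single misclassification come from the exercise:

```python
import numpy as np

w = np.array([1/3, 1/3, 1/3])          # initial weights
miss = np.array([False, True, False])  # only the second point is misclassified

eps = w[miss].sum()                    # ε = 1/3
alpha = 0.5 * np.log((1 - eps) / eps)  # α = ½ ln 2 ≈ 0.3466

# Multiply misclassified weights by e^{α} and correct ones by e^{−α},
# then renormalize so the weights sum to 1.
unnorm = w * np.exp(alpha * np.where(miss, 1.0, -1.0))  # ≈ (0.2357, 0.4714, 0.2357)
dist = unnorm / unnorm.sum()                            # = (0.25, 0.5, 0.25)

print("eps =", eps, "alpha =", alpha)
print("unnormalized:", unnorm)
print("next distribution:", dist)
```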
5) Compare Ensembles
Compare bagging, boosting, and random forests in terms of bias, variance, and robustness to noisy labels; provide one scenario where each is preferable.
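A small empirical sketch that could accompany the comparison, assuming scikit‑learn; the dataset, the 15% label‑noise rate, and the hyperparameters are illustrative assumptions, not a definitive benchmark:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data with a fraction of labels flipped (noisy labels).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
flip = rng.random(len(y)) < 0.15       # 15% label noise (assumption)
y_noisy = np.where(flip, 1 - y, y)

models = {
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Cross-validated accuracy on the noisy labels gives a rough read on robustness.
for name, model in models.items():
    scores = cross_val_score(model, X, y_noisy, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```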