Prove and apply statistical ML fundamentals
Company: Amazon
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Technical Screen
Work through these statistical ML exercises with precise math and small computations; illustrative worked sketches for the computational parts follow the Quick Answer.

1) From first principles, derive ordinary least squares for linear regression: the model, its assumptions, the normal equations, the closed‑form estimator, the conditions under which (XᵀX)^{-1} exists, and the ridge solution; explain the bias–variance effects of regularization.

2) Logistic regression: write the negative log‑likelihood for binary labels, derive the gradient and Hessian, and prove convexity. Then compute one explicit gradient step (no bias term) with learning rate 0.5 for x=(1,2), y=1, and current weights w=(0.1,−0.2).

3) Overfitting: list three distinct mitigation techniques (e.g., regularization, early stopping, data augmentation) and explain when each helps or hurts; propose a cross‑validation plan to tune λ for L2 regularization.

4) Bootstrapping vs. boosting:
a) Bootstrapping: given the sample values [2,3,5,7,11], describe the percentile‑interval procedure for the mean; show the first two bootstrap resamples you would draw (with replacement) and compute their means; explain why the bootstrap can estimate uncertainty without parametric assumptions.
b) Boosting: explain the core idea (sequentially fitting to residuals or reweighted errors). Perform one AdaBoost step with three training points having initial weights of 1/3 each, where the weak learner misclassifies only the second point: compute ε, α = ½ ln((1−ε)/ε), the unnormalized updated weights, and the normalized distribution for the next round.

5) Compare bagging, boosting, and random forests in terms of bias, variance, and robustness to noisy labels; give one scenario where each is preferable.
Quick Answer: This question evaluates mastery of statistical machine-learning fundamentals—linear and logistic regression derivations, regularization and bias–variance trade-offs, resampling (bootstrap), boosting algorithms, and ensemble comparisons—using precise mathematical reasoning and small numeric computations.
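For exercise 1, a minimal derivation sketch (one standard presentation; the spherical-error assumption below is the usual one, and Gaussian errors are only needed for inference, not for the estimator itself):

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Model: $y = X\beta + \varepsilon$ with $X \in \mathbb{R}^{n \times p}$,
$\mathbb{E}[\varepsilon] = 0$, $\operatorname{Var}(\varepsilon) = \sigma^2 I$.
Minimizing $L(\beta) = \lVert y - X\beta \rVert_2^2$ and setting
\[
\nabla_\beta L = -2 X^\top (y - X\beta) = 0
\]
yields the normal equations $X^\top X \hat\beta = X^\top y$ and the closed form
$\hat\beta = (X^\top X)^{-1} X^\top y$, valid iff $X$ has full column rank
(no perfect collinearity, $n \ge p$). Ridge minimizes
$\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2$, giving
\[
\hat\beta_\lambda = (X^\top X + \lambda I)^{-1} X^\top y ,
\]
which exists for every $\lambda > 0$ because $X^\top X + \lambda I$ is
positive definite; the shrinkage adds bias but reduces variance.
\end{document}
```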
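A quick numeric sanity check of both closed forms, assuming NumPy is available; the toy data below is fabricated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
X = rng.normal(size=(50, 3))                   # toy design matrix
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)  # small Gaussian noise

# OLS via the normal equations: solve (X^T X) beta = X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: (X^T X + lambda I) is positive definite, hence always invertible
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print(beta_ols)    # close to beta_true
print(beta_ridge)  # shrunk toward zero relative to the OLS solution
```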
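For exercise 2, one standard way to write the negative log-likelihood, gradient, Hessian, and convexity argument:

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
With $\sigma(z) = 1/(1 + e^{-z})$ and $y_i \in \{0,1\}$, the negative
log-likelihood is
\[
\ell(w) = -\sum_{i=1}^{n} \bigl[ y_i \log \sigma(w^\top x_i)
        + (1 - y_i) \log\bigl(1 - \sigma(w^\top x_i)\bigr) \bigr].
\]
Using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$,
\[
\nabla \ell(w) = \sum_{i=1}^{n} \bigl( \sigma(w^\top x_i) - y_i \bigr) x_i,
\qquad
\nabla^2 \ell(w) = \sum_{i=1}^{n} \sigma_i (1 - \sigma_i)\, x_i x_i^\top
                 = X^\top S X,
\]
with $S = \operatorname{diag}\bigl(\sigma_i(1-\sigma_i)\bigr) \succeq 0$.
For any $v$, $v^\top X^\top S X v = \lVert S^{1/2} X v \rVert_2^2 \ge 0$,
so the Hessian is positive semidefinite and $\ell$ is convex.
\end{document}
```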
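The explicit step then follows from the per-example gradient (σ(w·x) − y)x; a sketch of the arithmetic:

```python
import math

# One gradient step for exercise 2 (no bias term).
x = (1.0, 2.0)
y = 1.0
w = [0.1, -0.2]
lr = 0.5

z = sum(wi * xi for wi, xi in zip(w, x))   # w.x = 0.1 - 0.4 = -0.3
p = 1.0 / (1.0 + math.exp(-z))             # sigma(-0.3) ~ 0.4256
grad = [(p - y) * xi for xi in x]          # ~ (-0.5744, -1.1489)
w_new = [wi - lr * g for wi, g in zip(w, grad)]
print(w_new)                               # ~ [0.3872, 0.3744]
```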
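For exercise 3's tuning plan, a minimal 5-fold grid-search sketch, assuming scikit-learn (whose Ridge estimator calls the L2 strength alpha rather than λ); the data is again synthetic placeholder material:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                  # placeholder features
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * rng.normal(size=200)

# Log-spaced grid for the L2 strength; refit on the full data at the end.
params = {"alpha": np.logspace(-4, 4, 17)}
search = GridSearchCV(
    Ridge(),
    params,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)                      # lambda minimizing CV error
```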
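For exercise 4a, a percentile-interval sketch; which two resamples come first depends entirely on the seed, so the pair printed here is just one legitimate draw:

```python
import numpy as np

data = np.array([2, 3, 5, 7, 11])
rng = np.random.default_rng(0)        # arbitrary seed

B = 10_000                            # number of bootstrap resamples
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(B)
])

print(boot_means[:2])                 # means of the first two resamples

# 95% percentile interval for the mean: 2.5th and 97.5th percentiles
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(lo, hi)
```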
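For exercise 4b, the round works out to clean fractions: ε = 1/3, α = ½ ln 2 ≈ 0.3466, and the normalized next-round distribution (1/4, 1/2, 1/4). A sketch of the update:

```python
import math

# One AdaBoost round: three points with uniform weights; the weak
# learner misclassifies only the second point.
w = [1/3, 1/3, 1/3]
miscl = [False, True, False]

eps = sum(wi for wi, m in zip(w, miscl) if m)    # weighted error = 1/3
alpha = 0.5 * math.log((1 - eps) / eps)          # 0.5 * ln 2 ~ 0.3466

# Errors are upweighted by e^{+alpha}; correct points downweighted by e^{-alpha}.
unnorm = [wi * math.exp(alpha if m else -alpha) for wi, m in zip(w, miscl)]
Z = sum(unnorm)                                  # normalizer ~ 0.9428
dist = [u / Z for u in unnorm]
print(unnorm)                                    # ~ [0.2357, 0.4714, 0.2357]
print(dist)                                      # ~ [0.25, 0.5, 0.25]
```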