
Prove and apply statistical ML fundamentals

Last updated: Mar 29, 2026

Quick Overview

This question evaluates mastery of statistical machine-learning fundamentals—linear and logistic regression derivations, regularization and bias–variance trade-offs, resampling (bootstrap), boosting algorithms, and ensemble comparisons—using precise mathematical reasoning and small numeric computations.


Prove and apply statistical ML fundamentals

Company: Amazon

Role: Data Scientist

Category: Statistics & Math

Difficulty: hard

Interview Round: Technical Screen

Work through these statistical ML exercises with precise math and small computations; the full exercise breakdown follows below.


Related Interview Questions

  • Compute an A/B test p-value by hand - Amazon (medium)
  • Compute and interpret quantile loss vs RMSE - Amazon (medium)
  • Compute CIs, power, and multiple testing - Amazon (medium)
  • Plan and analyze an A/B test - Amazon (hard)
  • Compute p-values, CIs, and adjust multiples - Amazon (medium)

Technical ML/Statistics Exercises (with precise math and small computations)

Assume a standard supervised learning setting with n samples, p features, design matrix X ∈ R^{n×p}, and response y; all vectors are column vectors. Be precise with the math and include small numeric computations where requested.

1) Ordinary Least Squares (OLS)

Derive OLS for linear regression from first principles:

  • Model and assumptions
  • Normal equations and closed‑form estimator
  • Conditions for existence of (XᵀX)^{-1}
  • Ridge (L2) solution
  • Bias–variance effects of OLS vs ridge
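
As a sanity check on the closed forms above (β̂ = (XᵀX)^{-1}Xᵀy and β̂_ridge = (XᵀX + λI)^{-1}Xᵀy), here is a minimal NumPy sketch; the synthetic data and λ value are illustrative assumptions, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))            # full column rank with high probability
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS closed form: needs XᵀX invertible, i.e. X has full column rank (p <= n)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge closed form: XᵀX + λI is invertible for any λ > 0,
# which is why ridge remains well-defined for collinear or p > n designs
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(beta_ols)    # close to beta_true
print(beta_ridge)  # shrunk toward zero: lower variance, added bias
```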

2) Logistic Regression

  • Write the negative log‑likelihood for binary labels.
  • Derive the gradient and Hessian.
  • Prove convexity of the objective.
  • Compute one explicit gradient step (no bias term) with learning rate η = 0.5 for a single example x = (1, 2), y = 1, current weights w = (0.1, −0.2).
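
For the standard sigmoid parameterization, the per-example NLL gradient is (σ(wᵀx) − y)·x; a minimal sketch of the requested step follows directly from that formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])
y = 1.0
w = np.array([0.1, -0.2])
eta = 0.5

z = w @ x               # 0.1*1 + (-0.2)*2 = -0.3
p = sigmoid(z)          # ≈ 0.4256
grad = (p - y) * x      # per-example NLL gradient: (σ(wᵀx) − y)·x ≈ (-0.5744, -1.1489)
w_new = w - eta * grad  # ≈ (0.3872, 0.3744)
print(z, p, grad, w_new)
```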

3) Overfitting

  • List three distinct mitigation techniques (e.g., regularization, early stopping, data augmentation) and explain when each helps or hurts.
  • Propose a cross‑validation plan to tune λ for L2 regularization.
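
One possible plan, sketched with scikit-learn (the log-spaced grid, fold count, and synthetic data are illustrative assumptions): standardize features inside each fold, grid-search λ with k-fold CV on mean squared error, then refit on all training data at the best λ.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)                      # synthetic data for illustration
X_train = rng.normal(size=(100, 5))
y_train = X_train @ rng.normal(size=5) + rng.normal(size=100)

pipe = Pipeline([("scale", StandardScaler()),       # scaling inside the pipeline avoids CV leakage
                 ("ridge", Ridge())])
grid = {"ridge__alpha": np.logspace(-4, 4, 9)}      # λ grid, log-spaced
cv = KFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, grid, cv=cv, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)                        # refits on all training data at the best λ
print(search.best_params_)
```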

4) Bootstrapping vs Boosting

(a) Bootstrapping

  • Given sample values [2, 3, 5, 7, 11], describe the percentile‑interval procedure for a confidence interval of the mean.
  • Show the first two bootstrap resamples you would draw (with replacement) and compute their means.
  • Explain why the bootstrap can estimate uncertainty without parametric assumptions.
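
A minimal sketch of the percentile interval for the mean of [2, 3, 5, 7, 11]; the seed is an arbitrary choice, so the two resamples printed are just one valid draw, not the only correct answer.

```python
import numpy as np

data = np.array([2, 3, 5, 7, 11])
rng = np.random.default_rng(42)   # arbitrary seed

# Show the first two bootstrap resamples (size n, drawn with replacement)
for i in range(2):
    resample = rng.choice(data, size=data.size, replace=True)
    print(f"resample {i + 1}: {resample}, mean = {resample.mean()}")

# Percentile interval: resample B times, take empirical 2.5/97.5 percentiles
B = 10_000
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(B)])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% percentile CI for the mean: ({lo:.2f}, {hi:.2f})")
```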

(b) Boosting (AdaBoost)

  • Explain the core idea (sequentially fitting to residuals or reweighted errors).
  • Perform one AdaBoost step with three training points having initial weights (1/3 each) where the weak learner misclassifies only the second point: compute ε, α = ½ ln((1−ε)/ε), the unnormalized updated weights, and the normalized distribution for the next round.
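
A minimal sketch of the requested AdaBoost step; with ε = 1/3 the arithmetic gives α = ½ ln 2 ≈ 0.347 and a normalized next-round distribution of (1/4, 1/2, 1/4).

```python
import numpy as np

w = np.array([1/3, 1/3, 1/3])          # initial weights
miss = np.array([False, True, False])  # weak learner misclassifies only point 2

eps = w[miss].sum()                    # ε = 1/3
alpha = 0.5 * np.log((1 - eps) / eps)  # α = ½ ln 2 ≈ 0.3466

# Reweight: misclassified points up by e^α, correct points down by e^{-α}
w_unnorm = w * np.exp(alpha * np.where(miss, 1.0, -1.0))
w_next = w_unnorm / w_unnorm.sum()     # normalized: (1/4, 1/2, 1/4)

print(eps, alpha, w_unnorm, w_next)
```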

5) Compare Ensembles

Compare bagging, boosting, and random forests in terms of bias, variance, and robustness to noisy labels; provide one scenario where each is preferable.
