PracHub

Explain Core ML Interview Concepts

Last updated: May 12, 2026

Quick Overview

This question evaluates core machine learning fundamentals including statistical modeling assumptions and loss functions (linear and logistic regression), ensemble methods and feature sampling in random forests, optimization algorithms (Adam versus stochastic gradient descent), and neural network capacity and training dynamics.

Company: Amazon

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

You are in a phone screen for an applied scientist role and are asked to verbally explain a set of machine learning fundamentals. For each part, give a precise, conceptually correct answer and be ready to justify *why*, not just *what*. Treat each question as an invitation to demonstrate depth: state the core idea, then explain the reasoning or intuition behind it.

### Constraints & Assumptions

- This is a conceptual / whiteboard-style discussion, not a coding exercise. No data, libraries, or runnable code are provided.
- Answers are expected to be verbal explanations with light math notation where helpful (e.g. loss functions, update rules).
- Assume standard supervised-learning settings unless a part specifies otherwise.
- Depth and correctness of reasoning matter more than breadth; the interviewer probes the "why" behind each answer.

### Clarifying Questions to Ask

- For the regression and classification parts, should I focus on the modeling assumptions, the estimation/optimization view, or both?
- When discussing loss functions, do you want the probabilistic (maximum-likelihood) justification or just the optimization properties?
- For the optimizer comparison, are you interested in a specific regime (e.g. large-scale vision, NLP, sparse features), or a general comparison?
- For the neural-network part, are we reasoning about classical small-network intuition or modern overparameterized deep-learning theory?

### What a Strong Answer Covers

The interviewer is listening for these signals across the five parts (this is a checklist of *dimensions*, not the answers):

- **Assumptions stated explicitly** for linear and logistic regression, and awareness of which ones matter for point estimates vs. inference.
- **Probabilistic grounding**: connecting squared loss and log-loss to maximum likelihood under specific noise/label models.
- **Mechanism of randomness** in ensembles and *why* it helps (variance reduction / decorrelation), not just "it's a bunch of trees."
- **Optimizer internals**: what state Adam maintains, the update rule, and honest trade-offs vs. SGD (memory, generalization, tuning).
- **Non-convex optimization intuition** for narrow vs. wide networks, including capacity, local minima, and overfitting risk.
- **Calibrated nuance**: acknowledging where the textbook answer is incomplete or where practice diverges from theory.

---

### Part 1: Linear Regression

What are the main assumptions of linear regression? Why is squared loss commonly used?

```hint Where to start
List the classical assumptions one at a time (think: form of the model, the error term's mean, correlation/independence of errors, error variance, relationships among features). Then separate "needed for unbiased point estimates" from "needed for valid inference."
```

```hint Why squared loss
Consider what probabilistic noise model makes least-squares the **maximum-likelihood** estimator. Also think about convexity, differentiability, and which statistic of $y$ squared loss ends up estimating.
```

### Part 2: Logistic Regression

What is logistic regression? Why do logarithms appear in its formulation or loss function?

```hint Where the log enters
The log isn't there by accident: it shows up in more than one place once you write the model out. Trace the path from a raw probability in $(0,1)$ to the linear score, and separately think about how the model's parameters are actually fit. Ask what each step would look like *without* a log and why that breaks.
```

```hint The loss
For Bernoulli labels, maximum likelihood is a *product* of probabilities. What does taking a $\log$ do to a product, and why is that helpful both mathematically and numerically?
```

### Part 3: Random Forest

What is a random forest? During tree construction, how is the set of candidate features selected?

```hint Two sources of randomness
A random forest injects randomness in two ways: how the *data* for each tree is drawn, and how *features* are considered at each split. Name both.
```

```hint Feature selection at a split
Consider whether each split gets to look at *all* features or only a restricted set of them, and what a tuning knob controlling that count would be. Then push on *why* deliberately hiding features from a split could make the overall ensemble better rather than worse.
```

### Part 4: Adam vs. SGD

Explain the Adam optimizer. What are its advantages and disadvantages compared with vanilla stochastic gradient descent?

```hint What state Adam keeps
Adam combines two ideas you've likely seen in other optimizers, and it does so by keeping per-parameter running statistics of the gradient stream. What two quantities about the recent gradients would each idea want to track, and how would the update use them together? Once you've named them, write the moving-average updates and the final parameter update.
```

```hint Trade-offs to weigh
Be honest about both sides: faster early convergence and per-parameter adaptive rates vs. extra memory and the well-documented generalization concerns relative to well-tuned SGD with momentum. Mention how weight decay interacts with Adam.
```

### Part 5: Narrow vs. Wide Networks and Local Minima

Consider two neural networks with the same two-layer structure. One has only a few neurons per layer, while the other has many neurons per layer. Which one is more likely to get trapped in a poor local minimum, and why?

```hint Frame it as capacity
Both objectives are non-convex. Think about how the number of parameters affects the *number of low-loss configurations* and how "connected" the good solutions are in the loss landscape.
```

```hint Don't forget the trade-off
A complete answer names which network is more prone to poor local minima / underfitting, but also flags the *cost* of the easier-to-optimize one (what does extra capacity risk if data or regularization is limited?).
```

---

### Follow-up Questions

- For squared loss: how would your answer change if the noise were heavy-tailed (e.g. Laplacian) instead of Gaussian, and what loss would maximum likelihood give you then?
- For random forests: how do `n_estimators` and the feature-subset size $m_{try}$ trade off bias, variance, and decorrelation between trees?
- For Adam: in what concrete settings have you seen (or would you expect) SGD with momentum to generalize better, and what would you try to close the gap?
- For the narrow-vs-wide question: how does the modern overparameterization view (loss-landscape connectivity, flat minima) reconcile with classical "more parameters → more overfitting" intuition?
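Although the interview itself is verbal, a small numeric check can help when rehearsing Part 1. The sketch below is an illustration only: the toy targets, the grid search, and the helper names `sum_squared_error` and `gaussian_nll` are assumptions introduced here, not part of the question. It shows that, under a Gaussian noise model with fixed variance, the negative log-likelihood differs from the sum of squared errors only by a constant and a scale factor, and that the best constant prediction under squared loss is the mean while under absolute loss it is the median.

```python
import math

def sum_squared_error(y, y_hat):
    """Sum of squared residuals."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))

def gaussian_nll(y, y_hat, sigma=1.0):
    """Negative log-likelihood of y under y_i ~ Normal(y_hat_i, sigma^2)."""
    n = len(y)
    const = 0.5 * n * math.log(2 * math.pi * sigma ** 2)
    return const + sum_squared_error(y, y_hat) / (2 * sigma ** 2)

# Toy targets (made up for illustration); note the outlier at 10.
y = [1.0, 2.0, 3.0, 4.0, 10.0]

# Two arbitrary prediction vectors: their NLL gap equals the SSE gap
# scaled by 1 / (2 * sigma^2), so minimizing one minimizes the other.
pred_a = [2.0] * len(y)
pred_b = [5.0] * len(y)
nll_gap = gaussian_nll(y, pred_a) - gaussian_nll(y, pred_b)
sse_gap = sum_squared_error(y, pred_a) - sum_squared_error(y, pred_b)

# Grid search over constant predictions c in [0, 10] with step 0.01.
grid = [c / 100 for c in range(0, 1001)]
best_squared = min(grid, key=lambda c: sum((yi - c) ** 2 for yi in y))
best_absolute = min(grid, key=lambda c: sum(abs(yi - c) for yi in y))

print(best_squared)   # 4.0, the sample mean of y
print(best_absolute)  # 3.0, the sample median of y
```

The outlier pulls the squared-loss minimizer toward it but leaves the absolute-loss minimizer at the median, which is one way to motivate the heavy-tailed follow-up question.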
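For Part 2, the two places where the logarithm appears can be checked with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the score `1.7`, the constant-probability "model", and the function names are hypothetical and chosen only for illustration. It shows that the linear score is the log-odds of the predicted probability, and that a raw product of 2000 Bernoulli probabilities underflows to zero in double precision while the summed log-likelihood stays finite.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds: the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

def log_loss(labels, probs):
    """Mean negative Bernoulli log-likelihood (binary cross-entropy)."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

# First appearance of the log: the linear score is the log-odds of the probability.
score = 1.7
round_trip = logit(sigmoid(score))

# Second appearance: the likelihood is a product, which underflows for large n.
labels = [1] * 2000
probs = [0.5] * 2000  # hypothetical model that always predicts 0.5
likelihood = 1.0
for y, p in zip(labels, probs):
    likelihood *= p if y == 1 else (1.0 - p)

print(round_trip)               # approximately 1.7
print(likelihood)               # 0.0 because the product underflows
print(log_loss(labels, probs))  # approximately log(2)
```

Taking the log turns the product into a sum, which is both numerically stable and easier to differentiate term by term.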
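For Part 3, the two sources of randomness can be written down without building any trees. The sketch below is an assumption-laden illustration: `bootstrap_indices`, `candidate_features`, and `ensemble_variance` are hypothetical helper names, the sizes are made up, and the square-root rule for the subset size is a common classification heuristic rather than a requirement. The variance helper uses the standard expression for the mean of identically distributed, pairwise-correlated predictors, which is one way to argue that lowering the correlation between trees lowers ensemble variance.

```python
import math
import random

def bootstrap_indices(n_rows, rng):
    """Row indices for one tree: n_rows draws with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def candidate_features(n_features, m_try, rng):
    """Feature indices considered at ONE split: m_try distinct features, redrawn per split."""
    return rng.sample(range(n_features), m_try)

def ensemble_variance(rho, sigma2, n_trees):
    """Variance of the mean of n_trees identically distributed predictors,
    each with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees

rng = random.Random(0)              # fixed seed so the sketch is repeatable
n_rows, n_features = 100, 16        # made-up sizes
m_try = int(math.sqrt(n_features))  # sqrt(p) heuristic often used for classification

rows = bootstrap_indices(n_rows, rng)
split_features = candidate_features(n_features, m_try, rng)

print(len(set(rows)), "distinct rows out of", n_rows)
print(split_features)
print(ensemble_variance(rho=0.8, sigma2=1.0, n_trees=100))  # dominated by the rho term
print(ensemble_variance(rho=0.2, sigma2=1.0, n_trees=100))  # lower correlation, lower variance
```

Adding trees shrinks only the second term of the variance expression; restricting the candidate features at each split is what targets the correlation term.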
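For Part 4, writing the update out once makes the state and the per-parameter scaling concrete. The following is a minimal pure-Python sketch of a single Adam step next to a vanilla SGD step; the function names, the learning rate, and the two made-up gradients are assumptions for illustration, and the hyperparameter defaults shown are the commonly quoted ones. Weight decay is deliberately left out.

```python
import math

def sgd_step(params, grads, lr):
    """Vanilla SGD: step proportional to the raw gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. m and v are per-parameter moving averages of the gradient
    and the squared gradient; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g      # first moment (momentum-like)
        v_i = beta2 * v_i + (1.0 - beta2) * g * g  # second moment (RMSProp-like)
        m_hat = m_i / (1.0 - beta1 ** t)           # bias correction for zero init
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Two parameters whose gradients differ by six orders of magnitude (made-up values).
params = [0.0, 0.0]
grads = [1000.0, -0.001]
lr = 0.01

sgd_params = sgd_step(params, grads, lr)
adam_params, m, v = adam_step(params, grads, m=[0.0, 0.0], v=[0.0, 0.0], t=1, lr=lr)

print(sgd_params)   # steps scale with the gradient magnitude
print(adam_params)  # both steps have size close to lr, with the opposite sign of the gradient
```

The two extra lists `m` and `v` are the memory cost relative to plain SGD: one additional value per parameter for each moment estimate.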

Related Interview Questions

  • Explain Transformer and MoE Fundamentals - Amazon (medium)
  • Evaluate NLP Classification Models - Amazon (easy)
  • Explain overfitting, regularization, and LLM techniques - Amazon (medium)
  • Explain NLP/RL concepts used in LLM agents - Amazon (hard)
  • Design and evaluate a RAG system - Amazon (easy)

Apr 27, 2026, 12:00 AM
