What difficulty level is this interview question?

This is a hard-difficulty Machine Learning question, commonly asked during Technical Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Amazon during technical interviews.

Explain Core ML Interview Concepts | Amazon Interview Question

Q: Explain Core ML Interview Concepts

This question evaluates core machine learning fundamentals including statistical modeling assumptions and loss functions (linear and logistic regression), ensemble methods and feature sampling in random forests, optimization algorithms (Adam versus stochastic gradient descent), and neural network capacity and training dynamics.

You are in a phone screen for an applied scientist role and are asked to verbally explain a set of machine learning fundamentals. For each part, give a precise, conceptually correct answer and be ready to justify why, not just what. Treat each question as an invitation to demonstrate depth: state the core idea, then explain the reasoning or intuition behind it.

Constraints & Assumptions

This is a conceptual / whiteboard-style discussion, not a coding exercise. No data, libraries, or runnable code are provided.
Answers are expected to be verbal explanations with light math notation where helpful (e.g. loss functions, update rules).
Assume standard supervised-learning settings unless a part specifies otherwise.
Depth and correctness of reasoning matter more than breadth; the interviewer probes the "why" behind each answer.

Clarifying Questions to Ask

For the regression and classification parts, should I focus on the modeling assumptions, the estimation/optimization view, or both?
When discussing loss functions, do you want the probabilistic (maximum-likelihood) justification or just the optimization properties?
For the optimizer comparison, are you interested in a specific regime (e.g. large-scale vision, NLP, sparse features), or a general comparison?
For the neural-network part, are we reasoning about classical small-network intuition or modern overparameterized deep-learning theory?

What a Strong Answer Covers

The interviewer is listening for these signals across the five parts (this is a checklist of dimensions, not the answers):

Assumptions stated explicitly for linear and logistic regression, and awareness of which ones matter for point estimates vs. inference.
Probabilistic grounding: connecting squared loss and log-loss to maximum likelihood under specific noise/label models.
Mechanism of randomness in ensembles and why it helps (variance reduction / decorrelation), not just "it's a bunch of trees."
Optimizer internals: what state Adam maintains, the update rule, and honest trade-offs vs. SGD (memory, generalization, tuning).
Non-convex optimization intuition for narrow vs. wide networks, including capacity, local minima, and overfitting risk.
Calibrated nuance: acknowledging where the textbook answer is incomplete or where practice diverges from theory.

Part 1 — Linear Regression

What are the main assumptions of linear regression? Why is squared loss commonly used?
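The question itself is conceptual, but a small numerical sketch can make the "why squared loss" part concrete. Assuming the textbook model y = a + b*x + eps with eps ~ N(0, sigma^2), the negative log-likelihood is a constant plus SSE / (2 * sigma^2), so the least-squares fit is also the maximum-likelihood fit. The data, seed, and function names below are made up for illustration.

```python
import math
import random

def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_error(a, b, xs, ys):
    """Sum of squared residuals for the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def gaussian_nll(a, b, xs, ys, sigma=1.0):
    """Negative log-likelihood under y = a + b*x + eps, eps ~ N(0, sigma^2)."""
    n = len(xs)
    sse = sum_squared_error(a, b, xs, ys)
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)

# Synthetic data (hypothetical): true intercept 1.0, true slope 2.0, Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

a_hat, b_hat = fit_simple_ols(xs, ys)

# Any other line has both a larger SSE and a larger Gaussian NLL.
a_alt, b_alt = a_hat + 0.1, b_hat - 0.1
print(sum_squared_error(a_hat, b_hat, xs, ys) < sum_squared_error(a_alt, b_alt, xs, ys))
print(gaussian_nll(a_hat, b_hat, xs, ys) < gaussian_nll(a_alt, b_alt, xs, ys))
```

Because the two objectives differ only by an additive constant and a positive scale factor, they rank every candidate line identically. That is the maximum-likelihood justification for squared loss; it depends on the Gaussian noise assumption, not on linearity alone.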

Part 2 — Logistic Regression

What is logistic regression? Why do logarithms appear in its formulation or loss function?
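One way to ground the "why logarithms" part is to check numerically that log-loss is the negative log of the Bernoulli likelihood, and that the linear score is the log-odds of the predicted probability. This is a minimal sketch; the scores and labels are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score (the log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_likelihood(probs, labels):
    """Product over examples of p if y == 1, else (1 - p)."""
    likelihood = 1.0
    for p, y in zip(probs, labels):
        likelihood *= p if y == 1 else (1.0 - p)
    return likelihood

def log_loss(probs, labels):
    """Average negative log-likelihood (binary cross-entropy)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(labels)

scores = [2.0, -1.0, 0.5, -3.0]  # hypothetical values of the linear score w.x + b
labels = [1, 0, 1, 0]
probs = [sigmoid(z) for z in scores]

# Log-loss equals -log(likelihood) / n: the log turns a product into a sum.
print(log_loss(probs, labels))
print(-math.log(bernoulli_likelihood(probs, labels)) / len(labels))

# The linear score is recovered as the log-odds of the predicted probability.
print([math.log(p / (1.0 - p)) for p in probs])
```

The logarithm therefore shows up in two places: in the link function (the model is linear in the log-odds) and in the loss (maximizing a product of probabilities is done by minimizing a sum of negative logs).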

Part 3 — Random Forest

What is a random forest? During tree construction, how is the set of candidate features selected?
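The two sources of randomness can be sketched in a few lines, assuming the standard formulation: each tree is trained on a bootstrap sample of the rows, and each split considers a fresh random subset of the features. The helper names are hypothetical, and taking the subset size as the square root of the feature count is a common heuristic for classification, not a requirement.

```python
import math
import random

def bootstrap_indices(n_rows, rng):
    """Sample row indices with replacement; one such sample per tree."""
    return rng.choices(range(n_rows), k=n_rows)

def candidate_features(n_features, max_features, rng):
    """Draw a fresh subset of feature indices, without replacement, for one split."""
    return sorted(rng.sample(range(n_features), max_features))

n_features = 16
m_try = int(math.sqrt(n_features))  # heuristic subset size (assumption)
rng = random.Random(0)

# Every split draws its own subset, so different splits and different trees
# are pushed toward different features, which decorrelates the trees.
subsets = [candidate_features(n_features, m_try, rng) for _ in range(5)]
for split_id, subset in enumerate(subsets):
    print(f"split {split_id}: candidate features {subset}")
```

The best split is then searched only among the drawn candidates. Without this step, a few dominant features would be chosen near the root of almost every tree, and averaging highly correlated trees would reduce variance much less.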

Part 4 — Adam vs. SGD

Explain the Adam optimizer. What are its advantages and disadvantages compared with vanilla stochastic gradient descent?
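For the update rule, a scalar sketch is often enough to talk through. The version below follows the usual presentation of Adam (exponential moving averages of the gradient and the squared gradient, with bias correction); the function names and the dictionary used to hold optimizer state are illustrative choices, not any library's API.

```python
import math

def sgd_step(theta, grad, lr):
    """Vanilla SGD: the step is proportional to the raw gradient."""
    return theta - lr * grad

def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    state["t"] += 1
    # First moment: moving average of gradients (momentum-like).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # Second moment: moving average of squared gradients (per-parameter scale).
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction offsets the zero initialization of m and v.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

# Compare the size of the first step for a tiny and a huge gradient.
for grad in (0.001, 1000.0):
    state = {"t": 0, "m": 0.0, "v": 0.0}  # two extra floats per parameter
    adam_delta = abs(adam_step(0.0, grad, state, lr=0.01))
    sgd_delta = abs(sgd_step(0.0, grad, lr=0.01))
    print(f"grad={grad}: adam step={adam_delta:.6f}, sgd step={sgd_delta:.6f}")
```

The first Adam step is roughly `lr` in magnitude regardless of gradient scale, whereas the SGD step scales with the gradient. That illustrates both the main advantage (less sensitivity to gradient scale and learning-rate tuning) and one cost (the extra `m` and `v` state stored for every parameter).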

Part 5 — Narrow vs. Wide Networks and Local Minima

Consider two neural networks with the same two-layer structure. One has only a few neurons per layer, while the other has many neurons per layer. Which one is more likely to get trapped in a poor local minimum, and why?

Follow-up Questions

For squared loss: how would your answer change if the noise were heavy-tailed (e.g. Laplacian) instead of Gaussian — what loss would maximum likelihood give you then?
For random forests: how do n_estimators and the feature-subset size $m_{try}$ trade off bias, variance, and decorrelation between trees?
For Adam: in what concrete settings have you seen (or would you expect) SGD with momentum to generalize better, and what would you try to close the gap?
For the narrow-vs-wide question: how does the modern overparameterization view (loss-landscape connectivity, flat minima) reconcile with classical "more parameters → more overfitting" intuition?
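For the first follow-up, a tiny location-only example may help. Under Laplace noise the negative log-likelihood is, up to constants, the sum of absolute errors, so the maximum-likelihood loss is absolute error rather than squared error. For a constant predictor, absolute error is minimized by the median and squared error by the mean. The sample below is made up, with one deliberate outlier.

```python
import statistics

def sum_abs_error(c, ys):
    """Laplace-noise objective (up to constants) for a constant prediction c."""
    return sum(abs(y - c) for y in ys)

def sum_sq_error(c, ys):
    """Gaussian-noise objective (up to constants) for a constant prediction c."""
    return sum((y - c) ** 2 for y in ys)

ys = [1.0, 2.0, 2.5, 3.0, 100.0]  # hypothetical sample with one outlier

mean_y = statistics.mean(ys)      # minimizer of squared error
median_y = statistics.median(ys)  # minimizer of absolute error

print("mean:", mean_y, "median:", median_y)
print("abs error at median vs mean:", sum_abs_error(median_y, ys), sum_abs_error(mean_y, ys))
print("sq error at mean vs median:", sum_sq_error(mean_y, ys), sum_sq_error(median_y, ys))
```

The outlier drags the mean far from the bulk of the data while the median barely moves, which is the usual argument for absolute (or Huber-style) losses when the noise is heavy-tailed.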
