Amazon Machine Learning Engineer Machine Learning Interview Questions
Explain key ML theory and techniques
Onsite Machine Learning Engineer: Mixed Topics. You are asked to answer concisely but with depth across the following topics: 1) XGBoost Parallel Compu...
Explain core ML concepts and diagnostics
You are in an ML breadth interview for a Senior Applied Scientist role. Answer the following conceptual questions clearly and practically (definitions...
Test whether two user populations differ
Problem: You are given two groups of users: Group A (North America users) and Group B (Europe users). Each user has a vector of continuous features (e.g.,...
Explain ML evaluation, sequence models, and optimizers
Scenario: An interviewer is deep-diving into an ML project you built (you can assume it is a supervised model unless specified otherwise). They want yo...
Implement SGD for linear regression and derive gradients
Prompt: You are given a dataset of \(n\) 1D samples \(\{(x_i, y_i)\}_{i=1}^n\), where \(x_i\) and \(y_i\) are real numbers. We want to fit a linear mod...
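One way to approach this prompt is sketched below. Because the prompt text is cut off, the model form \(\hat{y} = w x + b\), the per-sample loss \(\tfrac{1}{2}(w x_i + b - y_i)^2\), and the function name `sgd_linear_regression` are assumptions made for illustration. Under that loss the gradients are \(\partial L / \partial w = (w x_i + b - y_i)\, x_i\) and \(\partial L / \partial b = w x_i + b - y_i\).

```python
import random

def sgd_linear_regression(xs, ys, lr=0.01, epochs=200, seed=0):
    """Fit y ~ w*x + b with per-sample SGD (assumed squared-error loss)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)                 # visit samples in a fresh random order
        for i in order:
            err = w * xs[i] + b - ys[i]    # residual for sample i
            w -= lr * err * xs[i]          # dL/dw = err * x_i
            b -= lr * err                  # dL/db = err
    return w, b

# Noise-free toy data generated from y = 2x + 1 (illustrative only)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys, lr=0.05, epochs=500)
```

On noise-free data like this the iterates should approach the generating slope and intercept; with noisy data a decaying learning rate is usually needed for the estimates to settle.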
Compare float types and design ablation
Floating-point types and ablation study design. You are training deep neural networks on modern accelerators that support multiple floating-point forma...
Explain weight initialization methods and goals
Explain why weight initialization matters in deep neural networks. Then describe common initialization methods (such as random normal/uniform, Xavier/...
List hyperparameter tuning methods
Describe common methods for hyperparameter tuning in machine learning. For each method, explain: - How it works conceptually. - Its advantages and dis...
Contrast CNNs and fully connected networks
Compare convolutional neural networks (CNNs) with fully connected (dense) networks. Explain: - The structural differences between convolutional layers...
Analyze attention complexity and improvements
In the context of Transformer-style models, analyze the computational complexity of self-attention. Assume a sequence length of \(n\) and hidden dimen...
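A rough sketch of the usual accounting follows, assuming single-head attention with \(n\) as the sequence length and \(d\) as the hidden dimension (the prompt is truncated, so the exact meaning of \(d\) is an assumption). The function name `self_attention_cost` and the choice to count multiply-adds are illustrative, not a standard API.

```python
def self_attention_cost(n, d):
    """Approximate multiply-add counts for one single-head self-attention layer."""
    return {
        "projections": 4 * n * d * d,   # Q, K, V and output projections: O(n d^2)
        "scores": n * n * d,            # Q @ K^T: O(n^2 d)
        "weighted_sum": n * n * d,      # softmax(scores) @ V: O(n^2 d)
        "score_entries": n * n,         # size of the attention matrix: O(n^2) memory
    }

small = self_attention_cost(n=1024, d=64)
large = self_attention_cost(n=2048, d=64)
```

Doubling `n` quadruples the `scores` and `weighted_sum` terms but only doubles `projections`, which is why the quadratic terms dominate for long sequences.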
Compare decision trees and random forests
Compare decision trees and random forests. In your answer, discuss: - How a single decision tree is built and its main advantages and disadvantages. -...
Explain vanishing gradients and activations
Explain the vanishing gradient problem in deep neural networks. In your answer: - Describe how backpropagation works at a high level and why gradients...
Describe overfitting and L1/L2 regularization
Define overfitting in machine learning and explain why it is harmful. Then describe L1 and L2 regularization: - How each one modifies the loss functio...
Explain the bias–variance trade-off
Explain the bias–variance trade-off in supervised learning. In your answer, cover: - What bias and variance mean in the context of a prediction model....
Explain Transformers and MoE in LLMs
You are interviewing for a role working with large language models (LLMs). Explain the following concepts and how they relate to building and scaling ...
Explain surprisal and its units
You are discussing a language-modeling / NLP project. The interviewer asks about surprisal. 1. Define surprisal for an event/token with probability \(...
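As a minimal sketch of the definition: surprisal is \(-\log p\), and the unit depends on the logarithm base (bits for base 2, nats for base \(e\)). The helper name `surprisal` and the example probability are illustrative assumptions.

```python
import math

def surprisal(p, base=2.0):
    """Return -log_base(p); base 2 gives bits, base e gives nats."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p) / math.log(base)

bits = surprisal(0.25)               # a token with probability 1/4, about 2 bits
nats = surprisal(0.25, base=math.e)  # the same event measured in nats
```

Converting between units is a constant factor: nats equal bits multiplied by \(\ln 2\).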
Explain core components of reinforcement learning
In reinforcement learning, we model an agent that interacts with an environment over time. The agent observes the state of the environment, takes acti...
Explain Layer Normalization in Transformers
Layer Normalization in Transformers: Placement, Gradients, and Practical Trade-offs. Task: Explain Layer Normalization (LayerNorm) as used in Transforme...
Explain Logistic Regression Fundamentals
Logistic Regression from First Principles. Assumptions and Notation: - Binary classification with labels y ∈ {0, 1} and features x ∈ R^d. - Linear score...
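The setup above can be sketched in a few lines. The prompt is truncated after "Linear score", so the score form z = w·x + b, the mean log-loss objective, and the helper names `sigmoid` and `log_loss_and_grad` are assumptions for illustration. With that objective, the gradient with respect to w is the average of (p − y)·x and the gradient with respect to b is the average of (p − y), where p = sigmoid(z).

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_loss_and_grad(w, b, X, y, eps=1e-12):
    """Mean negative log-likelihood and its gradients for labels in {0, 1}."""
    n = len(X)
    loss, grad_b = 0.0, 0.0
    grad_w = [0.0] * len(w)
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + b   # assumed linear score
        p = sigmoid(z)
        # clamp probabilities so log() never sees zero
        loss -= yi * math.log(max(p, eps)) + (1 - yi) * math.log(max(1.0 - p, eps))
        for j, xj in enumerate(xi):
            grad_w[j] += (p - yi) * xj
        grad_b += p - yi
    return loss / n, [g / n for g in grad_w], grad_b / n

# Two toy points, evaluated at the all-zero parameters (illustrative only)
loss, grad_w, grad_b = log_loss_and_grad([0.0], 0.0, [[1.0], [-1.0]], [1, 0])
```

At zero parameters every prediction is 0.5, so the loss equals ln 2 and the gradient points toward increasing the weight on the feature that separates the two points.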
Explain XGBoost Parallelism Strategies
Explain How XGBoost Parallelizes Training. Scope: Describe how XGBoost achieves parallelism: 1. Within a single machine - Histogram-based split findi...