PracHub

Explain ML evaluation, sequence models, and optimizers

Last updated: Mar 29, 2026

Quick Overview

This question tests a candidate's grasp of four areas: model evaluation and metrics, the trade-offs between Transformers and RNNs for sequence modeling, detecting distribution shift between two image sets, and the practical differences among optimization algorithms.

Company: Amazon

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite

Dec 15, 2025, 12:00 AM

Scenario

An interviewer is deep-diving into an ML project you built (you can assume it is a supervised model unless specified otherwise). They want you to justify model choices, evaluation, and training decisions.

Part A — Evaluation design and metrics

  1. Describe how you evaluate your model end-to-end (data split strategy, validation protocol, test usage).
  2. Which metrics do you use and why (business/ML tradeoffs)?
  3. Provide the mathematical definitions for the metrics you mention (e.g., accuracy, precision/recall, F1, ROC-AUC, PR-AUC, log loss, MSE/MAE, calibration metrics).
  4. Propose a better evaluation workflow than a single holdout set (e.g., cross-validation, time-based split, stratification, repeated runs, confidence intervals).
  5. If you can add human labels (or human evaluation), explain:
    • what you would label,
    • how you would ensure quality (guidelines, inter-annotator agreement),
    • how it improves the evaluation signal.
  6. If you have no labels, what is the simplest way to estimate whether two model outputs/answers are similar?
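
As a starting point for items 3, 5, and 6, the sketch below writes out a few of the named metrics in plain Python. It assumes binary labels in {0, 1}. Two choices are illustrative rather than required by the question: Cohen's kappa as the inter-annotator agreement measure for two annotators, and token-level Jaccard overlap as one simple label-free similarity score. All function names are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    """precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean of both."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def log_loss(y_true, p_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def cohen_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0

def jaccard_similarity(text_a, text_b):
    """Label-free similarity: shared lowercase tokens over all distinct tokens."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0
```

Token overlap ignores word order and synonyms, so it is only a rough first check; embedding-based cosine similarity is a common next step when a suitable encoder is available.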

Part B — Transformers vs RNNs on long inputs

  1. Compare Transformer and RNN/LSTM/GRU architectures.
  2. For very long sequences, discuss the pros/cons of each (training stability, ability to capture long-range dependencies, compute/memory).
  3. Explain why attention can capture long-range dependencies, and why vanilla RNNs often struggle.
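
One way to make item 3 concrete is the minimal sketch below, written in plain Python with hypothetical function names. It assumes single-head scaled dot-product attention without masking or learned projections, and it models the RNN as a linear scalar recurrence h_t = w * h_{t-1}, which is a deliberate simplification. In attention, every query scores every key directly, so the path between any two positions has length one, at the cost of a score matrix that grows quadratically with sequence length. In the recurrence, the gradient from step T back to step 0 is a product of T factors, so it shrinks or grows exponentially unless |w| is close to 1.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns (outputs, weights). weights[i][j] is how much position i attends
    to position j; the full matrix has len(queries) * len(keys) entries.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        out = [
            sum(wj * v[i] for wj, v in zip(w, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

def rnn_gradient_scale(recurrent_weight, steps):
    """|d h_T / d h_0| for the linear scalar recurrence h_t = w * h_{t-1}."""
    return abs(recurrent_weight) ** steps
```

For example, `rnn_gradient_scale(0.9, 100)` is far below 1 (vanishing) while `rnn_gradient_scale(1.1, 100)` is far above 1 (exploding). LSTM and GRU gating reduces this effect but does not remove the sequential path length.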

Part C — Detecting distribution mismatch in images

You have two sets of images (Set A and Set B). How would you test whether they appear to come from the same underlying distribution?
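
One simple way to frame this is a two-sample permutation test, sketched below in plain Python. The sketch assumes each image has already been reduced to a single scalar feature (for example mean brightness, or one coordinate of an embedding); that reduction, the function names, and the default number of permutations are all assumptions for illustration. It tests only a difference in means, so it can miss shifts that leave the mean unchanged.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(features_a, features_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means.

    Under the null hypothesis that both sets share one distribution, the set
    labels are exchangeable, so we shuffle the pooled features and count how
    often a random split is at least as extreme as the observed one.
    Returns an estimated p-value in (0, 1].
    """
    rng = random.Random(seed)
    observed = abs(mean(features_a) - mean(features_b))
    pooled = list(features_a) + list(features_b)
    n_a = len(features_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed split itself and keep the p-value above zero.
    return (extreme + 1) / (n_permutations + 1)
```

A small p-value suggests the two sets differ on that feature. For real images, a stronger version of the same idea applies a multivariate statistic such as maximum mean discrepancy to pretrained-model embeddings, or trains a classifier to tell Set A from Set B and checks whether its held-out accuracy beats chance.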

Part D — Optimizers

Compare the practical differences and tradeoffs among SGD (with/without momentum), RMSProp, Adam, and AdamW. When would AdamW be preferable?
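
The update rules behind this comparison can be written for a single scalar parameter, as in the sketch below. Function names and default hyperparameters are illustrative, and conventions vary slightly between libraries (for example, in how momentum is scaled). The key contrast is in `adam_step`: with `l2`, the penalty is added to the gradient and is therefore rescaled by the adaptive denominator, whereas with `decoupled_decay` (the AdamW variant) the weight is shrunk directly, independent of the gradient statistics.

```python
import math

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """SGD with momentum; momentum=0.0 recovers plain SGD."""
    velocity = momentum * velocity + grad
    return w - lr * velocity, velocity

def rmsprop_step(w, grad, sq_avg, lr=0.01, alpha=0.99, eps=1e-8):
    """RMSProp: divide the step by a running RMS of recent gradients."""
    sq_avg = alpha * sq_avg + (1 - alpha) * grad * grad
    return w - lr * grad / (math.sqrt(sq_avg) + eps), sq_avg

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8,
              l2=0.0, decoupled_decay=0.0):
    """Adam for step t >= 1, with either L2-in-gradient or AdamW-style decay."""
    grad = grad + l2 * w  # L2 penalty passes through the adaptive scaling
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias correction for the zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    adaptive_update = lr * m_hat / (math.sqrt(v_hat) + eps)
    # AdamW: decay is applied to the weight itself, outside the adaptive scaling.
    w = w - adaptive_update - lr * decoupled_decay * w
    return w, m, v
```

With a zero loss gradient and `w = 1.0`, a first step with `decoupled_decay=0.1` moves the weight by `lr * 0.1`, while a first step with `l2=0.1` moves it by roughly `lr` regardless of the penalty strength, because the adaptive normalization cancels the penalty's magnitude. This is one reason decoupled decay tends to make the weight-decay hyperparameter easier to tune alongside the learning rate.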
