
Explain ML evaluation, sequence models, and optimizers

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in model evaluation and metrics, sequence-modeling trade-offs between Transformers and RNNs, detecting distribution shift between image sets, and the practical trade-offs among common optimization algorithms.


Company: Amazon

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite




Scenario

An interviewer is deep-diving into an ML project you built (you can assume it is a supervised model unless specified otherwise). They want you to justify model choices, evaluation, and training decisions.

Part A — Evaluation design and metrics

  1. Describe how you evaluate your model end-to-end (data split strategy, validation protocol, test usage).
  2. Which metrics do you use and why (business/ML tradeoffs)?
  3. Provide the mathematical definitions for the metrics you mention (e.g., accuracy, precision/recall, F1, ROC-AUC, PR-AUC, log loss, MSE/MAE, calibration metrics); reference definitions are sketched after this list.
  4. Propose a better evaluation workflow than a single holdout set (e.g., cross-validation, time-based split, stratification, repeated runs, confidence intervals); see the cross-validation sketch below.
  5. If you can add human labels (or human evaluation), explain:
    • what you would label,
    • how you would ensure quality (guidelines, inter-annotator agreement),
    • how it improves the evaluation signal.
  6. If you have no labels, what is the simplest way to estimate whether two model outputs/answers are similar? (A similarity sketch follows this list.)
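
As a reference for item 3, here are the standard textbook definitions of several of the listed metrics (in the notation below, $y_i$ is the true label, $\hat y_i$ the prediction, and $\hat p_i$ the predicted positive-class probability for example $i$ of $N$; TP/FP/TN/FN are confusion-matrix counts):

```latex
\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN} \qquad
\text{Precision} = \frac{TP}{TP+FP} \qquad
\text{Recall} = \frac{TP}{TP+FN} \qquad
F_1 = \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}

\text{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[\,y_i\log\hat p_i + (1-y_i)\log(1-\hat p_i)\,\bigr]

\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i-\hat y_i)^2 \qquad
\text{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert y_i-\hat y_i\rvert \qquad
\text{ROC-AUC} = \Pr\bigl(\hat p_i > \hat p_j \mid y_i = 1,\; y_j = 0\bigr)
```

PR-AUC is the area under the precision-recall curve; a common calibration metric is expected calibration error, the binned average of |accuracy − mean confidence|.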
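
For item 4, a minimal sketch of one improved workflow: repeated stratified k-fold cross-validation with a rough confidence interval over fold scores. This assumes scikit-learn and binary labels (for the ROC-AUC scorer); `evaluate_with_ci` is an illustrative helper and `LogisticRegression` is only a stand-in for whatever model you actually use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def evaluate_with_ci(X, y, n_splits=5, n_repeats=3, seed=0):
    """Repeated stratified k-fold; returns mean AUC and a rough 95% CI."""
    model = LogisticRegression(max_iter=1000)   # stand-in for your model
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    mean = scores.mean()
    # Folds overlap, so this normal-approximation interval is a rough
    # uncertainty estimate rather than an exact one.
    half = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, (mean - half, mean + half)
```

For time-ordered data, swapping `RepeatedStratifiedKFold` for `TimeSeriesSplit` keeps validation folds strictly after their training data.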
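
For item 6, the simplest label-free estimate is lexical overlap between the two outputs. A sketch using token-level Jaccard similarity (`jaccard_similarity` is an illustrative helper; cosine similarity over sentence embeddings is the usual next step up):

```python
def jaccard_similarity(answer_a: str, answer_b: str) -> float:
    """Token-level Jaccard overlap: |A ∩ B| / |A ∪ B| over word sets."""
    tokens_a = set(answer_a.lower().split())
    tokens_b = set(answer_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty outputs count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

# A score near 1.0 means near-identical wording.
print(jaccard_similarity("the cat sat on the mat", "a cat sat on a mat"))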

Part B — Transformers vs RNNs on long inputs

  1. Compare Transformer and RNN/LSTM/GRU architectures.
  2. For very long sequences, discuss the pros/cons of each (training stability, ability to capture long-range dependencies, compute/memory).
  3. Explain why attention can capture long-range dependencies, and why vanilla RNNs often struggle; see the attention sketch after this list.
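
For item 3, a toy numpy sketch of scaled dot-product attention. The score matrix links every pair of positions in a single step, so the path between any two tokens is O(1) (at O(n²) memory/compute in sequence length), whereas a vanilla RNN must carry information through O(n) recurrent steps, where repeated multiplication lets gradients vanish or explode.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays; returns (seq_len, d) outputs."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (seq_len, seq_len): all-pairs links
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                  # each output mixes ALL positions

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                    # toy sequence: 8 tokens, d=16
out = scaled_dot_product_attention(x, x, x)     # self-attention
print(out.shape)                                # (8, 16)
```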

Part C — Detecting distribution mismatch in images

You have two sets of images (Set A and Set B). How would you test whether they appear to come from the same underlying distribution?
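
One standard answer is a classifier two-sample test: train a classifier to predict which set an image came from, and check whether held-out performance beats chance. A sketch assuming precomputed feature vectors (e.g., embeddings from a pretrained CNN); `classifier_two_sample_test` is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classifier_two_sample_test(feats_a, feats_b):
    """Mean cross-validated AUC for 'which set did this image come from?'.

    AUC ~ 0.5: the sets look interchangeable (no detectable shift).
    AUC >> 0.5: some feature reliably separates them, so the
    distributions differ.
    """
    X = np.vstack([feats_a, feats_b])
    y = np.concatenate([np.zeros(len(feats_a)), np.ones(len(feats_b))])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
```

A permutation test on the AUC (reshuffling the A/B labels) turns this into a significance test; kernel MMD on the same features is a common alternative.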

Part D — Optimizers

Compare the practical differences and tradeoffs among SGD (with/without momentum), RMSProp, Adam, and AdamW. When would AdamW be preferable?
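
To make the trade-offs concrete, a numpy sketch of the per-parameter update rules with the conventional default hyperparameters (defaults shown are common conventions, not prescriptions). Note how AdamW applies weight decay directly to the weights rather than folding an L2 term into the adaptive gradient:

```python
import numpy as np

def sgd_momentum(w, g, v, lr=0.01, mu=0.9):
    v = mu * v - lr * g                      # velocity accumulates gradients
    return w + v, v

def rmsprop(w, g, s, lr=1e-3, rho=0.9, eps=1e-8):
    s = rho * s + (1 - rho) * g**2           # running avg of squared grads
    return w - lr * g / (np.sqrt(s) + eps), s

def adam(w, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g                # first moment (momentum-like)
    s = b2 * s + (1 - b2) * g**2             # second moment (RMSProp-like)
    m_hat = m / (1 - b1**t)                  # bias correction, t >= 1
    s_hat = s / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

def adamw(w, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    w_new, m, s = adam(w, g, m, s, t, lr, b1, b2, eps)
    return w_new - lr * wd * w, m, s         # decoupled weight decay
```

Because Adam's adaptive denominator rescales any L2 gradient term, the effective decay strength becomes coupled to gradient statistics; AdamW's decoupling is why it is often preferred when training transformers with explicit weight decay.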
