Explain ML evaluation, sequence models, and optimizers
Company: Amazon
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Onsite
## Scenario
An interviewer is deep-diving into an ML project you built (you can assume it is a supervised model unless specified otherwise). They want you to justify model choices, evaluation, and training decisions.
## Part A — Evaluation design and metrics
1. Describe **how you evaluate** your model end-to-end (data split strategy, validation protocol, test usage).
2. Which **metrics** do you use and **why** (business/ML tradeoffs)?
3. Provide the **mathematical definitions** for the metrics you mention (e.g., accuracy, precision/recall, F1, ROC-AUC, PR-AUC, log loss, MSE/MAE, calibration metrics); reference definitions for the common ones are sketched after this list.
4. Propose a **better evaluation workflow** than a single holdout set (e.g., cross-validation, time-based split, stratification, repeated runs, confidence intervals); see the cross-validation sketch after this list.
5. If you can add **human labels** (or human evaluation), explain:
- what you would label,
- how you would ensure quality (guidelines, inter-annotator agreement),
- how it improves the evaluation signal.
6. If you have **no labels**, what is the **simplest** way to estimate whether two model outputs/answers are **similar**? (A baseline sketch follows this list.)
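For item 3, one standard set of reference definitions, written here for binary classification with confusion-matrix counts $TP$, $FP$, $FN$, $TN$, true labels $y_i \in \{0,1\}$, predicted probabilities $\hat{p}_i$, and point predictions $\hat{y}_i$ for a regression target (the notation is ours, not fixed by the question):

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\[4pt]
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall}     = \frac{TP}{TP + FN} \\[4pt]
F_1              &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\[4pt]
\text{Log loss}  &= -\frac{1}{n} \sum_{i=1}^{n} \bigl[\, y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \,\bigr] \\[4pt]
\text{MSE}       &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\text{MAE}        = \frac{1}{n} \sum_{i=1}^{n} \lvert\, y_i - \hat{y}_i \,\rvert
\end{align*}
```

ROC-AUC is the area under the true-positive-rate vs. false-positive-rate curve swept over decision thresholds (equivalently, the probability that a random positive is scored above a random negative); PR-AUC is the area under the precision-recall curve. A common calibration metric is expected calibration error, the average gap between predicted confidence and empirical accuracy across probability bins.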
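For item 4, a minimal sketch (not a full workflow) of repeated stratified k-fold cross-validation with an approximate confidence interval over fold scores; the dataset, model, and metric below are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Placeholder dataset and model; swap in the real project pieces.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# Repeated stratified k-fold: preserves class balance in every fold and
# averages over several reshuffles to reduce the variance of a single split.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)

# Mean score with an approximate 95% confidence interval over the fold scores.
mean, std = scores.mean(), scores.std(ddof=1)
half_width = 1.96 * std / np.sqrt(len(scores))
print(f"ROC-AUC: {mean:.3f} +/- {half_width:.3f} (95% CI over {len(scores)} fold scores)")
```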
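For item 6, the simplest common baseline is to vectorize the two outputs and take a cosine similarity. The TF-IDF representation below is just one cheap choice (sentence embeddings are a natural upgrade), and the example strings are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def output_similarity(answer_a: str, answer_b: str) -> float:
    """Label-free similarity baseline: cosine similarity of TF-IDF vectors."""
    vectors = TfidfVectorizer().fit_transform([answer_a, answer_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

# Hypothetical example outputs.
print(output_similarity("The cat sat on the mat.", "A cat is sitting on the mat."))
```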
## Part B — Transformers vs RNNs on long inputs
1. Compare **Transformer** and **RNN/LSTM/GRU** architectures.
2. For **very long sequences**, discuss the pros/cons of each (training stability, ability to capture long-range dependencies, compute/memory).
3. Explain why **attention** can capture long-range dependencies, and why vanilla RNNs often struggle.
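To make item 3 concrete, here is a minimal NumPy sketch of scaled dot-product self-attention. Every output position mixes information from every input position in a single step, so the path between two distant tokens has length one; a vanilla RNN must push the same signal through every intermediate hidden state, where it can vanish. The quadratic `(seq_len, seq_len)` score matrix is also visible here, which is exactly the compute/memory cost that bites on very long sequences.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays; returns (seq_len, d_k) outputs."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len): every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # each output mixes all positions directly

# Toy self-attention over 6 positions with 4-dimensional keys/values.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
print(scaled_dot_product_attention(x, x, x).shape)  # (6, 4)
```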
## Part C — Detecting distribution mismatch in images
You have two sets of images (Set A and Set B). How would you test whether they appear to come from the **same underlying distribution**?
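One practical recipe is a classifier two-sample test, sketched below under the assumption that each image has already been mapped to a feature vector (e.g. by a pretrained CNN, a step the question does not specify): train a classifier to distinguish Set A from Set B and inspect its cross-validated AUC. An AUC near 0.5 means the sets are indistinguishable to that classifier; an AUC well above 0.5 indicates a distribution mismatch. Kernel two-sample tests such as MMD on the same features are a common alternative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classifier_two_sample_test(features_a: np.ndarray, features_b: np.ndarray) -> float:
    """Cross-validated AUC of an A-vs-B classifier on precomputed image features.

    AUC close to 0.5   -> the two sets look like draws from the same distribution.
    AUC well above 0.5 -> the classifier can tell them apart, i.e. a mismatch.
    """
    X = np.vstack([features_a, features_b])
    y = np.concatenate([np.zeros(len(features_a)), np.ones(len(features_b))])
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean()

# Toy check with synthetic "features"; real features would come from a pretrained CNN.
rng = np.random.default_rng(0)
set_a = rng.normal(0.0, 1.0, size=(600, 64))
set_b = rng.normal(0.5, 1.0, size=(600, 64))
print(classifier_two_sample_test(set_a[:300], set_a[300:]))  # same distribution: AUC near 0.5
print(classifier_two_sample_test(set_a, set_b))              # shifted set: AUC well above 0.5
```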
## Part D — Optimizers
Compare the practical differences and tradeoffs among **SGD (with/without momentum)**, **RMSProp**, **Adam**, and **AdamW**. When would AdamW be preferable?
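A minimal PyTorch sketch of how these optimizers are typically instantiated (the model and hyperparameters are placeholders). The practical point it highlights: in `torch.optim.Adam`, `weight_decay` is implemented as an L2 penalty added to the gradient, whereas `torch.optim.AdamW` decouples the decay and applies it directly to the parameters, which is one reason AdamW is usually preferred for Transformer-style training.

```python
import torch
import torch.nn as nn

# Placeholder model; substitute the real network.
model = nn.Linear(128, 10)

# SGD, with and without momentum: memory-light and often generalizes well,
# but typically needs a carefully tuned learning rate and schedule.
sgd          = torch.optim.SGD(model.parameters(), lr=1e-2)
sgd_momentum = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# RMSProp: per-parameter adaptive step sizes from a running average of squared gradients.
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.99)

# Adam: RMSProp-style second moment plus a momentum-style first moment, with bias correction.
# Note: weight_decay here is an L2 penalty added to the gradient.
adam = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)

# AdamW: the same adaptive update, but weight decay is decoupled and applied
# directly to the parameters, which usually regularizes adaptive methods better.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```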
Quick Answer: This question evaluates a candidate's competency in evaluation design and metrics, the sequence-modeling trade-offs between Transformers and RNNs, detecting distribution shift between image sets, and the practical differences among common optimizers (SGD, RMSProp, Adam, AdamW).