PracHub

Explain imbalance, metrics, bias-variance, Transformers vs. CNNs

Last updated: Jun 15, 2026

Quick Overview

An Amazon machine-learning-engineer technical screen covering class-imbalance handling at the data, loss, and decision levels; leakage-aware validation; the bias–variance tradeoff; metric selection for rare-positive fraud data (PR-AUC, precision@k, calibration); and Transformer vs. CNN inductive biases. Includes worked numeric examples for cost-based thresholds and metric interpretation.


Company: Amazon

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Date: Sep 6, 2025, 12:00 AM

Question

You are given a highly imbalanced binary classification problem in a fraud-detection setting (roughly 1% positives). Walk through the core ML concepts an interviewer would probe in a technical screen:

  1. Diagnose class imbalance. How do you detect and characterize it? Cover class ratios, stratified splits, why majority-class accuracy is misleading, per-class recall during training, and checking for imbalance/drift across time, geography, or user segments.
  2. Handle class imbalance at three levels: data level (random over/under-sampling, SMOTE/ADASYN/Borderline-SMOTE, targeted collection, augmentation), algorithm/loss level (class weights, focal loss, anomaly-detection baselines), and decision level (threshold tuning, cost-sensitive thresholds, top-k under capacity, calibration).
  3. Design validation that avoids leakage and reflects real class priors. Discuss resampling only on training folds, time-aware (train-past/validate-future) splits, group/entity-aware splits, feature-leakage checks, and drift/calibration monitoring.
  4. Bias–variance tradeoff. Define it, explain how to diagnose high bias vs. high variance (learning curves, train–val gap), and give concrete mitigations for each (model capacity, regularization, features, data, ensembling).
  5. Choose and justify evaluation metrics for extreme imbalance. Contrast accuracy, ROC-AUC, PR-AUC, F1/Fβ, precision@k / recall@k, calibration (Brier/ECE), and expected business cost. State when each is preferable.
  6. Compare Transformers and CNNs. Cover their inductive biases, typical inputs, computational tradeoffs (O(n·k) convolution with kernel size k vs. O(n²) self-attention over n tokens), and when you would choose one over the other for text, images, sequences, or tabular fraud features.
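
As a rough illustration of parts 1, 2, and 5, the sketch below uses a made-up dataset of 1,000 transactions with 10 frauds (1% positives) to show why majority-class accuracy misleads, how precision@k is computed under a review-capacity limit, and how a cost-sensitive threshold follows from the error costs. Everything here is an assumption for illustration: the scores, the costs (`cost_fp`, `cost_fn`), and the helper names are hypothetical, the scores are treated as if they were calibrated probabilities, and correct decisions are assumed to cost nothing.

```python
# Toy illustration only: labels, scores, and costs are made up.

def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision_at_k(y_true, scores, k):
    """Fraction of true positives among the k highest-scored items."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def cost_threshold(cost_fp, cost_fn):
    """Threshold on a calibrated probability p.

    Flagging costs (1 - p) * cost_fp in expectation; not flagging costs
    p * cost_fn. Flagging is cheaper when p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost of errors when flagging every score >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return fp * cost_fp + fn * cost_fn


# Hypothetical data: 10 frauds followed by 990 legitimate transactions.
fraud_scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.55, 0.30, 0.25, 0.10, 0.05]
legit_scores = [0.75, 0.70, 0.65, 0.52] + [0.01] * 986
y_true = [1] * 10 + [0] * 990
scores = fraud_scores + legit_scores

# A model that never predicts fraud still looks accurate.
always_negative = [0] * len(y_true)
print("baseline accuracy:", accuracy(y_true, always_negative))
print("baseline recall:  ", recall(y_true, always_negative))

# Ranking quality when only 10 cases can be reviewed.
print("precision@10:     ", precision_at_k(y_true, scores, 10))

# Hypothetical costs: a false positive costs 1, a missed fraud costs 4.
cost_fp, cost_fn = 1, 4
t_star = cost_threshold(cost_fp, cost_fn)
print("cost threshold:   ", t_star)
print("cost at 0.5:      ", total_cost(y_true, scores, 0.5, cost_fp, cost_fn))
print("cost at t*:       ", total_cost(y_true, scores, t_star, cost_fp, cost_fn))
```

On this toy data the always-negative baseline reaches 99% accuracy with zero recall, which is the usual argument for reporting per-class recall, PR-based metrics, or expected cost instead. Lowering the threshold from 0.5 to the cost-derived value reduces total error cost here, but with real, uncalibrated scores the threshold would need to be tuned on a validation set that preserves the true class prior.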
