PracHub

Diagnose and fix underperforming ML model

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in diagnosing and remediating underperforming binary classifiers under severe class imbalance, covering validation diagnostics, calibration, threshold selection under operational review constraints, cost-sensitive utility reasoning, and basic deployment monitoring for drift.

Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

You inherited a binary fraud model with extreme class imbalance (positives ≈2%). Current performance on a temporally separated validation set: AUC=0.61, and precision at recall=0.90 is only 0.05. You have one day to meaningfully improve recall at fixed review capacity.

  1. Describe how you would quickly diagnose underfitting vs. overfitting (learning curves, calibration plots, PR vs. ROC trade-offs, leakage checks).
  2. Propose three targeted interventions that can be implemented in a day (e.g., class-weighted loss, monotonic gradient boosting with categorical encoders, threshold moving with cost-sensitive utility) and justify why each should help.
  3. Show how you would choose a decision threshold that maximizes expected utility given: FP cost=$2, FN cost=$50, review capacity=0.5% of traffic; write the utility formula and outline the validation-time procedure.
  4. List the minimal logging/monitoring you’d add at deployment to detect drift and data quality issues within a week.

Oct 13, 2025, 9:49 PM

Rapidly Improving Recall Under Class Imbalance (One-Day Plan)

Context

You inherit a binary fraud detection model with severe class imbalance (positive rate ≈ 2%). Evaluation on a temporally separated validation set shows:

  • ROC AUC = 0.61
  • Precision at 90% recall = 0.05 (only 2.5× the 2% base rate; low even allowing for the extreme imbalance)
  • Operations constraint: only 0.5% of traffic can be reviewed (fixed review capacity)

Goal: In one day, meaningfully improve recall at the same review capacity.
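
The figures above already bound what any fix can deliver: recall at a fixed review capacity equals capacity × queue precision ÷ base rate, so with a 2% positive rate and a 0.5% budget even a perfect ranker catches at most a quarter of the fraud. A minimal sketch of that arithmetic follows; the function name and the 40% precision figure are illustrative assumptions, not values from the question.

```python
def recall_at_capacity(queue_precision: float, capacity: float, base_rate: float) -> float:
    """Recall achieved when `capacity` (fraction of traffic) is reviewed
    and `queue_precision` of the reviewed items are true positives."""
    # True positives per unit of traffic = capacity * queue_precision.
    # All positives per unit of traffic  = base_rate.
    return min(1.0, capacity * queue_precision / base_rate)


# Ceiling: every reviewed item is fraud (precision = 1.0).
ceiling = recall_at_capacity(1.0, capacity=0.005, base_rate=0.02)

# Hypothetical queue precision of 40% (an assumed value for illustration).
hypothetical = recall_at_capacity(0.40, capacity=0.005, base_rate=0.02)

print(f"recall ceiling: {ceiling:.2%}, at 40% queue precision: {hypothetical:.2%}")
```

Under these numbers, "improving recall at the same review capacity" means raising precision within the top 0.5% of scores, which is why ranking quality at the head of the list matters more than overall ROC AUC here.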

Tasks

  1. Diagnosis: Describe how you would quickly distinguish underfitting versus overfitting using learning curves, calibration plots, PR vs ROC analysis at fixed capacity, and leakage/drift checks.
  2. Interventions: Propose three changes you can implement in a day (e.g., class-weighted loss, monotonic gradient boosting with categorical encoders, threshold moving using cost-sensitive utility), and justify why each helps.
  3. Thresholding for Utility: Show how to choose a decision threshold that maximizes expected utility given:
     • False Positive (FP) cost = $2
     • False Negative (FN) cost = $50
     • Review capacity = 0.5% of traffic
     Provide the utility (or cost) formula and outline the selection procedure on validation data.
  4. Monitoring: List the minimal logging/monitoring to add at deployment to detect drift and data quality issues within a week.
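
One way to read the thresholding task is as a constrained cost minimization: choose the threshold t that minimizes cost(t) = 2·FP(t) + 50·FN(t) on validation data, subject to (TP(t) + FP(t)) / N ≤ 0.005. The sketch below illustrates that procedure under assumptions the question does not state: a reviewed fraud is fully caught, true positives and true negatives carry no extra cost or benefit, and the validation data is synthetic. `pick_threshold` is a hypothetical helper name.

```python
import numpy as np

FP_COST, FN_COST, CAPACITY = 2.0, 50.0, 0.005  # values from the question


def pick_threshold(y_true, scores, fp_cost=FP_COST, fn_cost=FN_COST, capacity=CAPACITY):
    """Return (threshold, n_flagged, cost_per_txn) minimizing cost within capacity."""
    y = np.asarray(y_true).astype(int)
    s = np.asarray(scores, dtype=float)
    n = len(y)
    order = np.argsort(-s, kind="stable")      # highest score first
    y_sorted, s_sorted = y[order], s[order]

    max_k = int(np.floor(capacity * n))        # review budget in rows
    k = np.arange(max_k + 1)                   # candidate queue sizes 0..max_k
    tp = np.concatenate(([0], np.cumsum(y_sorted)))[: max_k + 1]
    fp = k - tp                                # flagged but legitimate
    fn = y.sum() - tp                          # fraud left unflagged
    cost = fp_cost * fp + fn_cost * fn

    # A cut after position k is only realizable by a score threshold if the
    # next score is strictly lower; otherwise ties would overflow the queue.
    nxt = np.append(s_sorted[1:], -np.inf)
    valid = np.ones(max_k + 1, dtype=bool)
    valid[1:] = s_sorted[:max_k] > nxt[:max_k]
    cost = np.where(valid, cost, np.inf)

    best_k = int(np.argmin(cost))
    threshold = s_sorted[best_k - 1] if best_k > 0 else np.inf
    return threshold, best_k, cost[best_k] / n


# Synthetic validation set (assumption): 2% positives, noisy scores.
rng = np.random.default_rng(0)
n = 200_000
y_val = rng.random(n) < 0.02
scores_val = rng.normal(loc=1.5 * y_val, scale=1.0)

thr, n_flagged, cost_per_txn = pick_threshold(y_val, scores_val)
baseline = FN_COST * y_val.mean()              # cost of reviewing nothing
print(f"threshold={thr:.3f} flagged={n_flagged} "
      f"cost/txn={cost_per_txn:.4f} (no review: {baseline:.4f})")
```

Flagging is `score >= threshold`. If the scores are calibrated probabilities, the unconstrained rule is to flag when p ≥ FP cost / (FP cost + FN cost) = 2/52 ≈ 0.038; with a 0.5% budget the capacity cap will usually be the binding constraint, so the operating threshold is the stricter of the two.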
