Explain imbalance, metrics, bias-variance, Transformers vs. CNNs
Company: Amazon
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
##### Question
You are given a highly imbalanced binary classification problem in a fraud-detection setting (roughly 1% positives). Walk through the core ML concepts an interviewer would probe in a technical screen:
1. **Diagnose class imbalance.** How do you detect and characterize it? Cover class ratios, stratified splits, why majority-class accuracy is misleading, per-class recall during training, and checking for imbalance/drift across time, geography, or user segments.
2. **Handle class imbalance** at three levels: data level (random over/under-sampling, SMOTE/ADASYN/Borderline-SMOTE, targeted collection, augmentation), algorithm/loss level (class weights, focal loss, anomaly-detection baselines), and decision level (threshold tuning, cost-sensitive thresholds, top-k flagging under limited review capacity, calibration).
3. **Design validation that avoids leakage and reflects real class priors.** Discuss resampling only on training folds, time-aware (train-past/validate-future) splits, group/entity-aware splits, feature-leakage checks, and drift/calibration monitoring.
4. **Bias–variance tradeoff.** Define it, explain how to diagnose high bias vs. high variance (learning curves, train–val gap), and give concrete mitigations for each (model capacity, regularization, features, data, ensembling).
5. **Choose and justify evaluation metrics for extreme imbalance.** Contrast accuracy, ROC-AUC, PR-AUC, F1/Fβ, precision@k / recall@k, calibration (Brier/ECE), and expected business cost. State when each is preferable.
6. **Compare Transformers and CNNs.** Cover their inductive biases, typical inputs, computational tradeoffs (roughly O(n·k) per convolutional layer vs. O(n²) per self-attention layer, for sequence length n and kernel size k), and when you would choose one over the other for text, images, sequences, or tabular fraud features.
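To make items 1 and 5 concrete, here is a minimal sketch of why accuracy misleads at roughly 1% positives. All counts are hypothetical illustration values, not results from any real fraud dataset, and the helper `confusion_metrics` is an assumed name introduced only for this example.

```python
# Hypothetical setting: 10,000 transactions at 1% fraud prevalence.
N_TOTAL = 10_000
N_POS = 100
N_NEG = N_TOTAL - N_POS


def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Baseline: always predict "not fraud".
always_negative = confusion_metrics(tp=0, fp=0, fn=N_POS, tn=N_NEG)

# Hypothetical model: catches 80 of 100 frauds, raises 200 false alarms.
model = confusion_metrics(tp=80, fp=200, fn=20, tn=N_NEG - 200)

# A random ranker's expected precision equals the prevalence, so this is
# the chance-level reference point for PR-AUC (versus 0.5 for ROC-AUC).
pr_auc_chance = N_POS / N_TOTAL

for name, m in [("always negative", always_negative), ("model", model)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
print("PR-AUC chance level:", pr_auc_chance)
```

Under these assumed counts the do-nothing baseline reaches 99% accuracy with zero recall, while the hypothetical model scores lower on accuracy (97.8%) yet recovers 80% of fraud at a precision of about 0.29 and an F1 of about 0.42. That gap is the usual argument for reporting per-class recall and precision-oriented metrics rather than accuracy.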
Quick Answer: An Amazon machine-learning-engineer technical screen covering class-imbalance handling at the data, loss, and decision levels; leakage-aware validation; the bias–variance tradeoff; metric selection for rare-positive fraud data (PR-AUC, precision@k, calibration); and Transformer vs. CNN inductive biases. Includes worked numeric examples for cost-based thresholds and metric interpretation.
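As one possible version of the cost-based threshold example mentioned above, the sketch below assumes calibrated fraud probabilities, zero cost for correct decisions, and fixed per-error costs. Under those assumptions, flagging is worthwhile when `p * cost_fn > (1 - p) * cost_fp`, which rearranges to `p > cost_fp / (cost_fp + cost_fn)`. The cost values, scores, and labels are hypothetical, as are the function names.

```python
def cost_optimal_threshold(cost_fp, cost_fn):
    """Threshold on a calibrated probability that minimizes expected cost.

    Assumes correct decisions cost nothing: flag when
    p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at a threshold."""
    cost = 0.0
    for p, y in zip(scores, labels):
        flagged = p > threshold
        if flagged and y == 0:
            cost += cost_fp      # legitimate transaction sent to review
        elif not flagged and y == 1:
            cost += cost_fn      # fraud that slipped through
    return cost


# Hypothetical costs: a manual review costs 5, a missed fraud costs 500.
COST_FP = 5.0
COST_FN = 500.0
threshold = cost_optimal_threshold(COST_FP, COST_FN)

# Hypothetical calibrated scores and true labels (1 = fraud).
scores = [0.002, 0.004, 0.02, 0.03, 0.15, 0.40, 0.70, 0.005]
labels = [0, 0, 0, 1, 0, 1, 1, 0]

cost_at_half = total_cost(scores, labels, 0.5, COST_FP, COST_FN)
cost_at_opt = total_cost(scores, labels, threshold, COST_FP, COST_FN)

print(f"cost-optimal threshold: {threshold:.4f}")
print(f"cost at 0.5: {cost_at_half}, cost at optimum: {cost_at_opt}")
```

With these assumed costs the threshold falls just under 0.01, far below the default 0.5. On the toy data the default threshold misses two frauds (total cost 1000), while the cost-based threshold catches all three at the price of two extra reviews (total cost 10). If scores come from a model trained on resampled data, they would need recalibrating to the real class prior before this rule applies.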