Explain ML basics: imbalance, metrics, bias-variance
Company: Amazon
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: hard
Interview Round: Technical Screen
Discuss the following:
1. Class imbalance in a supervised classification task (e.g., fraud detection): how you diagnose it; techniques such as re-sampling (over-/under-sampling, SMOTE), class-weighted or focal losses, data augmentation, threshold tuning, and calibration; and how you would design validation to avoid leakage and to reflect real class priors.
2. The bias–variance trade-off: define it and describe concrete steps you would take for high bias versus high variance (model capacity, regularization, features, data).
3. Evaluation metrics for highly imbalanced data: justify your choices by contrasting accuracy, ROC-AUC, PR-AUC, F1/Fβ, recall@k/precision@k, and expected business cost, and explain when each is preferable.
4. The core ideas and inductive biases of CNNs versus Transformers, and when you would prefer each for text, images, sequences, or tabular data.
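A strong answer to part 1 can be grounded in a short sketch. The snippet below (an illustration, assuming scikit-learn is available; the synthetic data and the 80% precision target stand in for a real fraud dataset and a real business constraint) combines three techniques from the question: stratified splitting to preserve class priors, class weighting in the loss, and threshold tuning on the precision–recall curve instead of using the default 0.5 cutoff.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic data with ~2% positives, mimicking fraud-like imbalance.
X, y = make_classification(n_samples=5000, weights=[0.98], flip_y=0.01,
                           random_state=0)

# Stratify so both splits reflect the true class prior (avoids a test
# split that happens to contain almost no positives).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" up-weights the rare class in the loss,
# an alternative to re-sampling that leaves the data untouched.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# PR-AUC (average precision) is far more informative than accuracy here.
pr_auc = average_precision_score(y_te, scores)

# Threshold tuning: among thresholds meeting a hypothetical >= 80%
# precision requirement, pick the one with the highest recall;
# fall back to 0.5 if no threshold qualifies.
prec, rec, thr = precision_recall_curve(y_te, scores)
ok = prec[:-1] >= 0.80
threshold = thr[ok][np.argmax(rec[:-1][ok])] if ok.any() else 0.5
```

The same pattern extends to re-sampling (e.g., SMOTE from the imbalanced-learn package) applied only to the training split, never before splitting, which is one of the leakage traps the question asks about.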
Quick Answer: This question evaluates proficiency in handling class imbalance, selecting evaluation metrics for imbalanced data, reasoning about the bias–variance trade-off, and comparing the inductive biases of CNNs and Transformers, all within supervised learning for a Machine Learning Engineer role.
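For the metrics part, a concrete demonstration of why accuracy misleads on imbalanced data is a quick win in the interview. The sketch below (assuming scikit-learn; the ~1% positive rate is illustrative) scores a degenerate "always predict negative" model: accuracy looks excellent, while recall and average precision expose that it catches no positives at all.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score, recall_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% positive class

# Degenerate majority-class baseline: always predicts negative.
y_pred = np.zeros_like(y_true)
const_scores = np.zeros(len(y_true), dtype=float)

acc = accuracy_score(y_true, y_pred)   # near 0.99: looks great, means nothing
rec = recall_score(y_true, y_pred)     # 0.0: not a single positive is caught
# Average precision of a constant scorer collapses to the base rate (~0.01),
# so PR-AUC correctly ranks this model as useless.
ap = average_precision_score(y_true, const_scores)
```

This is the core of the accuracy vs. PR-AUC contrast: under heavy imbalance, accuracy is dominated by the majority class, while PR-AUC is anchored to the positive-class base rate.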