Handling Class Imbalance, Bias–Variance, Metrics, and Model Choices
Context
You are building a supervised classifier for a highly imbalanced task (e.g., fraud detection) where the positive class is rare. Discuss how you would:
- Diagnose class imbalance.
- Address imbalance using data-level and algorithm-level techniques (e.g., over-/under-sampling, SMOTE variants, class-weighted losses, focal loss), data augmentation, threshold tuning, and probability calibration.
- Design validation to avoid leakage and to reflect real class priors.
- Define the bias–variance trade-off and outline concrete steps for high bias vs. high variance (model capacity, regularization, features, data).
- Choose appropriate evaluation metrics for imbalanced data and justify them: contrast accuracy, ROC-AUC, PR-AUC, F1/Fβ, recall@k/precision@k, and expected business cost, and explain when each is preferable.
- Briefly explain the core ideas and inductive biases of CNNs vs. Transformers, and when you would prefer each for text, images, sequences, or tabular data.
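To make the data-level and algorithm-level points concrete, here is a minimal pure-Python sketch of two of the techniques named above: inverse-frequency class weights (the same heuristic as scikit-learn's `class_weight="balanced"`, i.e. `n_samples / (n_classes * count)`) and random oversampling of minority classes. The function names (`class_weights`, `random_oversample`) are illustrative, not from any particular library.

```python
import random
from collections import Counter

def class_weights(labels):
    """Inverse-frequency class weights: n_samples / (n_classes * count).

    Rare classes get proportionally larger weights, so a weighted loss
    penalizes mistakes on the minority class more heavily.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes match
    the majority-class count. Apply only to the training split, after
    the train/validation split, to avoid leaking duplicates."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, cnt in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - cnt):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out
```

With 9 negatives and 1 positive, `class_weights` assigns the positive class a weight 9x larger than the negative class, and `random_oversample` duplicates the positive row until both classes have 9 examples.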
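Threshold tuning, also listed above, can be sketched as a simple scan: score a held-out validation set, try each observed score as a decision threshold, and keep the one that maximizes Fβ (β > 1 weights recall over precision, which often matches fraud-detection costs). This is a didactic sketch under those assumptions; `fbeta` and `best_threshold` are hypothetical names.

```python
def fbeta(tp, fp, fn, beta=1.0):
    """F_beta from confusion counts; beta > 1 emphasizes recall."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def best_threshold(scores, labels, beta=2.0):
    """Scan every observed score as a candidate decision threshold and
    return the (threshold, F_beta) pair maximizing F_beta on the
    validation data. O(n^2); fine for a sketch, sortable to O(n log n)."""
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f = fbeta(tp, fp, fn, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

Note the interaction with calibration: if probabilities are recalibrated (e.g. Platt scaling or isotonic regression), the threshold should be re-tuned on the calibrated scores.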
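Finally, the metrics bullet can be grounded with one number: average precision, a standard summary of the PR curve (the mean of precision values at each true-positive rank), which, unlike accuracy or ROC-AUC, degrades visibly when false positives swamp a rare positive class. A minimal stdlib-only sketch, with an illustrative function name:

```python
def average_precision(scores, labels):
    """Average precision: rank examples by score (descending) and
    average the precision measured at each true positive's rank.
    This summarizes the precision-recall curve in one number."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, precisions = 0, []
    n_pos = sum(labels)
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / n_pos if n_pos else 0.0
```

For contrast: on a set with 1% positives, a classifier that always predicts "negative" scores 99% accuracy but has recall 0 and undefined precision, which is why accuracy alone is listed above as the metric to argue against.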