Context
You are designing and evaluating production machine learning models, with emphasis on classification, reliability, and efficient architectures. Answer the following multi-part question.
Part 1 — Diagnose and Mitigate Overfitting
Explain how to diagnose overfitting and discuss mitigation techniques, including:
- Regularization: L1, L2 (weight decay)
- Data augmentation (and label-preserving transformations)
- Early stopping and learning-rate schedules
- Dropout and related stochastic regularizers
- Architecture changes (capacity, inductive bias, normalization)
- Cross-validation
- Proper validation strategies (avoiding leakage; stratified/group/time-based splits)
For each technique, explain its trade-offs.
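To ground the early-stopping item, here is a minimal sketch of a patience-based stopping rule over a sequence of validation losses. The function name `early_stop_epoch` and the sample loss values are illustrative, not from the question.

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch at which training would stop, or None.

    Stops once the validation loss has failed to improve on the best
    value seen so far (by more than min_delta) for `patience`
    consecutive epochs -- a standard overfitting guard.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None

# Validation loss improves, then plateaus: training stops at epoch 5.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74]
print(early_stop_epoch(losses, patience=3))  # -> 5
```

The trade-off to discuss: a small `patience` risks stopping on noise, while a large one trains longer and may overfit before the rule fires.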
Part 2 — Handle Class Imbalance
Discuss approaches to class imbalance:
- Undersampling vs. oversampling (random; SMOTE and variants)
- Class weighting, focal loss, and threshold selection
- When each method is appropriate and how to evaluate with precision/recall, PR-AUC, ROC-AUC, and calibration
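As a concrete anchor for the class-weighting item, a common scheme weights each class inversely to its frequency, normalized so the average per-sample weight is 1. This is a sketch under that assumption; the helper name `inverse_frequency_weights` is hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights proportional to 1/frequency.

    Normalized so the weights average to 1 over the dataset, matching
    the "balanced" convention n_samples / (n_classes * count_c).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# 90/10 imbalance: the minority class gets 9x the majority's weight.
labels = [0] * 90 + [1] * 10
print(inverse_frequency_weights(labels))  # {0: 0.555..., 1: 5.0}
```

These weights would scale the per-sample loss during training, which changes the decision boundary without discarding or duplicating data.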
Detail common undersampling strategies and their impact on bias/variance and minority-class recall:
- Random undersampling
- Tomek links
- Edited Nearest Neighbors (ENN)
- NearMiss variants
- Cluster-centroid methods
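The simplest of the strategies above, random undersampling, can be sketched in a few lines. This is an illustrative implementation (the function name `random_undersample` is not from the question); library routines such as those in imbalanced-learn would normally be used instead.

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop samples from larger classes until all classes
    match the minority-class count. Cheap, but discards information,
    which raises variance while boosting minority-class recall."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(items) for items in by_class.values())
    Xs, ys = [], []
    for c, items in by_class.items():
        for xi in rng.sample(items, n_min):
            Xs.append(xi)
            ys.append(c)
    return Xs, ys

X = list(range(12))
y = [0] * 10 + [1] * 2   # 10 majority, 2 minority samples
Xs, ys = random_undersample(X, y)
print(sorted(ys))  # balanced: [0, 0, 1, 1]
```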
Part 3 — Attention Heads in Transformers
Define an attention head and explain:
- Queries, keys, values
- How multi-head attention splits representation space
- What multiple heads can capture
- How head count affects capacity, compute, and inductive bias
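A single attention head boils down to scaled dot-product attention over query, key, and value vectors. A minimal pure-Python sketch (learned projection matrices omitted; inputs are assumed already projected):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(Q, K, V):
    """Scaled dot-product attention for one head.

    Q, K, V: lists of vectors (lists of floats), one per position.
    Each output is a softmax-weighted mix of the value vectors, with
    scores q.k scaled by sqrt(d_k) to keep gradients well-behaved.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value positions; it aligns with the
# first key, so the first value dominates the output mix.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(attention_head(Q, K, V))
```

Multi-head attention runs several such heads on lower-dimensional projections of the same input and concatenates their outputs, which is the representation-space split the question asks about.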