Fraud Detection (≈1% Positives): Imbalance Strategies, Bias–Variance, Metrics, and Model Choices
Context: You are building a binary classifier to flag fraudulent transactions, where only about 1% of transactions are fraudulent (the positive class). Answer the following:
- Strategies for class imbalance (see the sketch after this list)
  - Data level (e.g., resampling)
  - Algorithm level (e.g., class weights, focal loss)
  - Decision level (e.g., threshold tuning, cost-sensitive decisions)
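A minimal sketch of the algorithm- and decision-level ideas, assuming scikit-learn; the synthetic data, the `LogisticRegression` baseline, and the F1-maximizing threshold rule are all illustrative choices, and a data-level step (e.g., SMOTE from imbalanced-learn, applied to the training split only) would slot in before fitting:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic ~1%-positive data standing in for transaction features.
X, y = make_classification(n_samples=100_000, n_features=20,
                           weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Data level would go here (e.g., resample X_tr/y_tr, never the test set).

# Algorithm level: reweight the loss so each class contributes equally.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)

# Decision level: pick the operating threshold from the PR curve
# instead of defaulting to 0.5.
probs = clf.predict_proba(X_te)[:, 1]
prec, rec, thr = precision_recall_curve(y_te, probs)
f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
best = f1[:-1].argmax()  # thr has one fewer entry than prec/rec
print(f"threshold={thr[best]:.3f} "
      f"precision={prec[best]:.3f} recall={rec[best]:.3f}")
```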
- Bias–variance tradeoff (see the diagnosis sketch after this list)
  - Define the tradeoff
  - How to diagnose high bias vs. high variance in this scenario
  - How to mitigate each
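One way to make the diagnosis concrete is to compare train and validation scores as the training set grows. A sketch assuming scikit-learn, again with synthetic ~1%-positive data standing in for real transactions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=50_000, n_features=20,
                           weights=[0.99], random_state=0)

sizes, train_sc, val_sc = learning_curve(
    LogisticRegression(class_weight="balanced", max_iter=1000),
    X, y, scoring="average_precision",
    train_sizes=np.linspace(0.1, 1.0, 5), cv=3)

# Both curves low and close together -> high bias (underfitting):
#   add features, interactions, or model capacity.
# Train high with a persistent gap   -> high variance (overfitting):
#   regularize, simplify the model, or collect more data.
for n, tr, va in zip(sizes, train_sc.mean(axis=1), val_sc.mean(axis=1)):
    print(f"n={n:6d}  train AP={tr:.3f}  val AP={va:.3f}")
```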
- Evaluation metrics (see the sketch after this list)
  - Select and justify metrics suitable for extreme imbalance: PR AUC vs. ROC AUC, precision@k, recall@k, F1/Fβ, calibration/Brier score
  - Discuss when each is preferable
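A sketch computing the listed metrics on synthetic scores, assuming scikit-learn; `y_true` and `probs` are illustrative stand-ins for held-out labels and model scores, and k models a fixed analyst review budget:

```python
import numpy as np
from sklearn.metrics import (average_precision_score, brier_score_loss,
                             fbeta_score, roc_auc_score)

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)  # ~1% positives
probs = 0.6 * rng.random(10_000) + 0.4 * y_true   # noisy but informative scores

print("PR AUC:", average_precision_score(y_true, probs))  # threshold-free, imbalance-aware
print("ROC AUC:", roc_auc_score(y_true, probs))           # can look rosy under imbalance
print("Brier:", brier_score_loss(y_true, probs))          # probability calibration
print("F2@0.5:", fbeta_score(y_true, probs >= 0.5, beta=2))  # beta>1 favors recall

# precision@k / recall@k: quality of the k highest-scored transactions,
# matching a review budget of k cases per period.
k = 100
top_k = np.argsort(probs)[::-1][:k]
print(f"precision@{k}:", y_true[top_k].mean())
print(f"recall@{k}:", y_true[top_k].sum() / max(y_true.sum(), 1))
```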
- Transformers vs. CNNs (see the contrast sketch after this list)
  - Compare inductive biases, typical inputs, and computational tradeoffs
  - When to choose each for text, sequences, images, or tabular fraud features
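To make the contrast concrete, a minimal sketch running both architectures over a sequence of transaction embeddings, assuming PyTorch; all dimensions are illustrative:

```python
import torch
import torch.nn as nn

batch, seq_len, d = 32, 64, 128
x = torch.randn(batch, seq_len, d)  # e.g., embeddings of a card's recent transactions

# CNN: strong locality bias; each output sees a fixed kernel-size window,
# and compute grows linearly in seq_len. Natural fit for images and
# short-range local patterns.
conv = nn.Conv1d(in_channels=d, out_channels=d, kernel_size=3, padding=1)
cnn_out = conv(x.transpose(1, 2)).transpose(1, 2)  # Conv1d expects (N, C, L)

# Transformer: weak inductive bias; self-attention relates every position
# to every other, and compute grows quadratically in seq_len. Natural fit
# for text and long-range dependencies, but data-hungry, and often beaten
# by gradient-boosted trees on plain tabular fraud features.
enc = nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
attn_out = enc(x)

print(cnn_out.shape, attn_out.shape)  # both torch.Size([32, 64, 128])
```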