Fraud-screening model evaluation under class imbalance and asymmetric costs
Context
You operate a binary classifier that flags e‑commerce orders for manual review. The base fraud rate is 0.7% (700 frauds out of 100,000 orders). Actions and outcome costs:
- If flagged: manual review cost = $3 for any flagged order (TP or FP).
- Additional friction cost for mistakenly flagging a legitimate order (FP) = $1, so an FP costs $4 in total.
- Missing a fraud (FN) costs $120.
- Correctly passing a legitimate order (TN) costs $0.
Two candidate models at threshold 0.5 produce the following on a 100,000‑order validation set (700 positives):
- Model A: TP=490, FP=4,900, FN=210, TN=94,400.
- Model B: TP=560, FP=8,400, FN=140, TN=90,900.
Tasks
(a) For each model at threshold 0.5, compute:
- Precision, Recall, F1
- TPR and FPR (and a single‑point ROC‑AUC proxy)
- Expected cost per order under the stated costs
Decide which model is better under these costs.
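The quantities in (a) follow directly from the confusion matrices. A minimal sketch, under the reading that a TP costs $3 (review only) and an FP costs $4 (review plus friction), and using balanced accuracy as the single‑point ROC‑AUC proxy:

```python
# Per-outcome dollar costs: TP = $3 review, FP = $3 review + $1 friction,
# FN = $120 missed fraud, TN = $0. This cost mapping is one reading of
# the stated cost structure.
COSTS = {"TP": 3.0, "FP": 4.0, "FN": 120.0, "TN": 0.0}

def evaluate(tp, fp, fn, tn):
    """Return (precision, recall, f1, fpr, auc_proxy, cost_per_order)."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # = TPR
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    # Single-point ROC-AUC proxy: balanced accuracy = (TPR + TNR) / 2
    auc_proxy = (recall + (1 - fpr)) / 2
    cost = (tp * COSTS["TP"] + fp * COSTS["FP"]
            + fn * COSTS["FN"] + tn * COSTS["TN"]) / n
    return precision, recall, f1, fpr, auc_proxy, cost

model_a = evaluate(490, 4_900, 210, 94_400)
model_b = evaluate(560, 8_400, 140, 90_900)
```

Note that which model "wins" depends on the metric: the higher‑recall model catches more fraud but pays for many more reviews and friction events, so the expected‑cost comparison can disagree with F1.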
(b) Derive the general cost‑optimal classification threshold in terms of calibrated P(y=1|x) and the four outcome costs. Then apply it to this problem (assume perfect calibration) and report the numeric threshold.
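One way to set up the derivation in (b), assuming a cost‑minimizing decision rule with $p = P(y{=}1\mid x)$ and outcome costs $C_{\mathrm{TP}}, C_{\mathrm{FP}}, C_{\mathrm{FN}}, C_{\mathrm{TN}}$: flag exactly when the expected cost of flagging is below the expected cost of passing.

```latex
\text{flag} \iff p\,C_{\mathrm{TP}} + (1-p)\,C_{\mathrm{FP}}
          \;<\; p\,C_{\mathrm{FN}} + (1-p)\,C_{\mathrm{TN}}
\iff p \;>\; t^{*} \;=\;
  \frac{C_{\mathrm{FP}} - C_{\mathrm{TN}}}
       {(C_{\mathrm{FP}} - C_{\mathrm{TN}}) + (C_{\mathrm{FN}} - C_{\mathrm{TP}})}
```

Under the cost reading above ($C_{\mathrm{TP}}=3$, $C_{\mathrm{FP}}=4$, $C_{\mathrm{FN}}=120$, $C_{\mathrm{TN}}=0$) this gives $t^{*} = 4/121 \approx 0.033$, far below the default threshold of 0.5.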
(c) Discuss:
- PR‑AUC vs ROC‑AUC under extreme class imbalance
- Calibration checks (e.g., Brier score, Expected Calibration Error)
- Decision curve analysis / net benefit and how it aligns with the cost structure
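The two calibration checks named in (c) can be sketched in a few lines of NumPy; the equal‑width 10‑bin scheme for ECE is an illustrative choice, not the only valid one:

```python
import numpy as np

def brier_score(p, y):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return float(np.mean((p - y) ** 2))

def ece(p, y, n_bins=10):
    """Expected Calibration Error with equal-width probability bins:
    the bin-frequency-weighted gap between mean confidence and observed rate."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Right-closed only for the final bin so p = 1.0 is counted.
        mask = (p >= lo) & ((p < hi) if hi < 1.0 else (p <= hi))
        if mask.sum() == 0:
            continue
        conf = p[mask].mean()      # average predicted probability in the bin
        rate = y[mask].mean()      # observed positive rate in the bin
        total += mask.mean() * abs(rate - conf)
    return float(total)
```

Both metrics matter here because the cost‑optimal threshold from (b) is only valid if the scores are calibrated probabilities.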
(d) Propose:
- An offline evaluation plan robust to prevalence shifts
- A safe online A/B plan with guardrails (manual review SLAs, false accusation rate, holdout for drift)
- A post‑launch monitoring plan for concept drift and fairness across user segments
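For the drift‑monitoring part of (d), one common statistic is the Population Stability Index (PSI) between the reference score distribution and a production window. A minimal sketch; the quantile‑based bins and the conventional 0.1 / 0.25 alert levels are assumptions, not part of the task statement:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference score sample
    (e.g. validation scores) and a production window of scores."""
    # Bin edges from the reference distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))
    # Clip so out-of-range production scores fall into the edge bins.
    actual = np.clip(actual, edges[0], edges[-1])
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    eps = 1e-6  # avoid log(0) for empty bins
    e_frac = np.clip(e_frac, eps, None)
    a_frac = np.clip(a_frac, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

A common rule of thumb is PSI < 0.1 for stable, 0.1–0.25 for investigate, and > 0.25 for significant shift; the same computation applied per user segment gives a crude fairness‑drift check as well.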