Calibrated Classifier on an Imbalanced Dataset (1% positives)
You have a perfectly calibrated binary classifier evaluated on 10,000 held-out examples. The positive-class prevalence is 1% (i.e., 100 positives and 9,900 negatives).
You observe the following confusion matrices at two probability thresholds:
- Threshold = 0.50 → TP = 60, FP = 40, FN = 40, TN = 9,860
- Threshold = 0.20 → TP = 85, FP = 300, FN = 15, TN = 9,600
Tasks:
- Compute precision, recall, and F1-score at both thresholds (a sketch for checking the numbers follows this list).
- With a cost matrix where FP costs 1 and FN costs 20 (TP and TN cost 0), compute the expected total cost at both thresholds and choose the cheaper threshold. Show your math (see the cost sketch below).
- Explain why ROC-AUC can be misleading in this setting and why PR-AUC is more appropriate. Provide a brief numeric intuition using the counts above.
- For a perfectly calibrated model, derive the optimal probability threshold t* using cost-sensitive decision theory, in terms of the FP and FN costs and the class prior. State any simplifying assumptions you make (a sketch of one standard form follows).
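
For reference, a minimal Python sketch of the first task (the function name `rates` is mine, not from any library; the counts are the ones given above). It also reports the false-positive rate, which is useful for the ROC vs. PR discussion.

```python
# Precision, recall, F1, and false-positive rate from raw confusion-matrix counts.
def rates(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also the true positive rate
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)               # false positive rate, the ROC x-axis
    return precision, recall, f1, fpr

for name, counts in {"t = 0.50": (60, 40, 40, 9_860),
                     "t = 0.20": (85, 300, 15, 9_600)}.items():
    p, r, f1, fpr = rates(*counts)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f} FPR={fpr:.4f}")
```

Note how at t = 0.20 the FPR stays near 3% (300 of 9,900 negatives), which looks harmless on a ROC curve, while precision drops to roughly 22% (85 of 385 positive calls); that gap is exactly what the PR curve exposes.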
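
For the cost comparison, a sketch of the arithmetic under the stated cost matrix only (FP = 1, FN = 20, TP = TN = 0); `total_cost` is a hypothetical helper.

```python
# Total misclassification cost on the 10,000 held-out examples.
COST_FP, COST_FN = 1, 20              # TP and TN cost 0

def total_cost(fp, fn):
    return COST_FP * fp + COST_FN * fn

print("t = 0.50:", total_cost(fp=40, fn=40))    # 1*40  + 20*40 = 840
print("t = 0.20:", total_cost(fp=300, fn=15))   # 1*300 + 20*15 = 600
```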
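
For the last task, one standard result from cost-sensitive decision theory, sketched under the assumption that the model's output p really is P(y = 1 | x); with a calibrated posterior the class prior is already absorbed into p, so it does not appear explicitly in the threshold.

```python
# Predict positive when the expected cost of a positive call, COST_FP * (1 - p),
# is below the expected cost of a negative call, COST_FN * p.
# Solving gives p > COST_FP / (COST_FP + COST_FN).
COST_FP, COST_FN = 1, 20
t_star = COST_FP / (COST_FP + COST_FN)
print(f"t* = {t_star:.4f}")           # 1/21 ≈ 0.0476
```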