Optimize threshold using confusion matrix and costs

This question tests classification metrics, calibration, threshold selection, and cost-sensitive decision theory for imbalanced binary classification. It involves computing precision, recall, and F1, comparing expected costs from confusion matrices, and deriving a cost-optimal probability threshold.

Question

Calibrated Classifier on an Imbalanced Dataset (1% positives)

You have a perfectly calibrated binary classifier evaluated on 10,000 held-out examples. The positive class rate (prevalence) is 1% (i.e., about 100 positives).

You observe the following confusion matrices at two probability thresholds:

Threshold = 0.50 → TP = 60, FP = 40, FN = 40, TN = 9,860
Threshold = 0.20 → TP = 85, FP = 300, FN = 15, TN = 9,600

Tasks:

1. Compute Precision, Recall, and F1-score at both thresholds.
2. With a cost matrix where FP costs 1 and FN costs 20 (TP and TN cost 0), compute the expected total cost at both thresholds and choose the cheaper threshold. Show your math.
3. Explain why ROC-AUC can be misleading in this setting and why PR-AUC is more appropriate. Provide a brief numeric intuition using the counts above.
4. For a perfectly calibrated model, derive the optimal probability threshold t* using cost-sensitive decision theory in terms of FP and FN costs and the class prior. State any simplifying assumptions you make.
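
The arithmetic behind the first three tasks can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the `Confusion` class and the `matrices`, `costs`, and `cheapest` names are hypothetical helpers introduced here, and the total cost is assumed to be a plain sum of per-example costs over the 10,000 examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for one confusion matrix."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Recall is the true positive rate: TP / (TP + FN).
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    @property
    def fpr(self) -> float:
        # False positive rate, the x-axis of a ROC curve.
        return self.fp / (self.fp + self.tn)

    def total_cost(self, fp_cost: float, fn_cost: float) -> float:
        # Assumes TP and TN cost 0, as stated in the question.
        return fp_cost * self.fp + fn_cost * self.fn


# Counts copied from the question.
matrices = {
    0.50: Confusion(tp=60, fp=40, fn=40, tn=9860),
    0.20: Confusion(tp=85, fp=300, fn=15, tn=9600),
}
FP_COST, FN_COST = 1, 20

costs = {}
for threshold, cm in matrices.items():
    costs[threshold] = cm.total_cost(FP_COST, FN_COST)
    print(
        f"t={threshold:.2f}  precision={cm.precision:.3f}  "
        f"recall={cm.recall:.3f}  F1={cm.f1:.3f}  "
        f"FPR={cm.fpr:.4f}  cost={costs[threshold]}"
    )

# Threshold with the lowest total cost among the two evaluated.
cheapest = min(costs, key=costs.get)
print(f"cheaper threshold: {cheapest:.2f}")
```

Under these assumptions the totals are 40 + 20 × 40 = 840 at threshold 0.50 and 300 + 20 × 15 = 600 at threshold 0.20, so 0.20 is cheaper even though its F1 is lower (about 0.35 versus 0.60). The same counts give the numeric intuition for the ROC versus PR comparison: lowering the threshold moves the false positive rate only from about 0.4% to about 3.0%, because it is divided by 9,900 negatives, while precision falls from 60% to about 22%.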

Solution
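
For the cost-optimal threshold, a worked derivation can be sketched as follows. The symbols are assumptions introduced here, not notation from the question: C_FP and C_FN are the per-example costs, p = P(y = 1 | x) is the calibrated score, and π is the class prior. The sketch also assumes that correct predictions cost 0, that costs do not vary by example, and that calibration and the prior carry over unchanged to deployment.

```latex
% Expected cost of each action, given a calibrated score p = P(y = 1 \mid x)
\begin{align*}
\mathbb{E}[\text{cost} \mid \text{predict } 1] &= (1 - p)\, C_{FP}, \\
\mathbb{E}[\text{cost} \mid \text{predict } 0] &= p\, C_{FN}.
\end{align*}

% Predict positive whenever that action is no more expensive
\[
(1 - p)\, C_{FP} \le p\, C_{FN}
\iff
p \ge t^* = \frac{C_{FP}}{C_{FP} + C_{FN}} .
\]

% With the question's costs, C_{FP} = 1 and C_{FN} = 20
\[
t^* = \frac{1}{1 + 20} = \frac{1}{21} \approx 0.048 .
\]

% Where the prior \pi enters: Bayes' rule in odds form
\[
\frac{p}{1 - p} = \frac{\pi}{1 - \pi} \cdot \frac{p(x \mid y = 1)}{p(x \mid y = 0)}
\quad\Longrightarrow\quad
\frac{p(x \mid y = 1)}{p(x \mid y = 0)}
\ge \frac{(1 - \pi)\, C_{FP}}{\pi\, C_{FN}}
= \frac{0.99 \cdot 1}{0.01 \cdot 20} = 4.95 .
\]
```

Under these assumptions the prior is already absorbed into the calibrated score, so the threshold on p depends on the costs alone; the prior appears explicitly only when the rule is restated on the likelihood ratio. The value 1/21 lies below both evaluated thresholds, which is consistent with 0.20 being cheaper than 0.50 and suggests an even lower threshold may reduce cost further, although the counts given cannot confirm that.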
