Optimize threshold using confusion matrix and costs
Company: TikTok
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Technical Screen
A calibrated classifier is evaluated on data with a 1% positive class. For 10,000 held-out examples you observe: at threshold 0.50 → TP=60, FP=40, FN=40, TN=9,860; at threshold 0.20 → TP=85, FP=300, FN=15, TN=9,600.
(1) Compute Precision, Recall, and F1 at both thresholds.
(2) With a cost of 1 per false positive and 20 per false negative (TP and TN have zero cost), compute the expected cost at the two thresholds and choose the cheaper threshold; show your math.
(3) Explain why ROC-AUC can be misleading here and why PR-AUC is more appropriate; give a brief numeric intuition using the counts above.
(4) If you were to set the threshold using cost-sensitive decision theory on a perfectly calibrated model, derive the optimal probability threshold t* in terms of the FP and FN costs and the class prior.
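The arithmetic for parts (1) to (3) can be checked with a short script. This is a minimal sketch, not a reference solution: the `Confusion` class and its method names are hypothetical helpers introduced here for illustration, and the default costs (1 per false positive, 20 per false negative) are taken from the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts at one decision threshold (hypothetical helper)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in terms of counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    @property
    def fpr(self) -> float:
        # False positive rate: the x-axis of the ROC curve.
        return self.fp / (self.fp + self.tn)

    def total_cost(self, cost_fp: float = 1.0, cost_fn: float = 20.0) -> float:
        # TP and TN are assumed to cost nothing, as stated in the prompt.
        return self.fp * cost_fp + self.fn * cost_fn


# Counts copied from the prompt, keyed by threshold.
results = {
    0.50: Confusion(tp=60, fp=40, fn=40, tn=9_860),
    0.20: Confusion(tp=85, fp=300, fn=15, tn=9_600),
}

for threshold, c in results.items():
    print(
        f"t={threshold:.2f}: P={c.precision:.3f} R={c.recall:.3f} "
        f"F1={c.f1:.3f} FPR={c.fpr:.4f} cost={c.total_cost():.0f}"
    )

# The threshold with the lower total cost over the 10,000 held-out examples.
cheaper = min(results, key=lambda t: results[t].total_cost())
print(f"cheaper threshold: {cheaper:.2f}")
```

Under these assumptions the script reports a total cost of 840 at threshold 0.50 and 600 at threshold 0.20 (0.084 and 0.060 per example), so 0.20 is cheaper even though its F1 is lower. For part (3), the same numbers show FPR moving only from about 0.4% to about 3.0% while precision falls from 60% to about 22%, which is the gap that ROC-based summaries tend to hide when negatives dominate.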
Quick Answer: This question evaluates understanding of classification metrics, calibration, threshold selection, and cost-sensitive decision theory in imbalanced binary classification, involving precision/recall/F1 computation, expected-cost comparison from confusion matrices, and derivation of a cost-optimal probability threshold.
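The cost-optimal threshold mentioned above can be derived in a few lines. The sketch below uses notation that is assumed here rather than taken from the prompt: $p = P(y=1 \mid x)$ is the calibrated score, $C_{FP}$ and $C_{FN}$ are the error costs, $\pi$ is the positive-class prior, and $f_1$, $f_0$ are the class-conditional densities of $x$.

```latex
% Expected cost of each action at a point with calibrated score p
% (TP and TN are assumed to cost zero):
\begin{align*}
  \mathbb{E}[\text{cost} \mid \text{predict } 1] &= (1 - p)\, C_{FP}, \\
  \mathbb{E}[\text{cost} \mid \text{predict } 0] &= p\, C_{FN}.
\end{align*}

% Predict positive when it is the cheaper action:
\begin{align*}
  (1 - p)\, C_{FP} \le p\, C_{FN}
  \;\Longleftrightarrow\;
  p \ge t^{*} = \frac{C_{FP}}{C_{FP} + C_{FN}}.
\end{align*}

% With C_FP = 1 and C_FN = 20:
\[
  t^{*} = \frac{1}{1 + 20} = \frac{1}{21} \approx 0.048.
\]

% The prior enters through Bayes' rule,
% p = \pi f_1(x) / (\pi f_1(x) + (1 - \pi) f_0(x)),
% so the same rule written as a likelihood-ratio test is
\[
  \frac{f_1(x)}{f_0(x)} \ge \frac{(1 - \pi)\, C_{FP}}{\pi\, C_{FN}}
  = \frac{0.99 \times 1}{0.01 \times 20} = 4.95
  \quad \text{for } \pi = 0.01.
\]
```

On the probability scale the threshold depends only on the cost ratio, because a calibrated score already incorporates the prior; the prior appears explicitly only when the rule is restated in terms of the likelihood ratio. Since $t^{*} \approx 0.048$ is below both thresholds in the prompt, this is consistent with the lower of the two (0.20) having the lower cost.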