Deploying a High-Precision Classifier on an Imbalanced Dataset
You are given a binary classification problem with 50,000 samples and ~5% positives. The product requires test-set precision ≥ 0.95 for the positive class. Assume you can train any probabilistic classifier that outputs calibrated scores/probabilities for the positive class.
Tasks
- Threshold selection on validation
  - Train any probabilistic model on a training set.
  - On a held-out validation set, compute the precision–recall (PR) curve.
  - Among thresholds whose precision ≥ 0.95, select the threshold τ that maximizes recall; if multiple thresholds yield the same recall, pick the smallest threshold. (A sketch of this search appears after the assumptions below.)
- Final evaluation on test
  - Fix τ from validation.
  - Evaluate once on the untouched test set (a sketch appears after the assumptions below) and report:
    - Precision and recall (positive class)
    - Number of predicted positives
    - Expected number of false positives if you flag 1,000 items
- If no τ on validation reaches precision ≥ 0.95
  - Propose two actionable strategies (e.g., an abstain/top-k policy, recalibration, or a new model/features) and explain their risks. (A recalibration sketch appears after the assumptions below.)
- Provide sklearn-style code
  - Show how to search for τ using precision_recall_curve (or make_scorer with a custom threshold) without leaking test data.
- Explain considerations
  - Explain how class imbalance and calibration affect your ability to meet the precision constraint, and why optimizing ROC AUC would be misleading here.
Assumptions: Use a proper train/validation/test split with stratification; select τ only on the validation set; the test set remains untouched until the final evaluation.
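
The sketch below shows one way to do the validation-only threshold search with scikit-learn's precision_recall_curve. It is illustrative, not a reference solution: the logistic-regression baseline, the make_classification stand-in for the real 50,000-sample dataset, the stratified 60/20/20 split, and names such as tau are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 50,000-sample, ~5%-positive dataset.
X, y = make_classification(
    n_samples=50_000, n_features=20, weights=[0.95, 0.05], random_state=0
)

# Stratified 60/20/20 train/validation/test split; the test set is not
# touched again until the final evaluation.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Threshold search on the validation set only.
val_scores = model.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, val_scores)

# precision/recall have one more entry than thresholds; drop the final
# (precision=1, recall=0) point so the arrays align with thresholds.
precision, recall = precision[:-1], recall[:-1]

feasible = precision >= 0.95
if feasible.any():
    # Among feasible thresholds, maximize recall; break ties by taking
    # the smallest threshold.
    best_recall = recall[feasible].max()
    candidates = thresholds[feasible & (recall == best_recall)]
    tau = candidates.min()
    print(f"tau={tau:.4f}, validation recall={best_recall:.3f}")
else:
    print("No threshold reaches precision >= 0.95 on validation.")
```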
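
Continuing from the variables defined in the previous sketch (model, tau, X_test, y_test), a minimal one-shot test evaluation might look like the following. The expected-false-positive figure assumes test precision holds at this operating point when 1,000 items are flagged.

```python
from sklearn.metrics import precision_score, recall_score

# Apply the validation-selected threshold once to the untouched test set.
test_scores = model.predict_proba(X_test)[:, 1]
y_pred = (test_scores >= tau).astype(int)

test_precision = precision_score(y_test, y_pred)
test_recall = recall_score(y_test, y_pred)
n_flagged = int(y_pred.sum())

print(f"test precision={test_precision:.3f}, recall={test_recall:.3f}")
print(f"predicted positives={n_flagged}")

# If exactly 1,000 items were flagged and test precision held, the expected
# number of false positives would be (1 - precision) * 1000.
expected_fp_per_1000 = (1.0 - test_precision) * 1000
print(f"expected false positives per 1,000 flags: {expected_fp_per_1000:.0f}")
```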
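
One of the fallback strategies named above is recalibration. The sketch below, assuming the same split and base model as before, wraps the estimator in CalibratedClassifierCV with isotonic calibration and repeats the threshold search; whether this opens up a feasible region depends on how miscalibrated the base model's scores were, and isotonic calibration can overfit when positives are scarce.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Recalibrate the base model's probabilities via cross-validated
# isotonic regression on the training set.
calibrated = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), method="isotonic", cv=5
).fit(X_train, y_train)

# Repeat the validation-only threshold search with calibrated scores.
val_scores = calibrated.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, val_scores)
precision, recall = precision[:-1], recall[:-1]

feasible = precision >= 0.95
print("feasible thresholds after recalibration:", int(feasible.sum()))
```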