Achieve 0.95 precision via thresholding
Company: Boston Consulting Group
Role: Data Scientist
Category: Machine Learning
Difficulty: medium
Interview Round: Take-home Project
You must deploy a classifier on an imbalanced dataset (50,000 samples; ~5% positives). The product requires test-set precision ≥ 0.95 for the positive class.
1) Train any probabilistic model. On a held-out validation set, compute the precision–recall curve and select the probability threshold τ that achieves precision ≥ 0.95 with the highest recall; if several thresholds tie on recall, choose the lowest.
2) Fix τ and evaluate once on the untouched test set. Report precision, recall, the number of predicted positives, and the expected number of false positives if you flag 1,000 items.
3) If no τ on validation attains 0.95 precision, propose two actionable strategies (e.g., an abstain/top-k policy, recalibration, or a different model or features) and explain their risks.
4) Provide sklearn-style code to search for τ (using precision_recall_curve or make_scorer with a custom threshold) without leaking test data.
5) Explain how class imbalance and calibration affect your ability to meet the precision constraint, and why optimizing ROC AUC would be misleading here.
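Steps 1, 2 and 4 above can be sketched as follows. This is one possible approach under stated assumptions, not a reference solution: the synthetic data from make_classification, the 60/20/20 split, the LogisticRegression model, and the helper names select_threshold and run_experiment are all illustrative choices that the question does not prescribe. The threshold is chosen on validation data only, and the test set is scored once with τ fixed.

```python
def select_threshold(precision, recall, thresholds, target=0.95):
    """Pick tau from precision_recall_curve outputs; return None if infeasible.

    Among thresholds with precision >= target, prefer higher recall,
    then the lower threshold.
    """
    best = None  # ((recall, -threshold), threshold)
    # precision and recall have one more entry than thresholds; zip drops the
    # final (precision=1, recall=0) point, which has no threshold attached.
    for p, r, t in zip(precision, recall, thresholds):
        if p < target:
            continue
        key = (r, -t)
        if best is None or key > best[0]:
            best = (key, t)
    return None if best is None else float(best[1])


def run_experiment(seed=0, target=0.95):
    """Hypothetical end-to-end run on synthetic data (assumed setup)."""
    # Local imports: only this demo needs scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import (precision_recall_curve, precision_score,
                                 recall_score)
    from sklearn.model_selection import train_test_split

    # Assumed stand-in for the real data: 50,000 rows, about 5% positives.
    X, y = make_classification(n_samples=50_000, weights=[0.95, 0.05],
                               random_state=seed)
    # 60/20/20 stratified split into train / validation / test.
    X_tmp, X_test, y_tmp, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=seed)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Threshold search uses validation scores only.
    val_scores = model.predict_proba(X_val)[:, 1]
    precision, recall, thresholds = precision_recall_curve(y_val, val_scores)
    tau = select_threshold(precision, recall, thresholds, target)
    if tau is None:
        return None  # no feasible threshold; fall back to the step 3 strategies

    # Single evaluation on the untouched test set with tau fixed.
    test_pred = (model.predict_proba(X_test)[:, 1] >= tau).astype(int)
    test_precision = precision_score(y_test, test_pred, zero_division=0)
    return {
        "tau": tau,
        "precision": test_precision,
        "recall": recall_score(y_test, test_pred),
        "predicted_positives": int(test_pred.sum()),
        "expected_fp_per_1000_flags": 1000 * (1 - test_precision),
    }
```

Calling run_experiment() returns either None or a dictionary of the step 2 metrics. The expected false positives per 1,000 flagged items is computed as 1000 × (1 − precision), which assumes the test-set precision carries over to the flagged population. Because validation precision at high thresholds rests on few predicted positives, the test-set precision can fall below the target even when validation meets it.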
Quick Answer: This question evaluates probabilistic classifier calibration, threshold selection to achieve a target precision, handling severe class imbalance, and precision–recall based evaluation metrics.
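The link between class imbalance and the precision constraint can be made concrete with a small calculation. The sketch below uses the identity precision = π·TPR / (π·TPR + (1 − π)·FPR), where π is the positive prevalence; the function names are hypothetical and the numbers are illustrative, not results from any real model.

```python
def precision_from_rates(prevalence, tpr, fpr):
    """Precision implied by an ROC operating point at a given prevalence."""
    tp = prevalence * tpr          # true-positive mass
    fp = (1 - prevalence) * fpr    # false-positive mass
    return tp / (tp + fp) if tp + fp > 0 else float("nan")


def max_fpr_for_precision(prevalence, tpr, target):
    """Largest false-positive rate that still yields precision >= target."""
    return tpr * prevalence * (1 - target) / ((1 - prevalence) * target)


# An ROC point that looks strong (TPR 0.80, FPR 0.05) at 5% prevalence:
print(precision_from_rates(0.05, 0.80, 0.05))

# FPR budget for precision 0.95 at 5% prevalence and TPR 0.50:
print(max_fpr_for_precision(0.05, 0.50, 0.95))
```

At 5% prevalence, a TPR of 0.80 with an FPR of 0.05 gives precision of only about 0.46, and reaching precision 0.95 at a TPR of 0.50 requires an FPR near 0.0014. ROC AUC averages over the whole FPR range, so it says little about the tiny low-FPR region where this constraint is decided.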