Deploying a High-Precision Classifier on an Imbalanced Dataset
You are given a binary classification problem with 50,000 samples and ~5% positives. The product requires test-set precision ≥ 0.95 for the positive class. Assume you can train any probabilistic classifier that outputs calibrated scores/probabilities for the positive class.
Tasks
- Threshold selection on validation
  - Train any probabilistic model on a training set.
  - On a held-out validation set, compute the precision–recall (PR) curve.
  - Among thresholds whose precision ≥ 0.95, select the threshold τ that maximizes recall; if multiple thresholds yield the same recall, pick the smallest threshold. (A sketch of this search appears after the assumptions below.)
- Final evaluation on test
  - Fix τ from validation.
  - Evaluate once on the untouched test set (a sketch appears after the assumptions below) and report:
    - Precision and recall (positive class)
    - Number of predicted positives
    - Expected number of false positives if you flag 1,000 items
- If no τ on validation reaches precision ≥ 0.95
  - Propose two actionable strategies (e.g., an abstain/top-k policy, recalibration, or a new model/features) and explain their risks. (A recalibration sketch appears after the assumptions below.)
- Provide sklearn-style code
  - Show how to search for τ using precision_recall_curve (or make_scorer with a custom threshold) without leaking test data.
- Explain considerations
  - Explain how class imbalance and calibration affect your ability to meet the precision constraint, and why optimizing ROC AUC would be misleading here.
Assumptions: Use a proper train/validation/test split with stratification; select τ only on the validation set; the test set remains untouched until the final evaluation.
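
The sketch below shows one way to do the validation-only threshold search with scikit-learn's precision_recall_curve. It is illustrative, not a reference solution: the logistic-regression baseline, the make_classification stand-in for the real 50,000-sample dataset, the stratified 60/20/20 split, and names such as tau are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 50,000-sample, ~5%-positive dataset.
X, y = make_classification(
    n_samples=50_000, n_features=20, weights=[0.95, 0.05], random_state=0
)

# Stratified 60/20/20 train/validation/test split; the test set is not
# touched again until the final evaluation.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Threshold search on the validation set only.
val_scores = model.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, val_scores)

# precision/recall have one more entry than thresholds; drop the final
# (precision=1, recall=0) point so the arrays align with thresholds.
precision, recall = precision[:-1], recall[:-1]

feasible = precision >= 0.95
if feasible.any():
    # Among feasible thresholds, maximize recall; break ties by taking
    # the smallest threshold.
    best_recall = recall[feasible].max()
    candidates = thresholds[feasible & (recall == best_recall)]
    tau = candidates.min()
    print(f"tau={tau:.4f}, validation recall={best_recall:.3f}")
else:
    print("No threshold reaches precision >= 0.95 on validation.")
```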
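
Continuing from the variables defined in the previous sketch (model, tau, X_test, y_test), a minimal one-shot test evaluation might look like the following. The expected-false-positive figure assumes test precision holds at this operating point when 1,000 items are flagged.

```python
from sklearn.metrics import precision_score, recall_score

# Apply the validation-selected threshold once to the untouched test set.
test_scores = model.predict_proba(X_test)[:, 1]
y_pred = (test_scores >= tau).astype(int)

test_precision = precision_score(y_test, y_pred)
test_recall = recall_score(y_test, y_pred)
n_flagged = int(y_pred.sum())

print(f"test precision={test_precision:.3f}, recall={test_recall:.3f}")
print(f"predicted positives={n_flagged}")

# If exactly 1,000 items were flagged and test precision held, the expected
# number of false positives would be (1 - precision) * 1000.
expected_fp_per_1000 = (1.0 - test_precision) * 1000
print(f"expected false positives per 1,000 flags: {expected_fp_per_1000:.0f}")
```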
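
One of the fallback strategies named above is recalibration. The sketch below, assuming the same split and base model as before, wraps the estimator in CalibratedClassifierCV with isotonic calibration and repeats the threshold search; whether this opens up a feasible region depends on how miscalibrated the base model's scores were, and isotonic calibration can overfit when positives are scarce.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Recalibrate the base model's probabilities via cross-validated
# isotonic regression on the training set.
calibrated = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), method="isotonic", cv=5
).fit(X_train, y_train)

# Repeat the validation-only threshold search with calibrated scores.
val_scores = calibrated.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, val_scores)
precision, recall = precision[:-1], recall[:-1]

feasible = precision >= 0.95
print("feasible thresholds after recalibration:", int(feasible.sum()))
```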