This question evaluates competency in binary classifier evaluation, cost-sensitive decision-making, probability calibration, and safe online deployment. It sits in the Machine Learning domain for a Data Scientist role and requires practical application informed by conceptual understanding.

You inherit a binary fraud classifier used to decide whether to block an event. At one operating threshold, the holdout confusion matrix is:
Answer the following (illustrative reference sketches for each part appear after the question):
(a) Compute precision, recall, F1, and false-positive rate (FPR) at this threshold.
(b) Assuming a false negative costs 0.20 and fraud prevalence is 1%, describe how to choose a threshold over predicted probabilities that maximizes expected utility. State which ranking metric (PR-AUC vs. ROC-AUC) better reflects improvements at this low prevalence, and why.
(c) Outline how you would check and fix probability calibration (e.g., reliability plot, isotonic or Platt scaling) and explain how calibration interacts with threshold selection.
(d) Propose an online evaluation plan with guardrails that avoids over-blocking legitimate users while improving catch rate. Include ideas such as shadow evaluation and a two-stage review for low-confidence positives, and define concrete success criteria and rollback triggers.
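For part (a), the standard definitions in terms of the confusion-matrix counts (TP, FP, FN, TN) are:

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad
\text{FPR} = \frac{FP}{FP + TN}
```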
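For part (b): treating block/allow as a decision under uncertainty with per-event costs c_FP and c_FN, blocking maximizes expected utility whenever p(fraud) · c_FN > (1 − p(fraud)) · c_FP, which for a calibrated model gives the closed-form threshold t* = c_FP / (c_FP + c_FN). A minimal sketch follows; the 20:1 cost ratio and the synthetic holdout data are illustrative assumptions, not values from the question:

```python
import numpy as np

def expected_cost(y_true, p_fraud, threshold, c_fp, c_fn):
    """Average per-event cost when events with p_fraud >= threshold are blocked."""
    blocked = p_fraud >= threshold
    fp = np.sum(blocked & (y_true == 0))   # legitimate events wrongly blocked
    fn = np.sum(~blocked & (y_true == 1))  # fraud that slipped through
    return (c_fp * fp + c_fn * fn) / len(y_true)

# Illustrative costs (assumption): a missed fraud is 20x as costly as a wrong block.
c_fp, c_fn = 1.0, 20.0
t_star = c_fp / (c_fp + c_fn)  # closed-form optimum for a calibrated model, ~0.048

# Synthetic stand-in for holdout labels and scores at 1% prevalence (assumption).
rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.01).astype(int)
p_fraud = np.clip(0.02 + 0.55 * y_true + 0.08 * rng.standard_normal(100_000), 0.0, 1.0)

# Empirical check: sweep thresholds and pick the cost-minimizing one.
thresholds = np.linspace(0.01, 0.99, 99)
costs = [expected_cost(y_true, p_fraud, t, c_fp, c_fn) for t in thresholds]
print(f"closed-form t* = {t_star:.3f}, swept best = {thresholds[int(np.argmin(costs))]:.2f}")
```

The sweep and the closed form should roughly agree when scores are calibrated; a large gap between them is itself a calibration warning sign, which leads into part (c).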
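For part (c), a sketch of the check (reliability curve) and one fix (isotonic regression). The validation data here is a synthetic stand-in, and in practice the calibrator must be fit on a split not used for model training:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression

# Synthetic stand-in for a held-out calibration split at 1% prevalence (assumption).
rng = np.random.default_rng(0)
y_val = (rng.random(20_000) < 0.01).astype(int)
raw = np.clip(0.02 + 0.5 * y_val + 0.1 * rng.standard_normal(20_000), 0.0, 1.0)

# 1) Check: reliability curve -- mean predicted probability vs. observed fraud rate per bin.
prob_true, prob_pred = calibration_curve(y_val, raw, n_bins=10, strategy="quantile")
for pred, obs in zip(prob_pred, prob_true):
    print(f"predicted {pred:.3f} -> observed {obs:.3f}")

# 2) Fix: isotonic regression learns a monotone map from raw score to calibrated probability.
#    Platt scaling (a logistic fit on the scores) is the lower-variance choice when the
#    calibration split is small; isotonic needs more data but makes no shape assumption.
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(raw, y_val)
p_calibrated = iso.predict(raw)  # at serving time, apply iso.predict to fresh raw scores
```

Calibration interacts with threshold selection because the cost-derived t* from part (b) is only meaningful when scores behave like probabilities; recalibrating shifts the score distribution, so the operating threshold must be re-derived on the calibrated outputs.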
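For part (d), one way to make guardrails and rollback triggers concrete is a monitoring check that gates a staged rollout (shadow scoring first, then a small enforcement canary). All metric names and thresholds below are hypothetical placeholders, not values from the question:

```python
from dataclasses import dataclass

@dataclass
class GuardrailConfig:
    # Illustrative guardrail thresholds -- assumptions to be tuned to the business.
    max_block_rate: float = 0.015        # absolute ceiling on fraction of events blocked
    max_fp_appeal_rate: float = 0.002    # upheld appeals from legit users / total decisions
    min_catch_rate_lift: float = 0.0     # challenger must not catch less fraud than control

def should_rollback(metrics: dict, cfg: GuardrailConfig) -> bool:
    """Evaluate challenger metrics from the canary slice against hard guardrails."""
    return (
        metrics["block_rate"] > cfg.max_block_rate
        or metrics["fp_appeal_rate"] > cfg.max_fp_appeal_rate
        or metrics["catch_rate_lift"] < cfg.min_catch_rate_lift
    )

# Example monitoring snapshot (values are made up for illustration).
snapshot = {"block_rate": 0.011, "fp_appeal_rate": 0.0009, "catch_rate_lift": 0.04}
print("rollback" if should_rollback(snapshot, GuardrailConfig()) else "continue ramp")
```

Any breach would halt the ramp and revert traffic to the incumbent model; low-confidence positives near the threshold can be routed to the two-stage manual review queue instead of being hard-blocked.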