Fraud Triage Thresholding with Calibrated Scores
Context
You have a fraud model that outputs a calibrated score s ∈ [0, 1] per account, where s ≈ P(fake | features). Each day you must triage 2,000,000 accounts into one of three actions:
- Auto-block if s ≥ t.
- Send to manual review if r ≤ s < t.
- Allow otherwise (s < r).
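The three-way rule above can be sketched as a small function. The name `triage`, the action labels, and the check that 0 ≤ r ≤ t ≤ 1 are illustrative assumptions, not part of the problem statement.

```python
def triage(score: float, r: float, t: float) -> str:
    """Map a calibrated score to one of the three actions.

    Assumes 0 <= r <= t <= 1 (not stated in the problem, but needed
    for the three bands to be well defined).
    """
    if not 0.0 <= r <= t <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= r <= t <= 1")
    if score >= t:
        return "block"    # auto-block band: s >= t
    if score >= r:
        return "review"   # manual review band: r <= s < t
    return "allow"        # allow band: s < r
```

Setting r = t leaves the review band empty, which is one way to express a "no manual review" baseline.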
Constraints and costs:
- Daily volume: 2,000,000 accounts.
- Base fake rate: ~1% (for reference; calibration handles this implicitly).
- Manual review budget: ≤ 100,000 accounts/day.
- Manual review detects 95% of fakes it sees (and misses 5%).
- Cost of auto-blocking a real user: $5.
- Cost of letting a fake pass: $20.
- Cost of manual review: $1 per account.
Assumption: Manual review does not incorrectly block real users (false blocks via review are negligible compared to auto-block false positives). If this is not true in your system, add that cost explicitly.
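As a worked illustration of how these costs combine, the expected cost of each action for a single account with calibrated score s can be written out. This sketch assumes, beyond the stated costs, that blocking a fake costs nothing, that a fake caught in review incurs only the $1 review fee, and that a fake missed in review is allowed through at the full $20.

```latex
\begin{align*}
c_{\text{block}}(s)  &= 5\,(1 - s) \\
c_{\text{review}}(s) &= 1 + 20 \cdot 0.05 \cdot s = 1 + s \\
c_{\text{allow}}(s)  &= 20\,s \\[4pt]
c_{\text{allow}}(s) = c_{\text{review}}(s) &\iff s = \tfrac{1}{19} \approx 0.053 \\
c_{\text{review}}(s) = c_{\text{block}}(s) &\iff s = \tfrac{2}{3} \approx 0.667
\end{align*}
```

Under those assumptions, and ignoring the review budget, review is the cheapest action for scores between roughly 0.053 and 0.667. If more than 100,000 accounts per day fall in that band, the budget binds and the thresholds must move.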
Tasks
(a) Formulate the expected daily cost as a function of thresholds r and t given calibrated scores. Describe how to estimate it from historical labeled data using isotonic or Platt calibration and empirical score distributions.
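One possible shape for the estimation in (a), using only the standard library: fit an isotonic map from raw scores to fake probabilities with pool-adjacent-violators on labeled history, then average per-account expected costs over a sample of calibrated scores and scale to daily volume. All function names are hypothetical, and the sketch assumes that a fake missed in review passes at the full $20 and that blocking a fake costs nothing.

```python
from bisect import bisect_left

COST_FALSE_BLOCK = 5.0   # auto-blocking a real user
COST_FAKE_PASS = 20.0    # letting a fake through
COST_REVIEW = 1.0        # one manual review
REVIEW_MISS_RATE = 0.05  # share of fakes a reviewer misses


def fit_isotonic(raw_scores, labels):
    """Pool-adjacent-violators fit; returns (upper_bounds, probs), both ascending."""
    totals = {}  # raw score -> [label_sum, count], so tied scores share one block
    for s, y in zip(raw_scores, labels):
        entry = totals.setdefault(s, [0.0, 0])
        entry[0] += y
        entry[1] += 1
    blocks = []  # each block is [label_sum, count, max_raw_score]
    for s in sorted(totals):
        blocks.append([totals[s][0], totals[s][1], s])
        # Merge while the previous block's mean is not below the newest block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] * blocks[-1][1] >= blocks[-1][0] * blocks[-2][1]
        ):
            label_sum, count, hi = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = hi
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(raw_score, upper_bounds, probs):
    """Step-function lookup; scores above the last bound reuse the last block."""
    i = min(bisect_left(upper_bounds, raw_score), len(probs) - 1)
    return probs[i]


def expected_daily_cost(cal_scores, r, t, daily_volume=2_000_000):
    """Return (expected cost, expected review count), scaled from sample to a day."""
    total, reviewed = 0.0, 0
    for s in cal_scores:
        if s >= t:
            total += COST_FALSE_BLOCK * (1.0 - s)  # blocked account is real w.p. 1 - s
        elif s >= r:
            total += COST_REVIEW + COST_FAKE_PASS * REVIEW_MISS_RATE * s
            reviewed += 1
        else:
            total += COST_FAKE_PASS * s            # allowed account is fake w.p. s
    scale = daily_volume / len(cal_scores)
    return total * scale, reviewed * scale
```

The second return value is what gets compared against the 100,000/day review budget. Fitting the calibrator on one time window and estimating costs on a later one keeps the estimate honest about calibration error.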
(b) Optimize r and t under the review budget. Explain how you would choose the operating point on the precision–recall (PR) curve and verify the choice with an online interleaved test.
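For the constrained search in (b), a minimal sketch is a grid search over (r, t) pairs that discards any pair whose projected review volume exceeds the budget. The function name `best_thresholds`, the grid, and the cost assumptions (a fake missed in review passes at $20; blocking a fake is free) are illustrative. Prefix sums over the sorted calibrated scores make each candidate cheap to evaluate.

```python
from bisect import bisect_left
from itertools import accumulate

COST_FALSE_BLOCK = 5.0   # auto-blocking a real user
COST_FAKE_PASS = 20.0    # letting a fake through
COST_REVIEW = 1.0        # one manual review
REVIEW_MISS_RATE = 0.05  # share of fakes a reviewer misses


def best_thresholds(cal_scores, grid, daily_volume=2_000_000, review_budget=100_000):
    """Return (cost, r, t) minimizing expected daily cost within the review budget."""
    scores = sorted(cal_scores)
    n = len(scores)
    prefix = [0.0] + list(accumulate(scores))  # prefix[k] = sum of the k lowest scores
    scale = daily_volume / n                   # sample size -> daily volume
    best = None
    for r in grid:
        i = bisect_left(scores, r)             # scores[:i] are allowed (s < r)
        for t in grid:
            if t < r:
                continue
            j = bisect_left(scores, t)         # scores[j:] are blocked (s >= t)
            n_review = j - i
            if n_review * scale > review_budget:
                continue                       # violates the manual review budget
            fakes_allowed = prefix[i]                        # expected fakes allowed
            fakes_reviewed = prefix[j] - prefix[i]           # expected fakes reviewed
            reals_blocked = (n - j) - (prefix[n] - prefix[j])
            cost = scale * (
                COST_FAKE_PASS * fakes_allowed
                + COST_REVIEW * n_review
                + COST_FAKE_PASS * REVIEW_MISS_RATE * fakes_reviewed
                + COST_FALSE_BLOCK * reals_blocked
            )
            if best is None or cost < best[0]:
                best = (cost, r, t)
    return best
```

Each feasible (r, t) also fixes a point on the PR curve for auto-block (precision is the expected fake share among scores at or above t), so the chosen pair can be reported in those terms before it goes into an online test.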
(c) Describe a drift monitoring plan and a weekly threshold re-tuning process with backtesting and safety rails.
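Two pieces of (c) lend themselves to a short sketch: a population stability index (PSI) on the score distribution as one drift signal, and a clamp that limits how far a threshold may move in a single weekly update. The bin count, the `eps` floor, and the 0.02 step limit are hypothetical defaults, not values from the problem.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population stability index over equal-width bins on [0, 1]."""
    def shares(scores):
        counts = [0] * n_bins
        for s in scores:
            counts[min(int(s * n_bins), n_bins - 1)] += 1  # s == 1.0 goes in the last bin
        # Floor empty bins at eps so the log ratio stays finite.
        return [max(c / len(scores), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def clamp_update(old, proposed, max_step=0.02):
    """Safety rail: cap the weekly change of a threshold at +/- max_step."""
    return min(max(proposed, old - max_step), old + max_step)
```

A common rule of thumb treats PSI below about 0.1 as stable and above about 0.25 as a significant shift. Those cutoffs are conventions to be checked against your own backtests, not guarantees.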