This question evaluates a data scientist's competency in cost-sensitive model evaluation, handling extreme class imbalance, calibration and threshold derivation, experiment design, and post-launch monitoring and fairness within the Analytics & Experimentation domain.
You operate a binary classifier that flags e‑commerce orders for manual review. The base fraud rate is 0.7% (700 frauds out of 100,000 orders). Actions and outcome costs:
Two candidate models at threshold 0.5 produce the following on a 100,000‑order validation set (700 positives):
(a) For each model at threshold 0.5, compute:
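This copy of the page does not preserve the sub-items of (a) or the cost table. The sketch below assumes the usual quantities for a cost-sensitive review queue: precision, recall, false-positive rate, review rate, and total and per-order cost. It computes them from confusion-matrix counts. `Confusion` and `summarize` are hypothetical names. The four cost arguments are placeholders that you would fill from the problem's cost table, with each cost entered as a loss (a benefit becomes a negative cost).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical container for confusion-matrix counts at a fixed threshold."""
    tp: int  # frauds flagged for review
    fp: int  # legitimate orders flagged for review
    fn: int  # frauds that slipped through
    tn: int  # legitimate orders passed without review


def summarize(cm: Confusion, c_tp: float, c_fp: float, c_fn: float, c_tn: float) -> dict:
    """Imbalance-aware metrics plus total and per-order cost.

    Costs are losses per order in each cell; a benefit can be passed as a negative cost.
    """
    flagged = cm.tp + cm.fp
    positives = cm.tp + cm.fn
    negatives = cm.fp + cm.tn
    n = positives + negatives
    if n == 0:
        raise ValueError("confusion matrix is empty")
    precision = cm.tp / flagged if flagged else 0.0
    recall = cm.tp / positives if positives else 0.0
    fpr = cm.fp / negatives if negatives else 0.0
    total_cost = cm.tp * c_tp + cm.fp * c_fp + cm.fn * c_fn + cm.tn * c_tn
    return {
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "review_rate": flagged / n,  # share of orders sent to manual review
        "total_cost": total_cost,
        "cost_per_order": total_cost / n,
    }
```

The sketch leaves out accuracy on purpose. At a 0.7% base rate, a model that flags nothing is already 99.3% accurate, so accuracy cannot separate the two candidates.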
(b) Derive the general cost‑optimal classification threshold in terms of calibrated P(y=1|x) and the four outcome costs. Then apply it to this problem (assume perfect calibration) and report the numeric threshold.
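Let p = P(y=1|x) be a calibrated score. Flagging an order has expected cost p·C_TP + (1 − p)·C_FP. Not flagging it has expected cost p·C_FN + (1 − p)·C_TN. Flagging is no worse when p·(C_FN − C_TP) ≥ (1 − p)·(C_FP − C_TN), which rearranges to p ≥ t* = (C_FP − C_TN) / ((C_FP − C_TN) + (C_FN − C_TP)). The minimal Python sketch below implements this rule. It assumes C_FN > C_TP and C_FP > C_TN, meaning that a missed fraud costs more than a caught one and that a false alarm costs more than correctly passing an order. The function names are hypothetical. Because the problem's cost table is not reproduced here, the sketch uses illustrative costs only and does not compute the problem's own threshold.

```python
def cost_optimal_threshold(c_tp: float, c_fp: float, c_fn: float, c_tn: float) -> float:
    """Hypothetical helper: threshold t* on calibrated P(y=1|x) at or above which flagging is optimal.

    Expected cost of flagging:     p*c_tp + (1 - p)*c_fp
    Expected cost of not flagging: p*c_fn + (1 - p)*c_tn
    Flag iff p*(c_fn - c_tp) >= (1 - p)*(c_fp - c_tn).
    """
    fp_regret = c_fp - c_tn  # extra cost of flagging a legitimate order
    fn_regret = c_fn - c_tp  # extra cost of letting a fraudulent order through
    if fp_regret <= 0 or fn_regret <= 0:
        # Outside this regime one action (weakly) dominates, so no interior threshold exists.
        raise ValueError("expected c_fp > c_tn and c_fn > c_tp")
    return fp_regret / (fp_regret + fn_regret)


def should_flag(p: float, threshold: float) -> bool:
    """Send the order to manual review when its calibrated fraud probability meets the threshold."""
    return p >= threshold


# Illustrative costs only (NOT the problem's cost table, which is not shown here).
t_star = cost_optimal_threshold(c_tp=5.0, c_fp=5.0, c_fn=200.0, c_tn=0.0)
print(f"t* = {t_star:.3f}")  # 5 / (5 + 195) = 0.025
```

With these made-up costs, t* comes out far below the default 0.5. That is the general pattern whenever a missed fraud costs much more than a review. The threshold depends only on the cost differences, not on the 0.7% base rate, because the base rate is already reflected in a calibrated p.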
(c) Discuss:
(d) Propose: