Evaluate a model and choose metrics
Company: Apple
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: Hard
Interview Round: Onsite
You own a fraud-screening model for e-commerce orders. The fraud base rate is 0.7%. Per-order costs: a flag triggers a $3 manual review; flagging a legitimate order adds $1 of customer friction; a missed fraud costs $120; correctly passing a legitimate order costs $0. On a 100,000-order validation set (700 positives), two candidate models at threshold 0.5 yield:

Model A: TP=490, FP=4,900, FN=210, TN=94,400
Model B: TP=560, FP=8,400, FN=140, TN=90,900

Tasks:
(a) Compute precision, recall, F1, the ROC operating point (TPR/FPR) as an AUC proxy, and expected cost per order for A and B at threshold 0.5. Which model is better under the stated costs?
(b) Derive the cost-optimal threshold in general, in terms of calibrated P(y=1|x) and the costs; apply it here assuming perfect calibration and the stated base rate.
(c) Discuss PR-AUC vs. ROC-AUC under extreme class imbalance, calibration checks (Brier score, ECE), and decision curve analysis / net benefit.
(d) Propose an offline evaluation plan robust to prevalence shift and a safe online A/B test with guardrails (manual-review SLAs, false-accusation rate, a holdout for drift), and explain how you would monitor post-launch for concept drift and fairness across user segments.
Quick Answer: At threshold 0.5, Model A has precision 9.1%, recall 70%, F1 0.161, and an expected cost of about $0.463/order; Model B has precision 6.25%, recall 80%, F1 0.116, and about $0.521/order. Model A wins under the stated costs: B's 70 extra catches save $8,190 in misses, but its 3,500 extra false positives cost $14,000. The cost-optimal rule flags whenever calibrated P(y=1|x) ≥ 4/121 ≈ 3.3%, far below the default 0.5, so both models are mis-thresholded as given. At 0.7% prevalence, PR-AUC, calibration diagnostics (Brier, ECE), and decision-curve net benefit are more informative than ROC-AUC alone. Worked sketches for tasks (a) through (d) follow.
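For task (a), a minimal Python sketch of the arithmetic, under the reading that the $3 review fee applies to every flagged order (TP and FP) and the $1 friction adds only on false positives:

```python
# Confusion-matrix metrics and expected cost for the two candidate models.
# Cost reading (an assumption): $3 review on every flag, +$1 friction on a
# false positive, $120 on a missed fraud, $0 on a correct pass.
models = {
    "A": dict(tp=490, fp=4_900, fn=210, tn=94_400),
    "B": dict(tp=560, fp=8_400, fn=140, tn=90_900),
}

for name, m in models.items():
    tp, fp, fn, tn = m["tp"], m["fp"], m["fn"], m["tn"]
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # recall = TPR
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)                         # ROC operating point: (FPR, TPR)
    cost = 3 * tp + (3 + 1) * fp + 120 * fn      # TN contributes $0
    print(f"Model {name}: precision={precision:.3f} recall={recall:.3f} "
          f"F1={f1:.3f} FPR={fpr:.4f} cost/order=${cost / n:.4f}")

# Model A: precision=0.091 recall=0.700 F1=0.161 FPR=0.0493 cost/order=$0.4627
# Model B: precision=0.062 recall=0.800 F1=0.116 FPR=0.0846 cost/order=$0.5208
```

Note that Model A's single ROC point (FPR 0.049, TPR 0.70) and Model B's (FPR 0.085, TPR 0.80) are only a proxy: one threshold per model cannot rank them by full ROC-AUC.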
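For task (b), the same cost reading gives a closed-form threshold. Written as a LaTeX fragment, with p the calibrated P(y=1|x) and c_rev, c_fric, c_miss standing for the $3, $1, and $120 costs:

```latex
% Flag iff the expected cost of flagging is at most the expected cost of passing.
\text{flag} \iff
\underbrace{c_{\mathrm{rev}} + (1-p)\,c_{\mathrm{fric}}}_{\text{review} + \text{friction if legit}}
\le \underbrace{p\,c_{\mathrm{miss}}}_{\text{expected miss cost}}
\iff p \ge \frac{c_{\mathrm{rev}} + c_{\mathrm{fric}}}{c_{\mathrm{miss}} + c_{\mathrm{fric}}}
= \frac{3 + 1}{120 + 1} = \frac{4}{121} \approx 0.033
```

With perfect calibration, the optimal policy flags any order scoring above about 3.3%, roughly 4.7 times the 0.7% base rate; the default 0.5 threshold is far too conservative for these costs.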
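For task (c), a sketch of the two calibration checks the question names, Brier score and expected calibration error (ECE) with equal-width bins; the arrays at the bottom are hypothetical stand-ins, since the prompt supplies no per-order scores:

```python
import numpy as np

def brier_score(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    # Mean squared error between the predicted probability and the 0/1 label.
    return float(np.mean((y_prob - y_true) ** 2))

def expected_calibration_error(y_true, y_prob, n_bins: int = 10) -> float:
    # Weighted average of |observed fraud rate - mean score| over score bins.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return float(ece)

# Hypothetical demo data, well calibrated by construction.
rng = np.random.default_rng(0)
y_prob = rng.uniform(size=10_000)
y_true = (rng.uniform(size=10_000) < y_prob).astype(int)
print(brier_score(y_true, y_prob), expected_calibration_error(y_true, y_prob))
```

One caution worth raising in the discussion: at 0.7% prevalence, a model that predicts near zero everywhere already achieves a Brier score around 0.007, so pair these diagnostics with PR-AUC and net benefit rather than relying on ROC-AUC, which is dominated by the 99.3% negative class.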
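For task (d), one offline robustness check is to hold each model's TPR/FPR fixed and replay the expected cost at shifted fraud base rates; an illustrative sketch (the sweep values below are arbitrary, not prescribed by the prompt):

```python
# Expected cost per order at a hypothetical base rate, holding TPR/FPR fixed.
def cost_at_base_rate(tp: int, fp: int, fn: int, tn: int, rate: float) -> float:
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    # Per order: positives cost $3 if caught, $120 if missed; negatives cost
    # $4 if falsely flagged ($3 review + $1 friction), $0 otherwise.
    return rate * (3 * tpr + 120 * (1 - tpr)) + (1 - rate) * 4 * fpr

for rate in (0.007, 0.02, 0.05):
    a = cost_at_base_rate(490, 4_900, 210, 94_400, rate)
    b = cost_at_base_rate(560, 8_400, 140, 90_900, rate)
    print(f"base rate {rate:.1%}: A=${a:.4f}/order  B=${b:.4f}/order")

# base rate 0.7%: A=$0.4627/order  B=$0.5208/order
# base rate 2.0%: A=$0.9554/order  B=$0.8596/order
# base rate 5.0%: A=$2.0925/order  B=$1.6415/order
```

The ranking flips near a 1.2% base rate: Model B's extra recall wins once fraud prevalence roughly doubles. Surfacing that sensitivity offline is exactly what the evaluation plan should do before committing to an online A/B with review-SLA, false-accusation-rate, and drift-holdout guardrails.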