Evaluating a Harmful-Content Detection Model: Offline and Online
Context
You are given a binary classification model that detects harmful content on a social platform and flags items for either removal or down‑ranking. You need to:
- Evaluate the model offline on a labeled validation set.
- Design an online experiment to test the model in production.
Assume class imbalance (harmful content is rare), probabilistic model outputs (scores), and that some enforcement actions (auto‑remove) prevent us from observing true labels unless the measurement is designed around them.
Tasks
- Offline evaluation (labeled validation set):
  - Define and compute core metrics (precision, recall, FPR, ROC/PR curves, AUCs); see the metrics sketch after this list.
  - Assess calibration and choose an operating threshold given policy and cost trade‑offs (calibration/threshold sketch below).
  - Check robustness across slices (e.g., language/region) and over time (slice-audit sketch below).
- Online experiment design:
  - State hypotheses.
  - Define variants (control vs. treatment), including any shadow/canary ramps.
  - Specify randomization unit, traffic split, duration, and significance plan (power-analysis sketch below).
  - Define primary success metrics and guardrails (safety, engagement, fairness, latency).
  - Address measurement challenges (delayed/hidden labels due to enforcement); see the enforcement-holdout sketch below.
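
A minimal sketch of the core offline metrics, assuming `y_true` (0/1 labels) and `y_score` (predicted probabilities) come from the validation set; the function name and default threshold are illustrative:

```python
import numpy as np
from sklearn.metrics import (
    precision_score, recall_score, roc_auc_score,
    average_precision_score, confusion_matrix,
)

def offline_metrics(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5) -> dict:
    """Threshold-free AUCs plus thresholded precision/recall/FPR."""
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        "roc_auc": roc_auc_score(y_true, y_score),
        # PR-AUC (average precision) ignores the large true-negative mass,
        # which matters when harmful content is rare.
        "pr_auc": average_precision_score(y_true, y_score),
    }
```

Under heavy imbalance, PR‑AUC is usually the headline number; ROC‑AUC can look deceptively high simply because of the huge true‑negative count.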
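One way to check calibration and pick an operating threshold is a reliability curve plus an expected-cost sweep. This is a sketch, not the only approach, and the cost values below are placeholders rather than policy:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

def calibration_report(y_true, y_score, n_bins=10):
    # Reliability curve: observed harmful rate vs. mean predicted score per bin.
    frac_pos, mean_pred = calibration_curve(y_true, y_score, n_bins=n_bins, strategy="quantile")
    return {"brier": brier_score_loss(y_true, y_score), "bins": list(zip(mean_pred, frac_pos))}

def pick_threshold(y_true, y_score, cost_fp=1.0, cost_fn=20.0):
    """Sweep candidate thresholds and minimize expected cost; cost_fn >> cost_fp
    encodes a policy where missing harmful content is worse than over-flagging."""
    best_t, best_cost = 0.5, float("inf")
    for t in np.unique(y_score):
        y_pred = (y_score >= t).astype(int)
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

The cost ratio is the policy knob: raising `cost_fn` pushes the chosen threshold down, trading more false positives for fewer missed harmful items.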
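A quick slice audit, assuming a pandas DataFrame with `label`, `score`, and a `language` column (all column names illustrative); the same function can be run per time window to check drift:

```python
import pandas as pd

def slice_report(df: pd.DataFrame, threshold: float, slice_col: str = "language") -> pd.DataFrame:
    """Per-slice recall and FPR at a fixed threshold; large gaps between
    slices are a robustness/fairness flag even when global metrics look fine."""
    df = df.assign(pred=(df["score"] >= threshold).astype(int))

    def _stats(g):
        harmful, benign = g[g["label"] == 1], g[g["label"] == 0]
        return pd.Series({
            "n": len(g),
            "recall": harmful["pred"].mean() if len(harmful) else float("nan"),
            "fpr": benign["pred"].mean() if len(benign) else float("nan"),
        })

    return df.groupby(slice_col).apply(_stats)
```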
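For the significance plan, a standard two-proportion power calculation gives the per-arm sample size; the baseline rate and minimum detectable effect below are made-up numbers to illustrate the call:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical numbers: 1.0% of impressions show harmful content in control,
# and we want to detect a drop to 0.8% (a 20% relative reduction).
effect = proportion_effectsize(0.010, 0.008)
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, ratio=1.0, alternative="two-sided"
)
print(f"~{n_per_arm:,.0f} randomization units per arm")
```

Running the same calculation in reverse against available daily traffic yields the experiment duration needed to reach the target power.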
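Because auto-removed items never accrue organic reports or views, one common workaround, sketched here with illustrative names and rates, is a small randomized measurement holdout that is exempt from enforcement and routed to human review; prevalence in the holdout estimates what users would have seen without the model:

```python
import hashlib
from statsmodels.stats.proportion import proportion_confint

HOLDOUT_RATE = 0.01  # fraction of traffic exempt from auto-remove (illustrative)

def in_measurement_holdout(item_id: str) -> bool:
    # Deterministic hash-based bucketing so an item's holdout status is stable
    # across services and restarts (unlike Python's salted built-in hash()).
    bucket = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % 10_000
    return bucket < int(HOLDOUT_RATE * 10_000)

def holdout_prevalence(n_reviewed: int, n_harmful: int):
    """Estimate the harmful-content rate absent enforcement, with a Wilson
    interval; the holdout is the only label source unbiased by enforcement."""
    est = n_harmful / n_reviewed
    lo, hi = proportion_confint(n_harmful, n_reviewed, alpha=0.05, method="wilson")
    return est, (lo, hi)
```

The holdout must stay small enough to bound user exposure to harm, which is itself a guardrail trade-off the experiment design should state explicitly.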