Offline Evaluation Framework for a Harmful-Content Video Classifier
Context
You are evaluating a binary classifier that assigns each video a score, interpretable as the probability that the video violates a harmful-content policy. Harmful videos are rare (severe label skew). The business incurs different costs for false positives (over-blocking and wasted review effort) and false negatives (missed harm), and moderation capacity may be limited.
Task
Design an offline evaluation plan that covers:
- Metrics
  - Which ranking and operating-point metrics to report (e.g., precision, recall, PR-AUC), including calibration metrics; see the metrics sketch below.
- Class Imbalance
  - How to evaluate meaningfully under severe label skew, and how to correct estimates when the evaluation sample is not a simple random draw (e.g., stratified by model score); see the weighting sketch below.
- Ground-Truth Collection
  - How to collect high-quality labels for harmful content, including rater setup, inter-rater agreement, sampling, and quality controls; see the agreement sketch below.
- Threshold Selection and Business Costs
  - How to choose a decision threshold given asymmetric costs of false positives vs. false negatives and potential moderation-capacity constraints; see the thresholding sketch below.
- Fairness Checks
  - What subgroup analyses and fairness metrics to run to guard against disparate impact; see the subgroup sketch below.
State assumptions where necessary, and provide formulas and worked examples for thresholding and weighting.
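A minimal sketch of the headline metrics, assuming the labeled evaluation set lives in hypothetical numpy arrays y_true (0/1 labels) and y_score (model probabilities). PR-AUC summarizes ranking quality under skew (unlike ROC-AUC, it is sensitive to the rare positive class), while the Brier score and expected calibration error (ECE) check that scores behave like probabilities:

```python
import numpy as np
from sklearn.metrics import average_precision_score, brier_score_loss

def expected_calibration_error(y_true, y_score, n_bins=10):
    """ECE: size-weighted mean |observed positive rate - mean score|
    over equal-width score bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(y_score, edges[1:-1])  # bin index 0..n_bins-1
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue
        gap = abs(y_true[mask].mean() - y_score[mask].mean())
        ece += (mask.sum() / len(y_true)) * gap
    return ece

# Synthetic stand-in data with ~2% positives (illustrative only).
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.02).astype(int)
y_score = np.clip(0.02 + 0.6 * y_true + 0.1 * rng.standard_normal(10_000), 0, 1)

print("PR-AUC:", average_precision_score(y_true, y_score))   # ranking quality
print("Brier:", brier_score_loss(y_true, y_score))           # calibration (lower is better)
print("ECE:  ", expected_calibration_error(y_true, y_score)) # calibration gap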
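When the evaluation sample is stratified by model score, each labeled example i can be weighted by the inverse of its inclusion probability, w_i = 1/pi_i (a Horvitz-Thompson estimator), so that precision and recall remain unbiased for the full population. For example, if the high-score stratum is sampled at pi = 0.5 and everything else at pi = 0.01, a low-stratum example carries weight 100 versus 2. A sketch under those illustrative rates:

```python
import numpy as np

def weighted_precision_recall(y_true, y_pred, pi):
    """Precision/recall with Horvitz-Thompson weights w = 1/pi, where pi is
    each example's sampling probability."""
    w = 1.0 / pi
    tp = np.sum(w * ((y_pred == 1) & (y_true == 1)))
    fp = np.sum(w * ((y_pred == 1) & (y_true == 0)))
    fn = np.sum(w * ((y_pred == 0) & (y_true == 1)))
    return tp / (tp + fp), tp / (tp + fn)

# Toy sample: first three rows from the high-score stratum (pi = 0.5),
# the rest from the low-score stratum (pi = 0.01).
y_true = np.array([1, 1, 0, 1, 0, 0, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0])
pi     = np.array([0.5, 0.5, 0.5, 0.01, 0.01, 0.01, 0.01, 0.01])
precision, recall = weighted_precision_recall(y_true, y_pred, pi)
print(f"weighted precision={precision:.3f}, weighted recall={recall:.3f}")
```

Without the weights, the oversampled high-score stratum dominates; in particular, positives that the model misses live mostly in the low-score strata, so unweighted recall would be overstated.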
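For label quality, two common checks are pairwise inter-rater agreement and per-rater accuracy on a small adjudicated "golden set". A sketch using Cohen's kappa (the rater arrays, golden labels, and the 0.6 rule of thumb are illustrative):

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Two raters' binary labels on the same ten videos (illustrative).
rater_a = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
rater_b = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 0])

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
if kappa < 0.6:  # common rule of thumb for substantial agreement
    print("low agreement: revisit guidelines, rater training, or adjudication")

# Golden-set check: each rater's accuracy on expert-adjudicated items.
golden = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
for name, labels in [("A", rater_a), ("B", rater_b)]:
    print(f"rater {name} golden-set accuracy: {(labels == golden).mean():.2f}")
```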
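For thresholding, if scores are well calibrated, flagging a video with score p is worthwhile exactly when the expected cost of missing harm exceeds the expected cost of a false flag: c_FN * p > c_FP * (1 - p), i.e., p > t* = c_FP / (c_FP + c_FN). With c_FN = 10 and c_FP = 1, t* = 1/11, roughly 0.09. When calibration is imperfect, or moderation capacity caps the number of flags, the threshold can instead be chosen by an empirical cost sweep, sketched below (costs, capacity, and data are illustrative):

```python
import numpy as np

def pick_threshold(y_true, y_score, c_fp=1.0, c_fn=10.0, capacity=None, w=None):
    """Return the score threshold minimizing c_fp*FP + c_fn*FN on labeled data.
    If capacity is set, thresholds flagging more than `capacity` (weighted)
    items are skipped. Optional weights w handle stratified samples."""
    if w is None:
        w = np.ones_like(y_score, dtype=float)
    best_t, best_cost = None, np.inf
    for t in np.unique(y_score):
        flagged = y_score >= t
        if capacity is not None and np.sum(w * flagged) > capacity:
            continue  # would exceed review capacity
        fp = np.sum(w * (flagged & (y_true == 0)))
        fn = np.sum(w * (~flagged & (y_true == 1)))
        cost = c_fp * fp + c_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

rng = np.random.default_rng(1)
y_true = (rng.random(5_000) < 0.02).astype(int)
y_score = np.clip(0.05 + 0.5 * y_true + 0.15 * rng.standard_normal(5_000), 0, 1)
print(pick_threshold(y_true, y_score, c_fp=1.0, c_fn=10.0, capacity=500))
```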
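For fairness checks, a simple starting point is to compare false positive rate (over-blocking) and recall (harm caught) across subgroups at the deployed threshold, in the spirit of equalized odds; large gaps or ratios between groups flag potential disparate impact. A sketch with a hypothetical content-language attribute:

```python
import numpy as np

def subgroup_rates(y_true, y_score, group, threshold):
    """Per-group FPR and recall at a fixed score threshold."""
    y_pred = (y_score >= threshold).astype(int)
    for g in np.unique(group):
        m = group == g
        tp = np.sum((y_pred[m] == 1) & (y_true[m] == 1))
        fp = np.sum((y_pred[m] == 1) & (y_true[m] == 0))
        fn = np.sum((y_pred[m] == 0) & (y_true[m] == 1))
        tn = np.sum((y_pred[m] == 0) & (y_true[m] == 0))
        fpr = fp / max(fp + tn, 1)     # over-blocking rate for this group
        recall = tp / max(tp + fn, 1)  # share of this group's harm caught
        print(f"group={g} n={m.sum()} FPR={fpr:.3f} recall={recall:.3f}")

rng = np.random.default_rng(2)
n = 8_000
group  = rng.choice(["en", "es", "hi"], size=n)  # hypothetical attribute
y_true = (rng.random(n) < 0.02).astype(int)
y_score = np.clip(0.05 + 0.5 * y_true + 0.15 * rng.standard_normal(n), 0, 1)
subgroup_rates(y_true, y_score, group, threshold=0.3)
```

Because small subgroups make these rates noisy, per-group numbers should be reported with confidence intervals before any disparity is acted on.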