This question evaluates proficiency in designing an offline evaluation framework for binary machine-learning classifiers: selecting ranking and operating-point metrics, handling calibration and class imbalance, defining ground-truth labeling protocols, choosing thresholds under asymmetric costs and capacity constraints, and running subgroup fairness analyses. It sits in the Machine Learning domain and emphasizes the practical application of evaluation and ML systems design. It is commonly asked in technical interviews because it probes both conceptual understanding of statistical and fairness trade-offs and the ability to translate business constraints into measurable evaluation criteria, testing applied reasoning rather than purely theoretical knowledge.

You are evaluating a binary classifier that assigns each video a score (interpretable as the probability it violates a harmful-content policy). Harmful videos are rare (label skew). The business incurs different costs for false positives (over-blocking/over-review) and false negatives (missed harm). Moderation capacity may also be limited.
Design an offline evaluation plan that covers:
- ranking and operating-point metrics appropriate under label skew;
- calibration and class-imbalance handling;
- ground-truth labeling protocols;
- thresholding under asymmetric costs and capacity constraints;
- subgroup fairness analyses.
Include assumptions where necessary, and provide formulas and examples for thresholding and weighting.
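
For the thresholding part, a worked illustration: if scores are well calibrated and a false positive costs c_FP while a false negative costs c_FN, flagging a video with probability p is worthwhile when p · c_FN ≥ (1 − p) · c_FP, i.e. when p ≥ c_FP / (c_FP + c_FN). A hard review-capacity cap of K items instead sets the threshold at the K-th highest score. A minimal sketch under assumed costs and synthetic scores (all numbers here are illustrative, not given in the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative data: roughly 1% positive rate (label skew).
n = 100_000
y = rng.random(n) < 0.01
# Hypothetical scores, treated as calibrated; positives score higher on average.
scores = np.clip(rng.beta(1, 20, n) + y * rng.beta(5, 2, n) * 0.5, 0, 1)

# Assumed costs (not from the question): a missed harmful video
# costs 10x an unnecessary block/review.
c_fp, c_fn = 1.0, 10.0

# Cost-optimal threshold for calibrated probabilities:
# flag when p * c_fn >= (1 - p) * c_fp  =>  p >= c_fp / (c_fp + c_fn).
t_cost = c_fp / (c_fp + c_fn)

# Capacity-constrained threshold: review only the top-K scored videos.
capacity = 2_000
t_capacity = np.partition(scores, -capacity)[-capacity]

# The binding threshold is whichever is stricter.
t = max(t_cost, t_capacity)

def expected_cost(threshold):
    """Total cost of false positives and false negatives at a threshold."""
    flagged = scores >= threshold
    fp = np.sum(flagged & ~y)
    fn = np.sum(~flagged & y)
    return c_fp * fp + c_fn * fn

for name, thr in [("cost-optimal", t_cost), ("capacity", t_capacity), ("combined", t)]:
    print(f"{name}: threshold={thr:.3f} cost={expected_cost(thr):.0f}")
```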
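
For the weighting part, one common concern is that with rare positives the labeled evaluation set is usually built by stratified sampling rather than uniformly, so metrics must be re-weighted by inverse sampling probabilities to estimate production-level precision and recall. A hedged sketch, assuming a hypothetical design where every video scoring at least 0.5 is sent for labeling and only 5% of lower-scoring videos are sampled:

```python
import numpy as np

# Hypothetical stratified labeling design: every video with score >= 0.5
# is labeled; only 5% of the rest are sampled for labeling.
def sampling_prob(score):
    return np.where(score >= 0.5, 1.0, 0.05)

def weighted_precision_recall(y_true, scores, threshold):
    # Each labeled example stands in for 1 / P(sampled) production videos.
    w = 1.0 / sampling_prob(scores)
    flagged = scores >= threshold
    tp = np.sum(w * (flagged & y_true))
    fp = np.sum(w * (flagged & ~y_true))
    fn = np.sum(w * (~flagged & y_true))
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return precision, recall

# Tiny made-up labeled sample: low-scoring items carry weight 1/0.05 = 20.
y_true = np.array([True, False, False, True, False])
scores = np.array([0.9, 0.6, 0.2, 0.4, 0.1])
print(weighted_precision_recall(y_true, scores, threshold=0.5))
```

Note how the subsampled low-score stratum dominates the weighted false-negative count; unweighted recall on the labeled set would look far better than the production estimate.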