Evaluate Classifier with Precision, Recall, and Fairness Metrics

Q: Evaluate Classifier with Precision, Recall, and Fairness Metrics

This question evaluates proficiency in designing an offline evaluation framework for binary machine-learning classifiers. It covers selection of ranking and operating-point metrics, calibration and class-imbalance handling, ground-truth labeling protocols, thresholding under asymmetric costs and capacity constraints, and subgroup fairness analyses. It belongs to the Machine Learning domain and emphasizes practical application of evaluation and ML systems design. It is commonly asked in technical interviews because it probes both conceptual understanding of statistical and fairness trade-offs and the ability to translate business constraints into measurable evaluation criteria, testing applied reasoning rather than purely theoretical knowledge.

Question

Offline Evaluation Framework for a Harmful-Content Video Classifier

Context

You are evaluating a binary classifier that assigns each video a score (interpretable as the probability it violates a harmful-content policy). Harmful videos are rare (label skew). The business incurs different costs for false positives (over-blocking/over-review) and false negatives (missed harm). Moderation capacity may also be limited.

Task

Design an offline evaluation plan that covers:

Metrics

Which ranking and operating-point metrics to report (e.g., precision, recall, PR-AUC), including calibration metrics.

Class Imbalance

How to evaluate meaningfully under severe label skew and when the evaluation sample is not a simple random draw (e.g., stratified by model score).

Ground-Truth Collection

How to collect high-quality labels for harmful content, including rater setup, agreement, sampling, and quality controls.

Threshold Selection and Business Costs

How to choose a decision threshold given asymmetric costs of false positives vs. false negatives and potential moderation capacity constraints.

Fairness Checks

What subgroup analyses and fairness metrics to run to guard against disparate impact.

Include assumptions where necessary, and provide formulas and examples for thresholding and weighting.
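
As one possible starting point for the thresholding and weighting formulas requested above, here is a minimal sketch. It rests on assumptions not stated in the question: scores are calibrated probabilities, each labeled item carries a weight equal to the inverse of its sampling rate (so a score-stratified sample can stand in for the full population), and score ties at the capacity cutoff are ignored. Under calibration, flagging a video is cheaper in expectation when p * c_fn >= (1 - p) * c_fp, which gives the threshold t = c_fp / (c_fp + c_fn). All function names and parameters here are hypothetical.

```python
import numpy as np


def cost_threshold(c_fp, c_fn):
    """Threshold minimizing expected cost, ASSUMING calibrated scores.

    Flag when p * c_fn >= (1 - p) * c_fp, i.e. p >= c_fp / (c_fp + c_fn).
    """
    return c_fp / (c_fp + c_fn)


def capacity_threshold(scores, weights, capacity):
    """Lowest score cutoff whose weighted flagged volume fits in capacity.

    ASSUMPTION: no score ties at the cutoff; weights are positive.
    Returns +inf when not even the top-scored item fits.
    """
    s = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(-s)            # highest score first
    cum = np.cumsum(w[order])         # weighted volume if we flag the top k
    k = int(np.searchsorted(cum, capacity, side="right"))
    return float("inf") if k == 0 else float(s[order][k - 1])


def weighted_precision_recall(y_true, scores, weights, threshold):
    """Precision and recall with inverse-sampling-rate weights."""
    y = np.asarray(y_true, dtype=bool)
    w = np.asarray(weights, dtype=float)
    flagged = np.asarray(scores, dtype=float) >= threshold
    tp = w[flagged & y].sum()
    fp = w[flagged & ~y].sum()
    fn = w[~flagged & y].sum()
    precision = tp / (tp + fp) if tp + fp > 0 else float("nan")
    recall = tp / (tp + fn) if tp + fn > 0 else float("nan")
    return float(precision), float(recall)
```

For illustration, take a made-up sample in which the low-score stratum was labeled at a rate of 1 in 50 (weight 50). With c_fp = 1 and c_fn = 4 the cost threshold is 0.2; if capacity only allows a weighted volume of 3 flagged videos, the stricter of the two cutoffs applies, for example `max(cost_threshold(1, 4), capacity_threshold(scores, weights, 3))`. Weighting matters here because a single missed positive in the sparsely sampled stratum represents many missed videos in the population, so unweighted recall on the sample would look far better than it is.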
