This question evaluates a data scientist's competency in machine-learning model evaluation and experimentation, covering metrics (precision, recall, F1/Fβ, ROC-AUC, PR-AUC, calibration), dataset and label quality, class imbalance handling, thresholding/triage policies, and safety and ethical guardrails.
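For concreteness, here is a minimal sketch of how the offline metrics named above could be computed with scikit-learn. The data, class rate, and threshold are illustrative assumptions, not properties of any real moderation system:

```python
import numpy as np
from sklearn.metrics import (
    precision_score, recall_score, fbeta_score,
    roc_auc_score, average_precision_score, brier_score_loss,
)

# Synthetic, heavily imbalanced data: ~2% of posts are harmful.
rng = np.random.default_rng(0)
n = 10_000
y_true = (rng.random(n) < 0.02).astype(int)
# Fake model scores: harmful posts tend to score higher.
y_score = np.clip(rng.normal(0.2 + 0.5 * y_true, 0.15), 0.0, 1.0)
y_pred = (y_score >= 0.5).astype(int)  # one example operating threshold

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F2 (recall-weighted):", fbeta_score(y_true, y_pred, beta=2))
# ROC-AUC can look flattering under heavy class imbalance; PR-AUC is
# usually the more informative ranking metric in that regime.
print("ROC-AUC:", roc_auc_score(y_true, y_score))
print("PR-AUC :", average_precision_score(y_true, y_score))
# Brier score as a simple calibration summary (lower is better).
print("Brier  :", brier_score_loss(y_true, y_score))
```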

You are evaluating a new machine-learning model that detects harmful content on a large consumer platform. Leadership needs evidence that the new model outperforms the existing one, and a clear view of the trade-off between catching more harmful posts and avoiding over-removal of benign posts.
Design a comprehensive evaluation plan to compare the new model against the existing model. Address both offline evaluation on labeled data and online experimentation.
Make the precision–recall trade-offs explicit, and connect model metrics to business outcomes.
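As one hedged illustration of connecting a threshold choice to a triage policy and to business outcomes, the sketch below sweeps the decision threshold and derives a two-threshold policy (auto-remove vs. human review). The precision target and review threshold are assumed values, which in practice would come from policy and moderator capacity:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Same illustrative synthetic data as in the previous sketch.
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.02).astype(int)
y_score = np.clip(rng.normal(0.2 + 0.5 * y_true, 0.15), 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Auto-remove only where pointwise precision clears a high bar, so
# benign posts are rarely removed without a human look. Precision is
# not monotonic in the threshold, so this choice is approximate.
TARGET_PRECISION = 0.95  # assumed guardrail
ok = precision[:-1] >= TARGET_PRECISION  # thresholds has one fewer entry
t_remove = thresholds[ok][0] if ok.any() else 1.0

# Scores between t_review and t_remove go to human moderators;
# t_review would be set by review capacity in a real system.
t_review = 0.30  # assumed value

auto_removed = y_score >= t_remove
reviewed = (y_score >= t_review) & ~auto_removed
missed = (y_score < t_review) & (y_true == 1)

print(f"auto-remove threshold: {t_remove:.3f}")
print(f"auto-removed: {auto_removed.sum()} "
      f"(benign among them: {(auto_removed & (y_true == 0)).sum()})")
print(f"sent to human review: {reviewed.sum()}")
print(f"harmful posts missed entirely: {missed.sum()}")
```

Counts like "benign posts auto-removed" and "harmful posts missed" are the bridge from model metrics to business outcomes: each maps directly to user appeals, moderator workload, and residual harm on the platform.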