This question evaluates a candidate's competence in machine learning model evaluation and online experiment design for content moderation, testing skills such as handling class imbalance, probabilistic scoring and calibration, threshold selection, slice-based robustness checks, and production experiment planning.

You are given a binary classification model that detects harmful content on a social platform and flags items for either removal or down‑ranking. You need to design its offline evaluation and an online rollout experiment: choose metrics appropriate for class imbalance, check score calibration, pick operating thresholds for each action, run slice-based robustness checks, and plan the production experiment.
Assume class imbalance (harmful content is rare), probabilistic model outputs (scores), and that some actions (auto‑remove) can prevent us from observing true labels unless we design around it.
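A minimal sketch of the offline-evaluation side of such an answer, using synthetic data and scikit-learn. The prevalence (~1% harmful), the score distribution, and the two operating thresholds (0.30 for down-ranking, 0.60 for auto-removal) are all hypothetical stand-ins chosen for illustration, not values implied by the question:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Synthetic, imbalanced labeled set: ~1% harmful (hypothetical prevalence).
n = 100_000
y_true = rng.binomial(1, 0.01, size=n)
# Hypothetical model scores: harmful items score higher on average.
scores = np.clip(rng.normal(0.05 + 0.5 * y_true, 0.15), 0.0, 1.0)

# 1) Ranking quality under imbalance: the precision-recall curve (plotted in
#    practice) and average precision are more informative than accuracy or ROC-AUC.
precision, recall, _ = precision_recall_curve(y_true, scores)
print(f"Average precision: {average_precision_score(y_true, scores):.3f}")

# 2) Calibration: do predicted scores match observed harmful rates per score bin?
frac_pos, mean_pred = calibration_curve(y_true, scores, n_bins=10, strategy="quantile")
for p, f in zip(mean_pred, frac_pos):
    print(f"mean predicted {p:.2f} -> observed harmful rate {f:.2f}")

# 3) Two operating points (hypothetical policy): a lower, higher-recall threshold
#    for down-ranking and a high-precision threshold for auto-removal.
def precision_recall_at(threshold):
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    prec = tp / max(pred.sum(), 1)
    rec = tp / max(y_true.sum(), 1)
    return prec, rec

for name, t in [("down-rank", 0.30), ("auto-remove", 0.60)]:
    prec, rec = precision_recall_at(t)
    print(f"{name:>12s} @ {t:.2f}: precision={prec:.2f}, recall={rec:.2f}")
```

For the label-censoring issue the question raises, one common design (stated here as an assumption, not as the expected answer) is to route a small random sample of items that cross the auto-remove threshold to human review instead of immediate removal, so that unbiased labels remain observable for ongoing evaluation.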