Context
You are a Data Scientist working on integrity/harmful-content for a social media product. The company wants a single “severity” metric (or metric suite) to track how bad policy-violating content is on the platform over time and to evaluate integrity interventions (ranking changes, enforcement, classifiers, human review).
Problem
- Propose metrics to measure the severity of violating/harmful content on the platform.
  - Define what each metric means.
  - Specify the unit of analysis (content item, user, impression/view, session, day).
  - Clarify what counts as a "violation" (e.g., policy-violating content confirmed by human review or by a high-confidence classifier).
- The team suggests using View Prevalence as the main KPI:
  - Example definition: View Prevalence (VP) = (views/impressions of violating content) / (all views/impressions).
  - Discuss the pros and cons of View Prevalence as a primary severity metric.
- Discuss the key tradeoffs when choosing/optimizing these metrics.
  - Include at least: user safety vs. engagement, precision vs. recall, reporting robustness vs. sensitivity to change, and fairness/coverage across regions/languages.
- If you were asked to recommend a final metric suite, which metric would you pick as the primary KPI, and what would be your diagnostic and guardrail metrics?
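The View Prevalence definition above can be sketched as a simple aggregation over impression logs. This is a minimal illustration, not a production pipeline: the `Impression` record shape and the `is_violating` label (e.g., from human review) are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Impression:
    content_id: str
    is_violating: bool  # assumed: violation confirmed by human review


def view_prevalence(impressions):
    """VP = (impressions of violating content) / (all impressions)."""
    if not impressions:
        return 0.0
    violating = sum(1 for imp in impressions if imp.is_violating)
    return violating / len(impressions)


logs = [
    Impression("a", False),
    Impression("b", True),
    Impression("a", False),
    Impression("c", False),
]
print(view_prevalence(logs))  # 1 violating impression out of 4 -> 0.25
```

Note that the denominator counts impressions, not distinct content items, so VP is naturally view-weighted: one viral violating post moves the metric far more than many unseen ones.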
Assume you have:
- Impression/view logs, content metadata, policy labels from human review (partial coverage), and ML classifier scores.
- Interventions that can change both the amount of violating content and the distribution of views across content.
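Because human-review labels have only partial coverage, a prevalence estimate typically combines confirmed labels with classifier scores on unreviewed content. A hedged sketch of one such estimator, assuming the classifier scores are calibrated probabilities (the record layout is invented for the example):

```python
def estimated_violating_view_share(records):
    """Estimate the share of views on violating content.

    Each record is (n_views, human_label, classifier_score):
      - human_label: True/False from review, or None if unreviewed
      - classifier_score: assumed-calibrated P(violation) in [0, 1]
    Human labels are trusted where available; otherwise the classifier
    score is used as the expected violation rate for those views.
    """
    violating_views = 0.0
    total_views = 0
    for n_views, human_label, score in records:
        total_views += n_views
        if human_label is not None:
            violating_views += n_views * (1.0 if human_label else 0.0)
        else:
            violating_views += n_views * score
    return violating_views / total_views if total_views else 0.0


records = [
    (100, True, 0.9),   # reviewed, confirmed violating
    (300, False, 0.2),  # reviewed, confirmed benign
    (600, None, 0.05),  # unreviewed: fall back to classifier score
]
print(round(estimated_violating_view_share(records), 4))  # 0.13
```

This also illustrates the second assumption above: an intervention can lower the estimate either by removing violating content (fewer high-score records) or by shifting views away from it (smaller `n_views` on those records), and the metric alone does not distinguish the two.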