How to measure harmful-content severity and run experiments
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Technical Screen
##### Question
You are a Data Scientist working on **content integrity / harmful content** at a large social media platform. The harmful content in scope includes hate/harassment, self-harm, graphic violence, sexual exploitation, spam, and misinformation, and not all of it is equally severe. The team wants to reduce the harm users experience and is proposing an intervention, for example a new ranking demotion, a removal/enforcement policy change, or a new ML classifier plus enforcement workflow.
Design a measurement and experimentation framework for this problem. Address the following:
1. **Define "severity" of harmful content.**
- Propose a severity framework that supports both measurement and decision-making.
- What signals would you use (policy labels, human review, user reports, downstream user harm, virality, repeat exposure, content type, viewer vulnerability)?
- Would you represent severity as a **binary label, ordinal levels, or a continuous score** — and why? Discuss the pros/cons of each.
- Explain tradeoffs (interpretability vs. sensitivity, policy alignment, subjectivity, multilingual / cross-region considerations).
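
One way to make the binary / ordinal / continuous choice concrete is to treat the ordinal scale as the source of truth and derive the other two from it. The sketch below is illustrative only: the level names, the `SEVERITY_WEIGHT` values, and the `MEDIUM` enforcement threshold are hypothetical placeholders, not actual policy tiers.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordinal severity levels (names are assumptions)."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical harm weights: deliberately non-linear so that one CRITICAL
# view counts far more than one LOW view. Real weights would need policy input.
SEVERITY_WEIGHT = {
    Severity.NONE: 0.0,
    Severity.LOW: 1.0,
    Severity.MEDIUM: 3.0,
    Severity.HIGH: 10.0,
    Severity.CRITICAL: 50.0,
}


def to_binary(level: Severity, threshold: Severity = Severity.MEDIUM) -> bool:
    """Collapse the ordinal level to a violating / not-violating flag."""
    return level >= threshold


def to_weight(level: Severity) -> float:
    """Expand the ordinal level to a continuous weight for metric aggregation."""
    return SEVERITY_WEIGHT[level]
```

Keeping the ordinal label as the stored value preserves interpretability for reviewers, while the derived weight gives metrics more sensitivity than a binary flag.
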
2. **Design metrics (primary / diagnostic / guardrails).**
- Give a clearly defined **primary** metric capturing harmful-content impact, several **diagnostic** metrics that explain movement, and several **guardrail** metrics that detect unintended harm.
- Distinguish between **prevalence** (creation-side), **exposure** (distribution-side), **severity-weighted exposure**, **enforcement accuracy**, and **user-experience side effects**. State the pros/cons of each.
- Specify exact definitions (numerators/denominators) and any weighting (e.g., by exposure or severity).
- What **denominator** is appropriate — content created, content viewed/impressions, active users, or sessions — and how does it depend on the intervention?
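
The difference between prevalence, exposure, and severity-weighted exposure comes down to the numerator and denominator. A minimal sketch follows, assuming each content item carries a severity weight (0 for benign content) and an impression count; the `ContentItem` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ContentItem:
    content_id: str
    severity_weight: float  # assumed 0.0 for benign content, > 0 for violating
    impressions: int


def prevalence(items: Sequence[ContentItem]) -> float:
    """Creation-side: violating items created / all items created."""
    if not items:
        return 0.0
    violating = sum(1 for item in items if item.severity_weight > 0)
    return violating / len(items)


def exposure_rate(items: Sequence[ContentItem]) -> float:
    """Distribution-side: impressions on violating items / all impressions."""
    total = sum(item.impressions for item in items)
    if total == 0:
        return 0.0
    harmful = sum(item.impressions for item in items if item.severity_weight > 0)
    return harmful / total


def severity_weighted_exposure(items: Sequence[ContentItem]) -> float:
    """Sum of (severity weight x impressions) / all impressions."""
    total = sum(item.impressions for item in items)
    if total == 0:
        return 0.0
    weighted = sum(item.severity_weight * item.impressions for item in items)
    return weighted / total
```

Two feeds with the same exposure rate can differ sharply on the severity-weighted version if one of them concentrates its harmful impressions on high-severity items, which is the case the weighting is meant to surface.
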
3. **Design an experiment to evaluate the intervention.**
- Choose an appropriate **randomization unit** (viewer/user, viewer-session, content item, author/creator, community/network cluster, or geo) and justify it. Discuss the tradeoffs of each.
- Specify the **primary success metric, guardrail metrics, and long-term metrics**.
- Discuss pitfalls: **interference / spillover** (content spreads across users and social graphs), network effects, contamination, novelty effects, delayed outcomes, and measurement error. How would you handle interference?
- Explain the analysis plan (intent-to-treat vs. per-protocol, variance reduction, segmentation, multiple testing).
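
If interference pushes the design toward cluster-level randomization, assignment can be made deterministic by hashing the cluster ID rather than the user ID, so every user in a cluster lands in the same arm. The sketch below assumes a precomputed `user_to_cluster` mapping (for example from a graph-clustering step that is not shown); the function names and the salt format are hypothetical.

```python
import hashlib
from typing import Mapping


def assign_arm(unit_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a randomization unit to an arm via a salted hash."""
    digest = hashlib.sha256(f"{experiment_salt}:{unit_id}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as a roughly uniform number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


def assign_user_by_cluster(
    user_id: str,
    user_to_cluster: Mapping[str, str],
    experiment_salt: str,
    treatment_share: float = 0.5,
) -> str:
    """Assign a user to the arm of their cluster, so connected users share an arm."""
    cluster_id = user_to_cluster[user_id]
    return assign_arm(cluster_id, experiment_salt, treatment_share)
```

Because the effective sample size is then the number of clusters rather than the number of users, the analysis would need cluster-robust variance estimates, and power is typically lower than under user-level randomization.
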
4. **Biases, pitfalls, and edge cases.**
- Identify sources of selection bias (reporting bias / brigading), labeling bias and reviewer drift, delayed feedback, and **Simpson's paradox** / subgroup regressions.
- How would you handle **rare-but-severe** harms versus **common low-severity** harms (the base-rate problem)?
- How would you prevent the team from "improving" the chosen metric while making the platform worse overall (metric gaming)?
- How would you make the final **launch recommendation**?
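
The base-rate problem can be made tangible with a rough power calculation: for the same relative reduction, a rarer harm needs far more observations to detect. The sketch below uses the standard normal-approximation sample-size formula for comparing two proportions. It assumes independent observations, which is optimistic for clustered impression data, and the function name and defaults are illustrative.

```python
import math
from statistics import NormalDist


def n_per_arm(
    p_control: float,
    relative_reduction: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Approximate observations per arm to detect a relative drop in a proportion.

    Two-sided test, normal approximation, independent observations assumed.
    """
    if not 0 < p_control < 1:
        raise ValueError("p_control must be in (0, 1)")
    if not 0 < relative_reduction < 1:
        raise ValueError("relative_reduction must be in (0, 1)")
    p_treat = p_control * (1 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    effect = p_control - p_treat
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)
```

Comparing, say, `n_per_arm(1e-4, 0.10)` with `n_per_arm(1e-2, 0.10)` shows why rare-but-severe harms often call for longer experiments, pooled severity tiers, or targeted sampling for human review rather than a standard short A/B test.
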
### Assumptions
- You can log impressions/views, engagement, reports, enforcement actions, and model outputs.
- "Harmful content" is determined by a combination of policy rules, human review, and ML signals (imperfect).
Quick Answer: This Meta Data Scientist analytics and experimentation question asks you to design a measurement and experimentation framework for harmful content whose severity varies. It covers defining severity (binary vs. ordinal vs. continuous), building a metric stack (prevalence, exposure, severity-weighted exposure, enforcement accuracy, guardrails) with the right denominator, designing the A/B test and choosing the randomization unit, handling interference/spillover, and addressing biases including reporting bias, Simpson's paradox, the rare-but-severe base-rate problem, and metric gaming.