Design metrics for violating content exposure
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You’re working on a UGC platform with automated and human moderation. Design a metric framework to measure user exposure to violating content.

1) Define precise formulas for at least three daily metrics and their 7-day rolling counterparts: view_prevalence = violating_views / total_views, violating_session_rate = sessions_with_≥1_violating_view / total_sessions, and violations_per_active_user = violating_views / DAU. (A computation sketch follows this list.)
2) Specify exact inclusion/exclusion rules for what counts as a violating view under two timing regimes: ex-ante (only violations known at view time) vs. ex-post (final decision after all reviews), and state how to treat late-arriving labels, appeals, deleted content, repeat views, and bot traffic.
3) Is view_prevalence a good north-star metric? Compare it with incident_rate (violating_items / items_created) and user_prevalence (users_exposed / DAU), and discuss tradeoffs such as detection lag, denominator gaming, precision/recall shifts, Simpson’s paradox across countries and surfaces, and Goodhart’s law.
4) Propose a weekly alerting method with thresholds based on uncertainty estimates (e.g., Wilson or Bayesian beta-binomial intervals; see the sketch after the Quick Answer) and describe guardrail metrics (false-positive exposure, creator churn, review-queue SLA).
5) Sketch an A/B test to reduce view_prevalence: state the primary metric, key segments (country, surface, creator cohort), power assumptions, and how you’ll correct for label latency and selection bias when violations are discovered after exposure.
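A minimal sketch of how the daily metrics in part 1 and their 7-day rolling counterparts might be computed. It assumes a hypothetical view-level log with columns `date`, `user_id`, `session_id`, `is_violating` (the ex-post label from part 2), and `is_bot`; all names are illustrative, not an actual production schema.

```python
import pandas as pd

def daily_exposure_metrics(views: pd.DataFrame) -> pd.DataFrame:
    """Daily exposure metrics plus 7-day rolling counterparts.

    Expects one row per content view with columns:
      date (datetime64), user_id, session_id,
      is_violating (bool, ex-post label), is_bot (bool).
    Column names are illustrative assumptions.
    """
    # Exclusion rule: drop bot traffic before any aggregation.
    human = views[~views["is_bot"]]

    daily = human.groupby("date").agg(
        total_views=("is_violating", "size"),
        violating_views=("is_violating", "sum"),
        total_sessions=("session_id", "nunique"),
        dau=("user_id", "nunique"),
    )
    # Sessions with >=1 violating view, per day.
    viol_sessions = (
        human[human["is_violating"]]
        .groupby("date")["session_id"]
        .nunique()
    )
    daily["violating_sessions"] = viol_sessions.reindex(daily.index, fill_value=0)

    # Daily point metrics, as defined in part 1.
    daily["view_prevalence"] = daily["violating_views"] / daily["total_views"]
    daily["violating_session_rate"] = daily["violating_sessions"] / daily["total_sessions"]
    daily["violations_per_active_user"] = daily["violating_views"] / daily["dau"]

    # 7-day rolling counterparts: sum numerators and denominators over the
    # window, then divide. Averaging the daily ratios instead would weight
    # low-traffic days equally with high-traffic days.
    roll = daily[["violating_views", "total_views",
                  "violating_sessions", "total_sessions"]].rolling(7).sum()
    daily["view_prevalence_7d"] = roll["violating_views"] / roll["total_views"]
    daily["violating_session_rate_7d"] = roll["violating_sessions"] / roll["total_sessions"]
    # Note: a rolling violations_per_active_user needs distinct users over the
    # full window (not a sum of daily DAU), so it must be recomputed from raw
    # events rather than from this daily rollup.
    return daily
```

One design point worth flagging in an interview: the rolling metrics are ratios of 7-day sums, not means of daily ratios, and the weekly-active-user denominator cannot be reconstructed from daily counts.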
Quick Answer: This question evaluates competency in metric design and measurement for content safety: defining exposure metrics with precise numerators and denominators, setting inclusion/exclusion rules under ex-ante and ex-post regimes, handling label latency and appeals, building uncertainty-aware alerting with guardrails, and designing A/B tests that account for late-arriving labels. These are core skills for a Data Scientist in the Analytics & Experimentation domain.
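For the uncertainty-aware alerting in part 4 and the power assumptions in part 5, a hedged sketch using a Wilson score interval and a two-proportion sample-size approximation. The threshold, baseline prevalence, and effect size below are illustrative placeholders, not recommended values.

```python
import math
from statistics import NormalDist

def wilson_interval(violating: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion, e.g. weekly view_prevalence."""
    if total == 0:
        return (0.0, 1.0)
    p = violating / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return (center - half, center + half)

def should_alert(violating: int, total: int, threshold: float) -> bool:
    """Alert only when the whole interval clears the threshold, so that
    low-volume weeks do not page on noise."""
    lower, _ = wilson_interval(violating, total)
    return lower > threshold

def views_per_arm(p0: float, mde_rel: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate views per arm to detect a relative drop in view_prevalence
    with a two-sided two-proportion z-test (normal approximation).

    Treats views as independent; in practice views cluster by user, so the
    result should be inflated by a design effect (or the test randomized
    and analyzed at the user level).
    """
    p1 = p0 * (1 - mde_rel)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    n = ((z_a * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2) / (p0 - p1) ** 2
    return math.ceil(n)

# Illustrative numbers: alert if weekly prevalence is credibly above 0.05%,
# and size a test to detect a 10% relative drop from a 0.04% baseline.
print(should_alert(violating=5_200, total=9_000_000, threshold=0.0005))  # True
print(views_per_arm(p0=0.0004, mde_rel=0.10))  # roughly 3.7M views per arm
```

A Bayesian beta-binomial alternative replaces the Wilson lower bound with a posterior quantile of Beta(α + violating, β + total − violating); the alerting logic is otherwise identical. For the label-latency correction in part 5, one common approach is to score both arms on the same fixed label-maturity window (e.g., labels as of k days after exposure) so that detection lag cancels between treatment and control.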