Define engagement metrics and analyze comment distribution
Company: Meta
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Onsite
You are a Data Scientist for a **video platform**. A PM asks you to:
1) **Define metrics for “engagement”**, giving a clear metric framework they can use in experiments.
2) Analyze the **distribution of user comments** and propose how you would monitor it over time.
## Part A — Engagement metric framework
Propose:
- **Primary metric(s)** (what you would optimize)
- **Diagnostic metrics** (to explain movement)
- **Guardrail metrics** (to prevent harmful changes)
Be explicit about:
- Unit of analysis (user-day, session, video-view, etc.)
- How you’d handle heavy users / skew (mean vs median, winsorization, log transforms)
- How you’d prevent gaming (spammy/low-quality engagement)
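As one way to reason about the skew question above, the sketch below compares a raw mean, a median, a winsorized mean, and a mean of log-transformed values on a small made-up sample. The metric (watch minutes per user-day), the sample values, the function names, and the 90th-percentile cap are all illustrative assumptions, not part of the question.

```python
import math
import statistics


def winsorize(values, upper_pct=99):
    """Cap values above the given upper percentile (nearest-rank method)."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank position of the percentile, converted to a 0-based index.
    idx = min(len(ordered) - 1, max(0, math.ceil(upper_pct * len(ordered) / 100) - 1))
    cap = ordered[idx]
    return [min(v, cap) for v in values]


def summarize(values, upper_pct=99):
    """Return several location estimates that react differently to heavy users."""
    capped = winsorize(values, upper_pct)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "winsorized_mean": statistics.mean(capped),
        "mean_log1p": statistics.mean(math.log1p(v) for v in values),
    }


# Hypothetical watch minutes for ten user-days; one heavy user dominates.
watch_minutes = [0, 2, 3, 5, 5, 8, 10, 12, 15, 600]
summary = summarize(watch_minutes, upper_pct=90)
print(summary)
```

In this toy sample the single heavy user pulls the raw mean far above the median, while the winsorized mean stays close to typical usage. In an experiment, the cap would normally be fixed in advance (for example from pre-period data) rather than chosen per arm, so that both arms are capped the same way.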
## Part B — Comment distribution
Comments are known to be **heavy-tailed** (most users comment rarely; a small minority comment a lot).
Describe:
- What distributions you would compute (by user, by video, by cohort)
- What slices you would look at (new vs returning users, content categories, geos)
- How you would detect regressions or anomalies (e.g., bots, spam, ranking changes)
- What experiment you would run if the goal is to increase “healthy” commenting, including key confounders and how you’d interpret results
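The distribution and monitoring points above can be sketched with a few concentration statistics: the share of comments written by the top users, a Gini coefficient, and a robust check of today's value against recent history. Everything here is an assumption made for illustration: the `(user, video)` event shape, the helper names, the sample data, and the 3.5 threshold on a MAD-based z-score (a common rule of thumb, not a prescribed method).

```python
import math
import statistics
from collections import Counter


def comments_per_user(events, all_users):
    """Count comments per user, keeping users with zero comments."""
    counts = Counter(user for user, _video in events)
    return [counts.get(u, 0) for u in all_users]


def top_share(counts, top_pct):
    """Share of all comments written by the top `top_pct` percent of users."""
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, math.ceil(len(counts) * top_pct / 100))
    return sum(sorted(counts, reverse=True)[:k]) / total


def gini(counts):
    """Gini coefficient: 0 is perfectly even, values near 1 are highly concentrated."""
    n, total = len(counts), sum(counts)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(counts)
    weighted = sum((i + 1) * c for i, c in enumerate(ordered))
    return (2 * weighted) / (n * total) - (n + 1) / n


def is_anomalous(history, today, threshold=3.5):
    """Flag `today` if its MAD-based robust z-score against `history` exceeds `threshold`."""
    med = statistics.median(history)
    mad = statistics.median(abs(x - med) for x in history)
    if mad == 0:
        return today != med
    return abs(0.6745 * (today - med) / mad) > threshold


# Hypothetical day of data: ten active users, one of whom writes most comments.
users = [f"u{i}" for i in range(1, 11)]
events = [("u1", "v1")] * 16 + [("u2", "v1")] * 2 + [("u3", "v2"), ("u4", "v2")]
counts = comments_per_user(events, users)

share_top10 = top_share(counts, top_pct=10)
concentration = gini(counts)
zero_commenters = sum(c == 0 for c in counts) / len(counts)

# Hypothetical trailing week of the same top-10% share, used as a baseline.
history = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.51]
alert = is_anomalous(history, share_top10)
print(share_top10, concentration, zero_commenters, alert)
```

The same functions can be applied per slice (new vs returning users, content category, geo) or with videos in place of users. A sudden jump in concentration could point to bots or spam, but it could also follow a ranking change, so an alert is a prompt to investigate rather than a diagnosis.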
Quick Answer: This question evaluates a data scientist's ability to design a metric framework, analyze the distribution of heavy-tailed user behavior, set up monitoring and anomaly detection, and design an experiment that measures healthy engagement. Key considerations include the unit of analysis, the treatment of heavy users, and anti-spam guardrails.