You are a Data Scientist for a video platform. A PM asks you to:
- Define metrics for “engagement” (they want a clear metric framework they can use in experiments).
- Analyze user comment distribution and propose how you would monitor it over time.
Part A — Engagement metric framework
Propose:
- Primary metric(s) (what you would optimize)
- Diagnostic metrics (to explain movement)
- Guardrail metrics (to prevent harmful changes)
Be explicit about:
- Unit of analysis (user-day, session, video-view, etc.)
- How you’d handle heavy users / skew (mean vs median, winsorization, log transforms)
- How you’d prevent gaming (spammy/low-quality engagement)
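To make the skew options concrete, here is a minimal sketch comparing the raw mean, median, winsorized mean, and mean of log-transformed values on a per-user metric. The data, the function names (`winsorize`, `summarize`), and the 90th-percentile cap are illustrative assumptions, not part of the question; a real analysis would pick the cap from the observed distribution.

```python
import math
import statistics


def winsorize(values, upper_pct=0.99):
    """Cap values above the nearest-rank `upper_pct` percentile at that percentile."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank index of the cap value, clamped to the valid range.
    idx = min(len(ordered) - 1, max(0, math.ceil(upper_pct * len(ordered)) - 1))
    cap = ordered[idx]
    return [min(v, cap) for v in values]


def summarize(values, upper_pct=0.99):
    """Return several location estimates that react differently to heavy users."""
    capped = winsorize(values, upper_pct)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "winsorized_mean": statistics.fmean(capped),
        "mean_log1p": statistics.fmean([math.log1p(v) for v in values]),
    }


# Hypothetical watch minutes per user-day: nine typical users and one heavy user.
watch_minutes = [5, 8, 10, 12, 15, 18, 20, 22, 25, 600]
stats = summarize(watch_minutes, upper_pct=0.90)
print(stats)
```

In this toy sample the single heavy user pulls the raw mean far above the median, while the winsorized mean stays close to the typical user. Which estimate to use as the primary metric is a judgment call: the winsorized mean keeps sensitivity to real shifts among typical users, whereas the median can miss changes that only affect the upper half of the distribution.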
Part B: Comment distribution and monitoring
Comments are known to be heavy-tailed (most users comment rarely; a small minority comment a lot).
Describe:
- What distributions you would compute (by user, by video, by cohort)
- What slices you would look at (new vs returning users, content categories, geos)
- How you would detect regressions or anomalies (e.g., bots, spam, ranking changes)
- What experiment you would run if the goal is to increase “healthy” commenting, including key confounders and how you’d interpret results
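As one possible starting point for the distribution and anomaly questions above, the sketch below computes how concentrated commenting is among the heaviest commenters and flags an unusual daily comment count with a median/MAD-based robust z-score. The function names (`top_share`, `robust_z`), the sample data, and the 3.5 threshold are assumptions chosen for illustration; a production monitor would also need to account for seasonality and per-slice baselines.

```python
import math
import statistics
from collections import Counter


def top_share(comment_user_ids, top_frac=0.01):
    """Share of all comments written by the top `top_frac` of commenters."""
    counts = sorted(Counter(comment_user_ids).values(), reverse=True)
    if not counts:
        return 0.0
    k = max(1, int(len(counts) * top_frac))  # always include at least one user
    return sum(counts[:k]) / sum(counts)


def robust_z(history, today):
    """Robust z-score of `today` against a trailing window, using median and MAD."""
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history])
    if mad == 0:
        # No spread in the window: any deviation is infinitely surprising.
        return 0.0 if today == med else math.copysign(math.inf, today - med)
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    # under normality.
    return 0.6745 * (today - med) / mad


# Hypothetical comment log: one heavy commenter and nine occasional commenters.
comment_log = ["heavy_user"] * 91 + [f"user_{i}" for i in range(9)]
share = top_share(comment_log, top_frac=0.10)

# Hypothetical trailing week of daily comment counts (in thousands).
history = [100, 102, 98, 101, 99, 103, 97]
THRESHOLD = 3.5  # assumed alerting threshold, tune against historical alerts
is_anomaly = abs(robust_z(history, 160)) > THRESHOLD
print(share, is_anomaly)
```

Median and MAD are used here instead of mean and standard deviation because a single bot-driven spike in the trailing window would otherwise inflate the baseline and mask the next spike. Tracking `top_share` over time alongside the raw count helps separate a broad rise in commenting from a rise driven by a few accounts.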