Scenario
You are analyzing comments on a social media app. Each post i accrues a number of comments C_i over a fixed window (e.g., the first 24 hours after posting). You want to know whether engagement is evenly distributed across posts or dominated by a small fraction of viral posts.
Task
Propose a quantitative approach to evaluate the distribution of comments per post:
- Define the core variable(s) you will analyze and any normalizations (e.g., time window, exposure-adjusted rates).
- Specify descriptive and concentration metrics you will compute, and why they are informative (e.g., Lorenz curve, Gini coefficient, top-k share).
- Describe distributional models you would consider (e.g., Poisson, Negative Binomial, zero-inflated models, lognormal/power-law tails) and how you would check fit.
- Choose statistical tests to assess goodness-of-fit and to compare distributions across cohorts or time.
- Formulate clear hypotheses that distinguish "evenly spread" from "dominated by few" and how you would test them.
Assume you have large sample sizes and can segment by country, surface, or time period as needed.
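As a starting point for the concentration metrics named in the task, here is a minimal Python sketch. It assumes NumPy is available and that the comment counts C_i for one segment are held in a one-dimensional array. The function names `gini`, `top_share`, and `dispersion_index` are illustrative and not part of the task. The dispersion index (variance over mean) is included only as a quick screen: a value near 1 is what a Poisson model would imply, while much larger values point toward overdispersed alternatives such as the Negative Binomial.

```python
import numpy as np


def gini(counts):
    """Gini coefficient of non-negative counts: 0 is perfectly even, values near 1 are highly concentrated."""
    x = np.sort(np.asarray(counts, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    ranks = np.arange(1, n + 1)
    # Rank-based formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1.0) / n)


def top_share(counts, fraction=0.01):
    """Share of all comments received by the top `fraction` of posts."""
    x = np.sort(np.asarray(counts, dtype=float))[::-1]  # descending order
    total = x.sum()
    if total == 0:
        return 0.0
    k = max(1, int(np.ceil(fraction * x.size)))  # at least one post
    return float(x[:k].sum() / total)


def dispersion_index(counts):
    """Sample variance divided by mean; roughly 1 under a Poisson model."""
    x = np.asarray(counts, dtype=float)
    mean = x.mean()
    return float(x.var(ddof=1) / mean) if mean > 0 else float("nan")
```

Computing these per country, surface, or time period gives comparable summaries across segments; formal goodness-of-fit and cohort comparisons would still need the tests the task asks for.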