Daily Comments per Active User: Sampling and Inference
You have, for a given day d, the count of comments made by each active user. Let there be m active users on day d, and let X_i be the number of comments for user i (i = 1, 2, …, m). Answer the following:
- Define and compute the daily mean, median, and 95th percentile (P95). Provide explicit formulas, including how to compute P95 from a sorted empirical distribution.
- Explain the Central Limit Theorem (CLT) and why the sampling distribution of the sample mean tends toward normality. State the required conditions.
- Suppose you draw 200 independent simple random samples, each of size n, from the same day’s user-level comments. How large must n be for the sample mean to be approximately normal, and how does the variance of its sampling distribution change with n?
- Describe how the plots of the sampling distributions of the daily mean, median, and P95 evolve as n increases, and explain why.
- If comments per user increased week-over-week, list at least three plausible statistical or product reasons, and describe how you would test each.
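
As a starting point for the items on summary statistics and sampling distributions, here is a minimal Python sketch, not a reference solution. It assumes the nearest-rank definition of P95 (sort the m values, take rank k = ceil(0.95 m), report the k-th smallest value); other conventions interpolate between order statistics and can give slightly different answers. No data is supplied with the question, so the population below is synthetic and right-skewed, and the names `percentile_nearest_rank`, `daily_summary`, and `sampling_distribution` are hypothetical.

```python
import random
import statistics


def percentile_nearest_rank(values, pct=95):
    """Nearest-rank percentile: the k-th smallest value, k = ceil(pct/100 * m)."""
    ordered = sorted(values)
    m = len(ordered)
    # Integer form of ceil(pct * m / 100); avoids floating-point rounding.
    rank = max(1, (pct * m + 99) // 100)
    return ordered[rank - 1]


def daily_summary(values):
    """Mean, median, and P95 of one day's per-user comment counts."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": percentile_nearest_rank(values),
    }


def sampling_distribution(population, n, stat, repeats=200, seed=0):
    """Apply `stat` to `repeats` simple random samples of size n.

    Each sample is drawn without replacement; samples are independent of
    one another.
    """
    rng = random.Random(seed)
    return [stat(rng.sample(population, n)) for _ in range(repeats)]


# ASSUMPTION: synthetic right-skewed counts standing in for real data.
_rng = random.Random(42)
comments = [int(_rng.expovariate(0.5)) for _ in range(10_000)]

print(daily_summary(comments))

population_var = statistics.pvariance(comments)
mean_var_by_n = {}
for n in (10, 40, 160):
    means = sampling_distribution(comments, n, statistics.fmean)
    medians = sampling_distribution(comments, n, statistics.median)
    p95s = sampling_distribution(comments, n, percentile_nearest_rank)
    mean_var_by_n[n] = statistics.variance(means)
    print(
        f"n={n:>3}  var(mean)={mean_var_by_n[n]:.4f}  "
        f"sigma^2/n={population_var / n:.4f}  "
        f"sd(median)={statistics.stdev(medians):.3f}  "
        f"sd(P95)={statistics.stdev(p95s):.3f}"
    )
```

Under these assumptions, the variance of the sample mean should track sigma^2/n, so quadrupling n roughly quarters it. The finite-population correction (m - n)/(m - 1) is negligible here because n is much smaller than m. Passing each list of 200 statistics to a histogram routine gives the plots the question asks about.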