Mixture Spike and Mean-Difference Inference for Daily Comments
Context
A product has DAU (daily active users) = 2,000,000. On day T, total comments increased by 20%. Historically, the per-user mean is 5.5 comments; on day T, the mean is 6.6 (= 1.2 × 5.5). You suspect a small fraction of bot accounts that each posted around 500 comments on day T.
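As a quick sanity check on the figures above, the totals can be reproduced in a few lines. This is a minimal sketch that assumes DAU was the same on the baseline day and on day T, which the prompt implies but does not state outright; the variable names are illustrative only.

```python
# Assumption: DAU is identical on the baseline day and on day T.
DAU = 2_000_000
baseline_mean = 5.5   # historical comments per user
day_t_mean = 6.6      # comments per user on day T

baseline_total = DAU * baseline_mean        # about 11.0 million comments
day_t_total = DAU * day_t_mean              # about 13.2 million comments
excess_comments = day_t_total - baseline_total  # about 2.2 million extra comments

print(f"baseline total: {baseline_total:,.0f}")
print(f"day-T total:    {day_t_total:,.0f}")
print(f"excess:         {excess_comments:,.0f}")
print(f"relative lift:  {day_t_total / baseline_total - 1:.1%}")
```

The excess of roughly 2.2 million comments is the quantity the bot hypothesis has to explain.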
Tasks
- Two-component mixture model: assume humans remained at 5.5 comments/user and bots posted 500 comments/user. Estimate the number and fraction of bot accounts on day T. Show formulas and a numeric result.
- Re-estimate if the human mean drifted up to 5.8 on day T (bots still at 500).
- From the day-T DAU, two independent simple random samples (SRS) are drawn, each of size 1% of DAU (n = 20,000 users; Sample A and Sample B). Compute a 95% confidence interval (CI) for the difference of sample means, mean(A) − mean(B):
  a) assuming per-user comments are Poisson with mean 6.6.
  b) assuming per-user comments are Negative Binomial with mean 6.6 and dispersion k = 2 (use Var = μ + μ²/k).
  State and justify any approximations (e.g., the CLT). You may comment on the finite population correction (FPC) if relevant.
- If the true distribution is heavy-tailed, with ~1% of humans having >50 comments, discuss when the normal approximation can break down and outline a nonparametric bootstrap procedure to obtain a CI for mean(A) − mean(B).
- Briefly critique the normality assumption and propose a quick diagnostic that uses only sample aggregates (n, mean, variance).
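
One possible way to set up the calculations in the tasks above is sketched below. It is not a reference solution: the function names (`bot_fraction`, `diff_ci_half_width`, `bootstrap_diff_ci`) are hypothetical, the mixture estimate assumes the day-T mean is exactly (1 − p) × human mean + p × bot mean, the CI half-width assumes the CLT applies and that both samples share the same per-user variance, and the bootstrap uses a plain percentile interval.

```python
import math
import random

def bot_fraction(observed_mean, human_mean, bot_mean):
    """Solve observed = (1 - p) * human + p * bot for the bot fraction p."""
    return (observed_mean - human_mean) / (bot_mean - human_mean)

def diff_ci_half_width(variance, n, population=None, z=1.96):
    """Half-width of a normal-approximation CI for mean(A) - mean(B).

    Assumes two independent samples of size n with the same per-user
    variance. If `population` is given, the FPC factor (1 - n/N) is applied.
    """
    fpc = 1.0 if population is None else 1.0 - n / population
    return z * math.sqrt(2.0 * variance * fpc / n)

def bootstrap_diff_ci(a, b, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(A) - mean(B).

    Resamples A and B independently, with replacement, `reps` times.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo_idx = int(reps * alpha / 2)
    hi_idx = reps - 1 - lo_idx
    return diffs[lo_idx], diffs[hi_idx]

DAU = 2_000_000
n = DAU // 100  # 1% sample, 20,000 users

# Mixture estimates with humans at 5.5 and at 5.8 comments/user.
p_base = bot_fraction(6.6, 5.5, 500.0)
p_drift = bot_fraction(6.6, 5.8, 500.0)

# Per-user variances under the two parametric assumptions.
mu, k = 6.6, 2.0
var_poisson = mu
var_negbin = mu + mu ** 2 / k

hw_poisson = diff_ci_half_width(var_poisson, n)
hw_negbin = diff_ci_half_width(var_negbin, n)

print(f"bots (human mean 5.5): {DAU * p_base:,.0f} ({p_base:.4%})")
print(f"bots (human mean 5.8): {DAU * p_drift:,.0f} ({p_drift:.4%})")
print(f"Poisson 95% half-width: {hw_poisson:.4f}")
print(f"NegBin  95% half-width: {hw_negbin:.4f}")
```

Under these assumptions the bot fraction comes out at roughly 0.22% (about 4,449 accounts) with humans at 5.5, and roughly 0.16% (about 3,238 accounts) with humans at 5.8. The half-widths are about 0.050 (Poisson) and 0.104 (Negative Binomial) without the FPC; the CI itself is the observed mean(A) − mean(B) plus or minus that half-width. Passing `population=DAU` shrinks each half-width by a factor of √0.99, which is negligible here. `bootstrap_diff_ci` expects the raw per-user counts of the two samples as lists.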