Modeling Heavy-Tailed Comment Counts and Robust Monitoring
You are analyzing daily comment counts at the post–day level. The distribution is heavy-tailed. From a recent period you observe:
- Sample mean = 4
- Sample variance = 50
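As a quick sanity check on these two statistics, the sketch below computes the variance-to-mean ratio (which is 1 under a Poisson model) and method-of-moments starting values for a negative binomial. The parameterization (mean mu, variance mu + mu^2 / r), the variable names, and the sample size of 1000 used for the dispersion statistic are illustrative assumptions, not values given in the problem.

```python
# Summary statistics from the problem statement.
sample_mean = 4.0
sample_variance = 50.0

# A Poisson model implies variance == mean, so this ratio should be close to 1.
dispersion_index = sample_variance / sample_mean

# Method-of-moments negative binomial, ASSUMING mean = mu and
# variance = mu + mu**2 / r (the common "NB2" parameterization).
r_hat = sample_mean**2 / (sample_variance - sample_mean)
p_hat = sample_mean / sample_variance  # algebraically equal to r / (r + mu)


def dispersion_statistic(n_obs: int, mean: float, variance: float) -> float:
    """Index-of-dispersion statistic (n - 1) * s^2 / mean.

    Under a Poisson null it is approximately chi-square with n - 1
    degrees of freedom, so values far above n - 1 indicate overdispersion.
    """
    return (n_obs - 1) * variance / mean


# The sample size is HYPOTHETICAL: the problem does not state it.
stat = dispersion_statistic(1000, sample_mean, sample_variance)

print(f"dispersion index: {dispersion_index:.2f}")
print(f"NB shape r (moments): {r_hat:.3f}")
print(f"NB probability p (moments): {p_hat:.3f}")
print(f"dispersion statistic with 1000 observations: {stat:.1f}")
```

A ratio of 12.5 is far from the Poisson value of 1, and a shape estimate well below 1 suggests a strongly right-skewed count distribution. Maximum likelihood would normally refine these moment-based starting values.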
Tasks:
(a) Test whether a Poisson model is appropriate. If not, propose an alternative (negative binomial or discrete lognormal/Poisson–lognormal) and outline how to estimate parameters.
(b) Compare model fits using likelihood-based tests (likelihood ratio for nested models; Vuong test for non-nested models). Explain which tail behavior each model captures.
(c) Define robust monitoring metrics (e.g., trimmed mean, P50/median, Gini) and specify control limits suitable for detecting manipulation (e.g., purchased comments) under heavy tails.
(d) Suppose the daily 99th percentile (P99) suddenly increases 3× while the median (P50) remains stable. Propose a practical rule to trigger investigation while minimizing false alarms.
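One possible shape for the rule in (d) is sketched below: flag a day only when P99 is at least a chosen multiple of its trailing-median baseline, P50 stays within a tolerance band around its own baseline, and the condition persists for consecutive days. The function name, the 28-day window, the 3x ratio, the 25% tolerance, and the 2-day persistence are all hypothetical choices for illustration and would need tuning against historical data.

```python
from statistics import median


def investigation_flags(p50, p99, window=28, ratio_threshold=3.0,
                        p50_tolerance=0.25, persistence=2):
    """Return one boolean per day; True means "open an investigation".

    p50 and p99 are equal-length sequences of daily percentiles.
    All default thresholds are illustrative assumptions.
    """
    flags = []
    streak = 0
    for t in range(len(p99)):
        if t < window:
            # Not enough history to form a baseline yet.
            flags.append(False)
            continue
        # Trailing medians are robust to a few extreme days in the window.
        base99 = median(p99[t - window:t])
        base50 = median(p50[t - window:t])
        tail_jump = base99 > 0 and p99[t] >= ratio_threshold * base99
        centre_stable = abs(p50[t] - base50) <= p50_tolerance * max(base50, 1.0)
        # Require the pattern on consecutive days to limit false alarms.
        streak = streak + 1 if (tail_jump and centre_stable) else 0
        flags.append(streak >= persistence)
    return flags


# Synthetic example: P99 triples on the last two days, P50 is unchanged.
p50_series = [2] * 30
p99_series = [40] * 28 + [120, 120]
flags = investigation_flags(p50_series, p99_series)
print(flags[-2:])
```

Requiring both a stable median and persistence is a design choice aimed at separating tail-only shifts (a small set of posts receiving abnormal volume) from organic, platform-wide growth, which would typically move P50 as well.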