This question evaluates competency in statistical modeling of heavy-tailed count data, model selection and comparison, and the formulation of robust monitoring metrics for anomaly detection, testing both theoretical understanding and applied data-science skills.

You are analyzing daily comment counts at the post–day level. The distribution is heavy-tailed. From a recent period you observe:
Tasks:
(a) Test whether a Poisson model is appropriate. If not, propose an alternative (negative binomial or discrete lognormal/Poisson–lognormal) and outline how to estimate parameters.
(b) Compare model fits using likelihood-based tests (likelihood ratio for nested models; Vuong test for non-nested models). Explain which tail behavior each model captures.
(c) Define robust monitoring metrics (e.g., trimmed mean, P50/median, Gini) and specify control limits suitable for detecting manipulation (e.g., purchased comments) under heavy tails.
(d) Suppose the daily 99th percentile (P99) suddenly increases 3× while the median (P50) remains stable. Propose a practical rule to trigger investigation while minimizing false alarms.
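As a starting point for tasks (a) and (d), the following is a minimal sketch rather than a definitive answer. It assumes a variance-to-mean ratio as the overdispersion check, a method-of-moments fit for the negative binomial in its NB2 form (Var = mu + mu^2 / r), and a median/MAD control limit on log(P99) that must be breached on several recent days while P50 stays near its baseline. All function names and the thresholds `z_limit`, `min_hits`, `p50_tol`, and `min_scale` are hypothetical choices for illustration, not values given in the question.

```python
import math
from statistics import mean, median, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    return variance(counts) / mean(counts)


def nb_moment_estimates(counts):
    """Method-of-moments fit for NB2 (Var = mu + mu**2 / r); returns (mu, r)."""
    mu, var = mean(counts), variance(counts)
    if var <= mu:
        return mu, math.inf  # no overdispersion detected: Poisson limit
    return mu, mu * mu / (var - mu)


def robust_log_z(value, baseline, min_scale=0.05):
    """Robust z-score of log(value) against a baseline, using median and MAD.

    Assumes all values are strictly positive. min_scale is a hypothetical
    floor that keeps the score finite when the baseline barely varies.
    """
    logs = [math.log(v) for v in baseline]
    center = median(logs)
    mad = median([abs(x - center) for x in logs])
    scale = max(1.4826 * mad, min_scale)  # 1.4826 makes MAD comparable to a std dev
    return (math.log(value) - center) / scale


def should_investigate(p99_base, p50_base, p99_recent, p50_recent,
                       z_limit=3.5, min_hits=2, p50_tol=0.25):
    """Flag a tail-only shift: P99 high on several recent days, P50 stable."""
    hits = sum(robust_log_z(v, p99_base) > z_limit for v in p99_recent)
    p50_shift = abs(median(p50_recent) / median(p50_base) - 1.0)
    return hits >= min_hits and p50_shift <= p50_tol
```

Requiring `min_hits` breaches within the recent window, instead of alerting on a single day, is one way to trade detection delay for fewer false alarms; a single viral post can move P99 for one day without any manipulation. A dispersion index well above 1 only suggests that Poisson is inadequate; a formal test and a maximum-likelihood fit would still be needed for a complete answer.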