This question evaluates a data scientist's ability to characterize count data distributions, recognize features such as zero-inflation and heavy tails, and reason about appropriate statistical families and validation approaches.

You are analyzing the number of comments made by each user over a fixed time window (e.g., 30 days). Each user contributes a non-negative integer count (0, 1, 2, ...). Many users may make no comments in the window, while a small fraction may be very active.
Describe or sketch the empirical distribution of per-user comment counts and justify the distributional shape you expect to observe. Briefly note a reasonable statistical family to model it and how you would validate that choice.
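One way to start on the validation step is to compare a few summary statistics against what a plain Poisson model would imply. The sketch below is illustrative only. The helper name `summarize_counts` and the `toy_counts` sample are hypothetical and not part of the question. It fits a negative binomial by method of moments, which is a simple stand-in for a maximum-likelihood fit. A variance-to-mean ratio well above 1 suggests overdispersion, and an observed zero fraction well above the Poisson or negative binomial prediction suggests zero-inflation.

```python
import math
from statistics import mean, pvariance


def summarize_counts(counts):
    """Return simple diagnostics for a list of non-negative integer counts."""
    n = len(counts)
    m = mean(counts)
    v = pvariance(counts)  # population variance of the sample
    out = {
        "mean": m,
        "variance": v,
        # Equals 1 for a Poisson distribution; > 1 indicates overdispersion.
        "dispersion_index": v / m if m > 0 else float("nan"),
        "observed_zero_frac": counts.count(0) / n,
        # P(X = 0) under a Poisson with the same mean.
        "poisson_zero_frac": math.exp(-m),
    }
    # Method-of-moments negative binomial fit (an assumption of this sketch).
    # It is only defined when the variance exceeds the mean.
    if v > m > 0:
        r = m * m / (v - m)   # dispersion (size) parameter
        p = r / (r + m)       # success probability
        out["nb_r"] = r
        out["nb_p"] = p
        out["nb_zero_frac"] = p ** r  # P(X = 0) under the fitted NB
    return out


# Hypothetical toy sample: many zeros and a few very active users.
toy_counts = (
    [0] * 60 + [1] * 15 + [2] * 8 + [3] * 5 + [5] * 4
    + [8] * 3 + [15] * 2 + [40] * 2 + [120]
)
summary = summarize_counts(toy_counts)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

On real data, these checks would typically be followed by a histogram on a log-scaled axis and a comparison of candidate models (for example Poisson, negative binomial, and their zero-inflated or hurdle variants) using held-out likelihood or an information criterion.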