Weekly posts per creator are overdispersed and zero‑inflated. In a creator‑level randomized test of a nudge:
- Control: n_c = 40,000 creators, total posts = 72,000 (mean = 1.8)
- Treatment: n_t = 40,000 creators, total posts = 75,600 (mean = 1.89)
- Historical control variance per creator s_c^2 ≈ 6.5 (suggesting overdispersion).
Answer:
- Choose an appropriate model (e.g., Negative Binomial with log link). Using Var(Y) = μ + μ^2/k, estimate k from the control statistics, then compute the estimated log rate ratio, its standard error, and a 95% CI for the treatment lift.
- If you instead used a Poisson model, quantify the expected underestimation of the SE relative to the NB model and discuss when that would inflate Type I error.
- Outline a cluster‑robust approach if randomization had been by geo (state/clusters), and a nonparametric bootstrap you’d trust here. Be explicit about the resampling unit and how you’d construct the CI for the rate ratio.
- Given meaningful heterogeneity by creator tenure, propose a pre‑specified analysis (e.g., stratified NB or interaction terms) and explain how you’d correct for multiple comparisons across 10 geos (e.g., BH‑FDR).
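
As a rough sketch of the first two parts, the moment calculations can be done directly from the summary statistics above. It assumes a method-of-moments estimate of k from the historical control variance, a single k shared by both arms, and a delta-method standard error for the log of each arm's mean. The function names `nb_dispersion` and `log_rr_summary` are illustrative and do not come from any library. Fitting an NB GLM to creator-level data would be the fuller analysis.

```python
import math


def nb_dispersion(mean, var):
    """Method-of-moments k from Var(Y) = mu + mu^2 / k (needs var > mean)."""
    if var <= mean:
        raise ValueError("variance must exceed the mean to estimate k")
    return mean ** 2 / (var - mean)


def log_rr_summary(n_c, mean_c, n_t, mean_t, k, z=1.96):
    """Log rate ratio with NB and Poisson standard errors (delta method).

    Assumption: the same k applies to both arms.
    Var(log ybar) ~= Var(Y) / (n * mu^2) = (1/mu + 1/k) / n under NB,
    and 1 / (n * mu) under Poisson.
    """
    log_rr = math.log(mean_t / mean_c)
    se_nb = math.sqrt((1 / mean_c + 1 / k) / n_c + (1 / mean_t + 1 / k) / n_t)
    se_poisson = math.sqrt(1 / (n_c * mean_c) + 1 / (n_t * mean_t))
    # Wald interval on the log scale, then exponentiate to get the rate ratio.
    ci_rr = (math.exp(log_rr - z * se_nb), math.exp(log_rr + z * se_nb))
    return {
        "log_rr": log_rr,
        "se_nb": se_nb,
        "se_poisson": se_poisson,
        "ci_rr": ci_rr,
        "se_ratio_nb_over_poisson": se_nb / se_poisson,
    }


# Numbers taken from the problem statement.
k_hat = nb_dispersion(mean=1.8, var=6.5)
summary = log_rr_summary(n_c=40_000, mean_c=1.8, n_t=40_000, mean_t=1.89, k=k_hat)
print(f"k = {k_hat:.3f}")
for name, value in summary.items():
    print(name, value)
```

Under these assumptions k ≈ 0.69 and the log rate ratio is about 0.049 with an NB standard error of about 0.010, so the rate-ratio interval runs from roughly 1.03 to 1.07. The Poisson standard error is about 0.0052, roughly 1.9 times smaller than the NB one, which is the source of the anti-conservative inference the second part asks about.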
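
For the geo-randomized variant and the multiple-comparison step, one possible sketch is a cluster bootstrap that resamples whole geo clusters within each arm, paired with Benjamini-Hochberg adjusted p-values. The cluster totals and p-values below are made up purely for illustration, and the names `cluster_bootstrap_rr` and `bh_adjust` are hypothetical. The sketch assumes each cluster is summarized as `(total_posts, n_creators)` and that every cluster has a positive post count. With only a handful of clusters, percentile intervals can undercover, so treat this as a starting point rather than a recommended default.

```python
import math
import random


def cluster_bootstrap_rr(control, treatment, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the rate ratio; the resampling unit is the cluster.

    Each arm is a list of (total_posts, n_creators) tuples, one per cluster.
    Assumes every cluster has posts > 0 so resampled rates stay positive.
    """
    rng = random.Random(seed)

    def rate(clusters):
        return sum(p for p, _ in clusters) / sum(n for _, n in clusters)

    def resample(clusters):
        # Draw clusters with replacement, keeping the number of clusters fixed.
        return [rng.choice(clusters) for _ in clusters]

    point = rate(treatment) / rate(control)
    draws = sorted(
        math.log(rate(resample(treatment)) / rate(resample(control)))
        for _ in range(n_boot)
    )
    lo = draws[int((alpha / 2) * n_boot)]
    hi = draws[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, (math.exp(lo), math.exp(hi))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative (made-up) cluster totals: (total_posts, n_creators) per geo.
control = [(180, 100), (170, 100), (190, 100), (185, 100), (175, 100)]
treatment = [(200, 100), (190, 100), (195, 100), (205, 100), (185, 100)]
rr, (ci_lo, ci_hi) = cluster_bootstrap_rr(control, treatment)
print(f"RR = {rr:.3f}, 95% CI = ({ci_lo:.3f}, {ci_hi:.3f})")

# Illustrative (made-up) per-geo p-values.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

The bootstrap resamples clusters rather than creators because the cluster is the unit of randomization under geo assignment. Resampling creators would ignore within-geo correlation and understate the interval width.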