This question evaluates understanding of robust estimation and inference for zero‑inflated, heavy‑tailed count data, including central tendency choices (mean, median, trimmed and winsorized means, geometric mean), nonparametric bootstrap confidence intervals, and robust effect‑size transformations.

You are evaluating an A/B test on per‑user daily comment counts. The outcome is highly skewed and zero‑inflated (many users post 0; a few post a lot). You rolled out a backend optimization expected to increase engagement.
Answer the following about choosing robust estimators, computing them on a toy example, and forming intervals/effect sizes.
Explain when each of the following is preferable as a measure of central tendency for such data: mean, median, trimmed mean, winsorized mean, and geometric mean. Discuss bias/variance trade‑offs under heavy tails (e.g., Pareto) and interpretability for product decisions.
Given per‑user counts for one day:
Compute for each arm: mean, median, 10% trimmed mean, and winsorized mean (95/5). Then state which estimator would most reliably detect a practically meaningful improvement here, and justify.
Conventions
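As an illustration of how the four estimators can be computed, here is a minimal Python sketch. The counts in `control` and `treatment` are made-up toy values, not the data from the question. The conventions are also assumptions: the 10% trimmed mean drops floor(0.1 · n) observations from each tail, and the 95/5 winsorized mean clamps values to the 5th and 95th percentiles, computed with linear interpolation between order statistics.

```python
import math
from statistics import mean, median


def quantile(sorted_xs, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac


def trimmed_mean(xs, prop=0.10):
    """Drop floor(prop * n) values from each tail, then average the rest."""
    s = sorted(xs)
    k = int(prop * len(s) + 1e-9)  # small epsilon guards against float rounding
    return mean(s[k:len(s) - k])


def winsorized_mean(xs, lower=0.05, upper=0.95):
    """Clamp values to the [lower, upper] empirical quantiles, then average."""
    s = sorted(xs)
    lo, hi = quantile(s, lower), quantile(s, upper)
    return mean(min(max(x, lo), hi) for x in xs)


def summarize(xs):
    return {
        "mean": mean(xs),
        "median": median(xs),
        "trimmed_mean_10": trimmed_mean(xs, 0.10),
        "winsorized_mean_95_5": winsorized_mean(xs, 0.05, 0.95),
    }


# Hypothetical toy data (NOT the counts from the question).
control = [0, 0, 0, 0, 1, 1, 2, 2, 3, 20]
treatment = [0, 0, 0, 1, 1, 2, 2, 3, 4, 6]

for name, arm in (("control", control), ("treatment", treatment)):
    print(name, summarize(arm))
```

On these toy values a single heavy poster in the control arm pulls the raw mean above the treatment mean, while the median and trimmed mean point the other way. With only ten users per arm, the winsorized mean still sits close to the raw mean, because the interpolated 95th percentile falls near the largest value.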
Describe how to form a 95% confidence interval for your chosen estimator using a nonparametric bootstrap with stratification by user activity buckets. State assumptions and how you would check them.
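One possible way to realize such an interval is a stratified percentile bootstrap on the difference in a chosen statistic. The sketch below is a minimal illustration, assuming users are independent, that activity buckets are defined from pre‑experiment data (so treatment cannot change bucket membership), and that the 10% trimmed mean is the chosen estimator. The bucket names, the toy counts, and the helper names `resample_within_strata` and `stratified_bootstrap_ci` are all hypothetical.

```python
import random
from statistics import mean


def trimmed_mean(xs, prop=0.10):
    """Drop floor(prop * n) values from each tail, then average the rest."""
    s = sorted(xs)
    k = int(prop * len(s) + 1e-9)
    return mean(s[k:len(s) - k])


def resample_within_strata(strata, rng):
    """Resample with replacement inside each bucket, keeping bucket sizes fixed."""
    out = []
    for values in strata.values():
        out.extend(rng.choices(values, k=len(values)))
    return out


def stratified_bootstrap_ci(control, treatment, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for stat(treatment) - stat(control).

    control / treatment: dict mapping bucket label -> list of per-user counts.
    """
    rng = random.Random(seed)
    diffs = sorted(
        stat(resample_within_strata(treatment, rng))
        - stat(resample_within_strata(control, rng))
        for _ in range(n_boot)
    )
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data, bucketed by assumed pre-experiment activity level.
control = {
    "light": [0, 0, 0, 0, 0, 1, 0, 0],
    "medium": [1, 2, 0, 3, 1, 2],
    "heavy": [5, 9, 40, 7],
}
treatment = {
    "light": [0, 0, 1, 0, 0, 1, 0, 0],
    "medium": [2, 2, 1, 3, 1, 4],
    "heavy": [6, 11, 35, 9],
}

ci = stratified_bootstrap_ci(control, treatment, trimmed_mean)
print("95% percentile CI for the difference in 10% trimmed means:", ci)
```

Resampling within buckets keeps the share of light, medium, and heavy users fixed in every replicate, which removes bucket‑mix noise from the interval. The percentile method is only one choice; with very small or very skewed heavy buckets it may undercover, so it is worth checking stability across seeds and larger `n_boot` values.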
If you must report an effect size that is robust yet comparable across experiments, propose a transformation and effect metric (e.g., log1p‑based percent change or a quantile treatment effect at τ = 0.8) and defend your choice.
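Both candidate metrics mentioned above can be sketched in a few lines. This is a minimal illustration on made‑up toy counts, not the question's data. It assumes the log1p‑based percent change is defined as the relative change in the geometric mean of (1 + count), and that quantiles use linear interpolation between order statistics. The function names are hypothetical.

```python
import math
from statistics import mean


def quantile(sorted_xs, q):
    """Empirical quantile with linear interpolation between order statistics."""
    pos = q * (len(sorted_xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_xs[lo] * (1 - frac) + sorted_xs[hi] * frac


def log1p_percent_change(control, treatment):
    """Relative change in the geometric mean of (1 + count), as a fraction."""
    diff = mean(math.log1p(y) for y in treatment) - mean(math.log1p(y) for y in control)
    return math.expm1(diff)


def quantile_treatment_effect(control, treatment, tau=0.8):
    """Difference between arms in the tau-th quantile of the outcome."""
    return quantile(sorted(treatment), tau) - quantile(sorted(control), tau)


# Hypothetical toy data (NOT the counts from the question).
control = [0, 0, 0, 0, 1, 1, 2, 2, 3, 20]
treatment = [0, 0, 0, 1, 1, 2, 2, 3, 4, 6]

pct = log1p_percent_change(control, treatment)
qte = quantile_treatment_effect(control, treatment, tau=0.8)
print(f"log1p-based change: {pct:+.1%}")
print(f"QTE at tau=0.8: {qte:+.2f} comments per user")
```

The log1p metric is unitless, which makes it easier to compare across experiments with different baselines, but it is a change in a transformed mean rather than in raw comments per user. The quantile treatment effect stays in the original units and ignores the extreme tail, at the cost of describing only one point of the distribution.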