Choose robust metrics for skewed comments

This question evaluates understanding of robust estimation and inference for zero‑inflated, heavy‑tailed count data, including central tendency choices (mean, median, trimmed and winsorized means, geometric mean), nonparametric bootstrap confidence intervals, and robust effect‑size transformations.

Question

Robust central tendency and inference for zero‑inflated, heavy‑tailed counts

You are evaluating an A/B test on per‑user daily comment counts. The outcome is highly skewed and zero‑inflated (many users post 0; a few post a lot). You rolled out a backend optimization expected to increase engagement.

Answer the following about choosing robust estimators, computing them on a toy example, and forming intervals/effect sizes.

(a) Estimator choice under heavy tails and zero inflation

Explain when each of the following is preferable as a measure of central tendency for such data. Discuss bias/variance trade‑offs under heavy tails (e.g., Pareto) and interpretability for product decisions.

Mean
Median
10% trimmed mean
Winsorized mean (95/5)
Shifted geometric mean: the geometric mean of (1 + count), minus 1
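
The last estimator in the list is the least familiar, so a minimal sketch may help. A plain geometric mean collapses to 0 as soon as one user posts nothing, which is why the counts are shifted by 1 before averaging on the log scale and shifted back afterwards. The function name `shifted_geometric_mean` is illustrative, not taken from any library.

```python
import math

def shifted_geometric_mean(counts):
    """Geometric mean of (1 + count), minus 1.

    Computed on the log scale: exp(mean(log(1 + x))) - 1.
    log1p/expm1 keep precision when counts are small.
    """
    mean_log = sum(math.log1p(x) for x in counts) / len(counts)
    return math.expm1(mean_log)

# Toy samples from part (b) of the question.
control = [0, 0, 0, 1, 1, 2, 2, 3, 20, 50]
treatment = [0, 0, 1, 1, 1, 2, 2, 3, 5, 10]

gm_control = shifted_geometric_mean(control)
gm_treatment = shifted_geometric_mean(treatment)
print(f"control: {gm_control:.3f}, treatment: {gm_treatment:.3f}")
```

On these samples the values come out at roughly 2.30 for control and 1.68 for treatment, far closer together than the raw means, because the log scale compresses the 20 and 50 in the control arm.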

(b) Compute on toy samples and choose an estimator

Given per‑user counts for one day:

Control: [0, 0, 0, 1, 1, 2, 2, 3, 20, 50]
Treatment: [0, 0, 1, 1, 1, 2, 2, 3, 5, 10]

Compute for each arm: mean, median, 10% trimmed mean, and winsorized mean (95/5). Then state which estimator would most reliably detect a practically meaningful improvement here, and justify.

Conventions

10% trimmed mean: remove the lowest and highest 10% of observations (for n=10, drop 1 from each tail).
95/5 winsorized mean: cap values below the 5th percentile at the 5th‑percentile value and values above the 95th percentile at the 95th‑percentile value. For n=10, this effectively replaces the min and max with the 2nd smallest and 2nd largest values.
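
Under the conventions above, the four estimators can be sketched as follows. This is one possible implementation using only the standard library; the helper names `trimmed_mean` and `winsorized_mean`, and the parameter `k` (number of observations affected in each tail), are assumptions made for illustration.

```python
import statistics

def trimmed_mean(xs, k=1):
    """Drop the k smallest and k largest observations, then average."""
    xs = sorted(xs)
    return statistics.mean(xs[k:len(xs) - k])

def winsorized_mean(xs, k=1):
    """Clamp the k smallest/largest values to their nearest retained neighbour."""
    xs = sorted(xs)
    lo, hi = xs[k], xs[-k - 1]
    return statistics.mean([min(max(x, lo), hi) for x in xs])

control = [0, 0, 0, 1, 1, 2, 2, 3, 20, 50]
treatment = [0, 0, 1, 1, 1, 2, 2, 3, 5, 10]

estimators = {
    "mean": statistics.mean,
    "median": statistics.median,
    "10% trimmed mean": trimmed_mean,      # k=1 per tail for n=10
    "winsorized mean (95/5)": winsorized_mean,
}

results = {
    name: (fn(control), fn(treatment)) for name, fn in estimators.items()
}
for name, (c, t) in results.items():
    print(f"{name:>24}: control={c:.3f}  treatment={t:.3f}")
```

With these conventions the control arm gives mean 7.9, median 1.5, trimmed mean 3.625, and winsorized mean 4.9; the treatment arm gives 2.5, 1.5, 1.875, and 2.0. The medians tie, while the other three estimators are all pulled upward in control by the two users with 20 and 50 comments.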

(c) 95% CI via stratified nonparametric bootstrap

Describe how to form a 95% confidence interval for your chosen estimator using a nonparametric bootstrap with stratification by user activity buckets. State assumptions and how you would check them.
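
A rough sketch of the procedure, assuming a percentile interval on the treatment-minus-control difference and a 10% trimmed mean as the statistic. The bucket labels (`light`, `heavy`) and the assignment of users to buckets are hypothetical; the question does not supply them. Users are resampled with replacement inside each bucket of each arm, so every replicate keeps the original bucket sizes.

```python
import random
import statistics

def trimmed_mean(xs, prop=0.1):
    """Mean after dropping int(prop * n) observations from each tail."""
    xs = sorted(xs)
    k = int(prop * len(xs))
    return statistics.mean(xs[k:len(xs) - k])

def resample_stratified(strata, rng):
    """Resample with replacement within each bucket; bucket sizes stay fixed."""
    out = []
    for values in strata.values():
        out.extend(rng.choices(values, k=len(values)))
    return out

def stratified_bootstrap_ci(control, treatment, stat,
                            n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for stat(treatment) - stat(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = resample_stratified(control, rng)
        t = resample_stratified(treatment, rng)
        diffs.append(stat(t) - stat(c))
    diffs.sort()
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return diffs[lo_idx], diffs[hi_idx]

# Hypothetical bucket assignment of the toy samples (illustration only).
control = {"light": [0, 0, 0, 1, 1, 2], "heavy": [2, 3, 20, 50]}
treatment = {"light": [0, 0, 1, 1, 1, 2], "heavy": [2, 3, 5, 10]}

lo, hi = stratified_bootstrap_ci(control, treatment, trimmed_mean, seed=1)
print(f"95% CI for difference in trimmed means: [{lo:.3f}, {hi:.3f}]")
```

The sketch treats users as independent units and buckets as fixed before the experiment. With only ten users per arm the interval is illustrative at best, and a percentile interval can under-cover for heavy-tailed data, so in practice one might compare it against a BCa or studentized interval.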

(d) Robust, comparable effect size

If you must report an effect size that is robust yet comparable across experiments, propose a transformation and effect metric (e.g., log1p‑based percent change or a quantile treatment effect at τ = 0.8) and defend your choice.
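
Both candidate metrics named above can be sketched in a few lines. Two assumptions are made here: the log1p-based percent change is taken as the relative change in the geometric mean of (1 + count), and the quantile uses linear interpolation between order statistics (other quantile conventions would give different values on small samples). The function names are illustrative.

```python
import math

def log1p_pct_change(control, treatment):
    """Relative change in the geometric mean of (1 + count).

    exp(mean(log1p(t)) - mean(log1p(c))) - 1
    """
    mc = sum(math.log1p(x) for x in control) / len(control)
    mt = sum(math.log1p(x) for x in treatment) / len(treatment)
    return math.expm1(mt - mc)

def quantile(xs, tau):
    """Quantile with linear interpolation between order statistics."""
    xs = sorted(xs)
    h = (len(xs) - 1) * tau
    i = math.floor(h)
    j = min(i + 1, len(xs) - 1)
    return xs[i] + (h - i) * (xs[j] - xs[i])

def quantile_treatment_effect(control, treatment, tau=0.8):
    """Difference in the tau-th quantile, treatment minus control."""
    return quantile(treatment, tau) - quantile(control, tau)

# Toy samples from part (b) of the question.
control = [0, 0, 0, 1, 1, 2, 2, 3, 20, 50]
treatment = [0, 0, 1, 1, 1, 2, 2, 3, 5, 10]

pct = log1p_pct_change(control, treatment)
qte = quantile_treatment_effect(control, treatment, tau=0.8)
print(f"log1p percent change: {pct:+.1%}")
print(f"QTE at tau=0.8: {qte:+.2f} comments")
```

The percent change is unit-free, which makes it easier to compare across experiments with different baselines, but it describes 1 + count rather than the raw count. The quantile treatment effect stays in comments per user and is insensitive to the extreme tail, at the cost of describing only one point of the distribution.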
