Analyze skewed comments and sampling effects
Company: Meta
Role: Data Scientist
Category: Statistics & Math
Difficulty: medium
Interview Round: Onsite
A PM asks about the distribution of daily comments per user, which is right-skewed. Answer the following:
1) Position statistics on a right-skewed distribution: Given the 20-user sample of daily comments [0,0,0,1,1,1,2,2,3,3,3,4,4,5,6,7,10,15,20,40], compute the mean, median, and 95th percentile (p95). Place all three on a number line and explain why mean > median > mode typically holds in right-skewed count data. Briefly note why the median and p95 can be integers.
2) Sampling distribution of group averages: Assume the true individual-level population has mean μ = 3.0 comments and standard deviation σ = 4.0 comments. If you repeatedly sample groups of n = 50 users and compute each group’s average comments, specify the approximate distribution of that average (by the CLT), its mean, its standard error, and its p95. Explain qualitatively how this p95 compares to the individual-level p95 from part (1).
3) Effect of larger samples: If n increases to 400, recompute the standard error and describe how the locations of the mean, median, and p95 of the sampling distribution change relative to one another as n grows.
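The arithmetic for part (1) can be checked with a short script. This is a minimal sketch using only the Python standard library. The helper names percentile_nearest_rank and percentile_linear are illustrative, not from the question. The question does not say which percentile convention to use, so two common ones are shown as assumptions.

```python
import math
import statistics

# The 20-user sample of daily comments from part (1).
comments = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 10, 15, 20, 40]


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile: the value at 1-based rank ceil(q * n)."""
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))
    return ordered[max(rank, 1) - 1]


def percentile_linear(values, q):
    """Linear interpolation between order statistics at position q * (n - 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


mean = statistics.mean(comments)        # sum of 127 over 20 users
median = statistics.median(comments)    # average of the 10th and 11th values
modes = statistics.multimode(comments)  # every value tied for most frequent
p95_rank = percentile_nearest_rank(comments, 0.95)
p95_interp = percentile_linear(comments, 0.95)

print(f"mean={mean}, median={median}, modes={modes}")
print(f"p95 nearest-rank={p95_rank}, p95 interpolated={p95_interp:.2f}")
```

On this sample the mean is 6.35 and the median is 3, so the long right tail pulls the mean well above the median. The p95 is 20 under nearest-rank and about 21 under linear interpolation. The values 0, 1, and 3 are tied as modes, so the mode is not unique in a sample this small.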
Quick Answer: This question evaluates understanding of descriptive statistics for right-skewed count data and of inferential concepts such as sampling distributions, standard error, percentiles, and the Central Limit Theorem. It is commonly asked to probe how skewness and sample size affect measures of central tendency and the uncertainty of estimates.
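The standard-error calculations in parts (2) and (3) can be sketched as follows, assuming the CLT normal approximation holds at these sample sizes. The function name sampling_distribution is illustrative. The values μ = 3.0 and σ = 4.0 are taken from the question.

```python
import math
from statistics import NormalDist

MU = 3.0     # population mean from the question
SIGMA = 4.0  # population standard deviation from the question


def sampling_distribution(n, mu=MU, sigma=SIGMA):
    """Approximate distribution of the sample mean under the CLT.

    The sample mean is treated as Normal(mu, sigma / sqrt(n)).
    """
    standard_error = sigma / math.sqrt(n)
    return NormalDist(mu=mu, sigma=standard_error)


for n in (50, 400):
    dist = sampling_distribution(n)
    # For a normal distribution the mean and median coincide at mu;
    # p95 sits about 1.645 standard errors above it.
    print(
        f"n={n}: SE={dist.stdev:.4f}, "
        f"median={dist.median:.2f}, p95={dist.inv_cdf(0.95):.4f}"
    )
```

Under this approximation the standard error is about 0.566 for n = 50 and 0.2 for n = 400, giving a p95 of the group average of roughly 3.93 and 3.33 respectively. Both are far below the individual-level p95 from part (1), because averaging damps the heavy right tail. As n grows, the mean and median of the sampling distribution stay at μ while p95 moves toward them at rate 1/√n.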