PracHub

Choose robust metrics for skewed comments

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of robust estimation and inference for zero‑inflated, heavy‑tailed count data, including central tendency choices (mean, median, trimmed and winsorized means, geometric mean), nonparametric bootstrap confidence intervals, and robust effect‑size transformations.

  • hard
  • Meta
  • Statistics & Math
  • Data Scientist


Company: Meta

Role: Data Scientist

Category: Statistics & Math

Difficulty: hard

Interview Round: Onsite



Related Interview Questions

  • Compute probability an account is fake - Meta (easy)
  • Compute Bayes probability for fake accounts - Meta (easy)
  • Compute probabilities for chatbot response quality - Meta (easy)
  • Compute posterior fake probability using Bayes' rule - Meta (medium)
  • Estimate bots and CI from DAU spike - Meta (medium)
Oct 13, 2025, 9:49 PM

Robust central tendency and inference for zero‑inflated, heavy‑tailed counts

You are evaluating an A/B test on per‑user daily comment counts. The outcome is highly skewed and zero‑inflated (many users post 0; a few post a lot). You rolled out a backend optimization expected to increase engagement.
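To make "highly skewed and zero-inflated" concrete, a quick diagnostic is the share of users with zero comments and the share of all comments contributed by the heaviest users. A minimal stdlib-only Python sketch, applied to the control arm from part (b); `skew_summary` and the 10% top-user cutoff are illustrative choices, not part of the question.

```python
def skew_summary(counts, top_frac=0.10):
    """Zero share and the share of all comments contributed by the heaviest users."""
    n = len(counts)
    ranked = sorted(counts, reverse=True)
    k = max(1, int(top_frac * n))  # number of users in the top `top_frac`
    total = sum(counts)
    return {
        "zero_share": sum(1 for c in counts if c == 0) / n,
        "top_share": sum(ranked[:k]) / total if total else 0.0,
    }

control = [0, 0, 0, 1, 1, 2, 2, 3, 20, 50]
print(skew_summary(control))
```

For the control arm, 30% of users post nothing, while a single user (50 comments) accounts for about 63% of the arm's 79 comments.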

Answer the following about choosing robust estimators, computing them on a toy example, and forming intervals/effect sizes.

(a) Estimator choice under heavy tails and zero inflation

Explain when each of the following is preferable as a measure of central tendency for such data. Discuss bias/variance trade‑offs under heavy tails (e.g., Pareto) and interpretability for product decisions.

  • Mean
  • Median
  • 10% trimmed mean
  • Winsorized mean (95/5)
  • Geometric mean of (1 + count), minus 1, i.e. exp(mean(log(1 + count))) − 1
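The bias/variance trade-off can be explored by simulation. Below is a minimal Monte Carlo sketch under an assumed data-generating process (40% of users post 0; otherwise the count is the floor of a scaled Pareto draw with shape α = 1.5, which has a finite mean but infinite variance). The model, its parameters, and all helper names are hypothetical. It implements the five estimators, with the geometric mean of (1 + count) minus 1 computed as `expm1(mean(log1p(x)))`, and reports each estimator's average value and relative standard deviation across replicates.

```python
import numpy as np

def trimmed_mean(x, prop=0.10):
    # Drop floor(prop * n) observations from each tail, then average the rest.
    x = np.sort(x)
    k = int(prop * x.size)
    return x[k:x.size - k].mean()

def winsorized_mean(x, prop=0.05):
    # Rank-based winsorizing: the ceil(prop * n) smallest (largest) values are
    # replaced by the next order statistic inward, then everything is averaged.
    x = np.sort(x).astype(float)
    k = int(np.ceil(prop * x.size))
    x[:k] = x[k]
    x[x.size - k:] = x[x.size - k - 1]
    return x.mean()

def geo_mean_log1p(x):
    # Geometric mean of (1 + count), minus 1; well defined when counts are 0.
    return np.expm1(np.mean(np.log1p(x)))

ESTIMATORS = {
    "mean": np.mean,
    "median": np.median,
    "trimmed_10": trimmed_mean,
    "winsor_95_5": winsorized_mean,
    "geo_log1p": geo_mean_log1p,
}

def simulate_counts(rng, n, p_zero=0.4, alpha=1.5, scale=2.0):
    # Hypothetical zero-inflated model: a user posts 0 with probability p_zero,
    # otherwise floor(scale * X) with X ~ Pareto(shape=alpha, minimum=1).
    # Generator.pareto draws a Lomax variate, so adding 1 gives the classical Pareto.
    tail = np.floor(scale * (rng.pareto(alpha, size=n) + 1.0)).astype(int)
    return np.where(rng.random(n) < p_zero, 0, tail)

def sampling_summary(n=1000, reps=1000, seed=0):
    # Average value and relative SD (SD / average) of each estimator over replicates.
    rng = np.random.default_rng(seed)
    draws = {name: np.empty(reps) for name in ESTIMATORS}
    for r in range(reps):
        x = simulate_counts(rng, n)
        for name, f in ESTIMATORS.items():
            draws[name][r] = f(x)
    return {name: (v.mean(), v.std() / v.mean()) for name, v in draws.items()}

for name, (center, rel_sd) in sampling_summary().items():
    print(f"{name:>12}: average {center:6.2f}, relative SD {rel_sd:.3f}")
```

The estimators target different population quantities, so differences in their average values reflect a change of estimand as much as bias; the relative-SD column is the variance side of the trade-off. Under this assumed model the median should land on the same small integer in almost every replicate, which makes it stable but coarse.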

(b) Compute on toy samples and choose an estimator

Given per‑user counts for one day:

  • Control: [0, 0, 0, 1, 1, 2, 2, 3, 20, 50]
  • Treatment: [0, 0, 1, 1, 1, 2, 2, 3, 5, 10]

Compute for each arm: mean, median, 10% trimmed mean, and winsorized mean (95/5). Then state which estimator would most reliably detect a practically meaningful improvement here, and justify.

Conventions

  • 10% trimmed mean: remove the lowest and highest 10% of observations (for n=10, drop 1 from each tail).
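  • 95/5 winsorized mean: cap values below the 5th percentile at the 5th‑percentile value and values above the 95th percentile at the 95th‑percentile value. For n = 10, use the rank-based reading, which replaces ⌈0.05 × 10⌉ = 1 observation per tail: the min becomes the 2nd smallest value and the max becomes the 2nd largest. Interpolated percentile definitions give different caps (e.g., NumPy's default puts the control arm's 95th percentile at 36.5, not 20).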
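The conventions above can be checked with a short stdlib-only Python sketch. `trimmed_mean`, `winsorized_mean`, and `geo_mean_log1p` are illustrative helper names; the winsorizer follows the rank-based reading of the 95/5 convention (replace ⌈0.05·n⌉ values per tail), which is an assumption that matches the n = 10 statement rather than an interpolated-percentile definition.

```python
import math
from statistics import mean, median

def trimmed_mean(xs, prop=0.10):
    # Drop floor(prop * n) values from each tail (n = 10: one per tail).
    s = sorted(xs)
    k = int(prop * len(s))
    return mean(s[k:len(s) - k])

def winsorized_mean(xs, prop=0.05):
    # Rank-based 95/5 convention: replace the ceil(prop * n) smallest values with the
    # next order statistic up and the ceil(prop * n) largest with the next one down.
    s = sorted(xs)
    k = math.ceil(prop * len(s))
    s[:k] = [s[k]] * k
    s[len(s) - k:] = [s[len(s) - k - 1]] * k
    return mean(s)

def geo_mean_log1p(xs):
    # Geometric mean of (1 + count), minus 1.
    return math.expm1(mean(math.log1p(x) for x in xs))

control = [0, 0, 0, 1, 1, 2, 2, 3, 20, 50]
treatment = [0, 0, 1, 1, 1, 2, 2, 3, 5, 10]

estimators = {
    "mean": mean,
    "median": median,
    "trimmed 10%": trimmed_mean,
    "winsorized 95/5": winsorized_mean,
    "geo mean log1p": geo_mean_log1p,
}
for name, f in estimators.items():
    c, t = f(control), f(treatment)
    print(f"{name:>15}: control {c:.3f}, treatment {t:.3f}, difference {t - c:+.3f}")
```

With these conventions the control arm gives mean 7.9, median 1.5, trimmed mean 3.625, and winsorized mean 4.9; the treatment arm gives 2.5, 1.5, 1.875, and 2.0. The log1p geometric mean is about 2.30 for control and 1.68 for treatment. On these ten users per arm no estimator puts treatment ahead, and the median ties.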

(c) 95% CI via stratified nonparametric bootstrap

Describe how to form a 95% confidence interval for your chosen estimator using a nonparametric bootstrap with stratification by user activity buckets. State assumptions and how you would check them.
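A minimal sketch of a stratified percentile bootstrap in Python with NumPy, for the difference in 10% trimmed means (used here only as an example statistic; any function of a 1-D array can be passed as `stat`). The data are simulated and hypothetical: three activity buckets assumed to be defined from pre-experiment activity, so bucket membership cannot be affected by treatment, and overdispersed negative-binomial counts. Users are assumed independent (user-level randomization), each arm is resampled independently, and resampling happens within buckets so every replicate keeps the observed bucket sizes.

```python
import numpy as np

def trimmed_mean(x, prop=0.10):
    # Drop floor(prop * n) observations from each tail and average the rest.
    x = np.sort(np.asarray(x, dtype=float))
    k = int(prop * x.size)
    return x[k:x.size - k].mean()

def stratified_resample(values, buckets, rng):
    # Resample users with replacement *within* each activity bucket so that
    # every bootstrap replicate keeps the observed bucket sizes.
    values, buckets = np.asarray(values), np.asarray(buckets)
    parts = []
    for b in np.unique(buckets):
        in_b = values[buckets == b]
        parts.append(rng.choice(in_b, size=in_b.size, replace=True))
    return np.concatenate(parts)

def stratified_bootstrap_ci(ctrl, ctrl_b, trt, trt_b, stat=trimmed_mean,
                            n_boot=2000, alpha=0.05, seed=0):
    # Percentile CI for stat(treatment) - stat(control); arms are resampled independently.
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (stat(stratified_resample(trt, trt_b, rng))
                    - stat(stratified_resample(ctrl, ctrl_b, rng)))
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return stat(trt) - stat(ctrl), lo, hi

# Hypothetical data: buckets 0/1/2 = light/medium/heavy users by PRE-experiment activity.
gen = np.random.default_rng(42)
n = 3000
bucket_means = np.array([0.3, 2.0, 15.0])
ctrl_b = gen.choice(3, size=n, p=[0.6, 0.3, 0.1])
trt_b = gen.choice(3, size=n, p=[0.6, 0.3, 0.1])
# Negative binomial with shape 0.5 and the given bucket mean: overdispersed, many zeros.
ctrl = gen.negative_binomial(0.5, 0.5 / (0.5 + bucket_means[ctrl_b]))
trt = gen.negative_binomial(0.5, 0.5 / (0.5 + 1.1 * bucket_means[trt_b]))

point, lo, hi = stratified_bootstrap_ci(ctrl, ctrl_b, trt, trt_b, n_boot=1000)
print(f"trimmed-mean difference = {point:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The percentile interval is the simplest choice; with heavy tails or sparse buckets it can undercover, so inspecting the shape of the bootstrap distribution, comparing against a BCa interval, and checking that bucket proportions match across arms are reasonable follow-ups.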

(d) Robust, comparable effect size

If you must report an effect size that is robust yet comparable across experiments, propose a transformation and effect metric (e.g., log1p‑based percent change or a quantile treatment effect at τ = 0.8) and defend your choice.
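Both candidate effect metrics fit in a few lines. In this sketch (Python with NumPy; function names are illustrative), the log1p-based percent change is exp(mean log(1 + treatment) − mean log(1 + control)) − 1, i.e. the ratio of the arms' geometric means of (1 + count) minus 1, so it is a relative change on the 1 + count scale rather than on raw counts. The quantile treatment effect at τ = 0.8 is the difference of the arms' 80th percentiles, computed with NumPy's default linear interpolation; other quantile definitions give different values on small samples.

```python
import numpy as np

def log1p_pct_change(ctrl, trt):
    # exp(mean(log1p(trt)) - mean(log1p(ctrl))) - 1: the ratio of the arms' geometric
    # means of (1 + count), minus 1. It is a relative change on the 1 + count scale.
    diff = np.mean(np.log1p(trt)) - np.mean(np.log1p(ctrl))
    return np.expm1(diff)

def quantile_te(ctrl, trt, tau=0.8):
    # Quantile treatment effect: difference of the tau-th quantiles of the two arms,
    # using NumPy's default ("linear") quantile interpolation.
    return np.quantile(trt, tau) - np.quantile(ctrl, tau)

control = np.array([0, 0, 0, 1, 1, 2, 2, 3, 20, 50])
treatment = np.array([0, 0, 1, 1, 1, 2, 2, 3, 5, 10])
print(f"log1p percent change: {100 * log1p_pct_change(control, treatment):+.1f}%")
print(f"QTE at tau = 0.8: {quantile_te(control, treatment):+.2f} comments")
```

On the toy arms from part (b) this gives a log1p change of about −18.9% and a QTE at τ = 0.8 of 3.4 − 6.4 = −3.0 comments.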


© 2026 PracHub. All rights reserved.