PracHub

Analyze DAU comments distribution and resampling

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in statistical modeling of count data, resampling and bootstrap inference, summary-statistic interpretation, and numeric aggregation/stability considerations within the Statistics & Math domain for a data scientist role.

Company: Meta

Role: Data Scientist

Category: Statistics & Math

Difficulty: Medium

Interview Round: Onsite

Oct 13, 2025, 9:49 PM
Consider the metric comments_per_DAU (number of comments a daily active user makes in a day).

a) Shape: Describe and justify the expected distribution of comments_per_DAU across users on a given day (e.g., zero-inflation, skew/heavy tail). Is the variable discrete or continuous? What are reasonable parametric families to consider (e.g., Poisson vs Negative Binomial), and why might Poisson be inadequate?
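
One way to build intuition for part (a) is to simulate a zero-inflated count variable and check whether its variance exceeds its mean, which a plain Poisson model cannot reproduce. The sketch below is illustrative only: the function names and every parameter (`p_lurker`, `shape`, `mean_active`) are assumptions invented here, not values from the question or from real data.

```python
import math
import random
import statistics
from collections import Counter


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_comments(n_users, p_lurker=0.6, shape=0.5, mean_active=4.0, seed=0):
    """Zero-inflated gamma-Poisson (negative binomial) counts; all parameters are made up."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_users):
        if rng.random() < p_lurker:
            counts.append(0)  # structural zero: this user never comments
        else:
            # User-specific rate; mixing Poisson over a gamma gives a negative binomial.
            rate = rng.gammavariate(shape, mean_active / shape)
            counts.append(sample_poisson(rng, rate))
    return counts


counts = simulate_comments(20_000)
mean = statistics.fmean(counts)
variance = statistics.pvariance(counts)
mode = Counter(counts).most_common(1)[0][0]
share_zero = counts.count(0) / len(counts)
print(f"mean={mean:.2f} variance={variance:.2f} share_zero={share_zero:.2f} mode={mode}")
```

Under a Poisson model the variance would equal the mean, so a simulated variance well above the mean is the overdispersion the question alludes to.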

b) Bootstrapping: You repeatedly resample n=10,000 users with replacement from that day’s user list and compute the sample mean, repeating this 100,000 times. Describe the bootstrap distribution’s shape and center. Under what conditions will it be approximately normal, and when might it remain skewed? What is the relationship between its standard deviation and the population variance σ²?
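
The resampling procedure in part (b) can be sketched on a small scale. The population below is a hypothetical skewed, zero-heavy list of counts, and the sizes (`n = 2_000`, `replicates = 500`) are scaled-down assumptions chosen so the example runs quickly in pure Python; the question itself uses n = 10,000 and 100,000 replicates.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical "day" of per-user comment counts: many zeros plus a geometric-like tail.
population = [
    0 if rng.random() < 0.6 else int(rng.expovariate(0.25))
    for _ in range(50_000)
]

n = 2_000          # resample size (assumption; the question uses 10,000)
replicates = 500   # number of resamples (assumption; the question uses 100,000)

# Each replicate: draw n users with replacement and record the sample mean.
boot_means = [
    statistics.fmean(rng.choices(population, k=n))
    for _ in range(replicates)
]

center = statistics.fmean(boot_means)
boot_sd = statistics.pstdev(boot_means)
predicted_sd = statistics.pstdev(population) / math.sqrt(n)  # sigma / sqrt(n)

print(f"population mean       = {statistics.fmean(population):.4f}")
print(f"bootstrap center      = {center:.4f}")
print(f"bootstrap sd          = {boot_sd:.4f}")
print(f"sigma / sqrt(n)       = {predicted_sd:.4f}")
```

In this sketch the bootstrap center should sit near the mean of the list being resampled, and the bootstrap standard deviation should be close to σ/√n, where σ is the standard deviation of that list.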

c) Scaling n: If you increase n from 10,000 to 20,000, how (quantitatively) does the width of the bootstrap distribution of the mean change? State the factor and the intuition.
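
The scaling asked about in part (c) can be worked out from the standard error of a sample mean, assuming independent draws with finite variance σ². The notation below is introduced here for illustration.

```latex
\mathrm{SE}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{\mathrm{SE}(\bar{X}_{20000})}{\mathrm{SE}(\bar{X}_{10000})}
= \sqrt{\frac{10000}{20000}}
= \frac{1}{\sqrt{2}} \approx 0.707
```

Under that assumption, doubling n halves the variance of the mean, so the width of the bootstrap distribution shrinks by roughly 29%, not by half.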

d) Summary stats: For this metric, compare mean, median, mode, and p95. Which is most stable, which is most decision-relevant, and why might the mode be 0? How do you interpret and compute p95 for a discrete count variable (e.g., tie handling, integer vs real thresholds)?
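
To make the p95 conventions in part (d) concrete, the sketch below compares a nearest-rank percentile (always an observed integer) with a linearly interpolated one (possibly a non-integer) on a tiny made-up sample. The helper names and the data are assumptions for illustration; production systems may use other percentile definitions.

```python
import statistics
from collections import Counter

# Hypothetical toy sample of per-user comment counts (n = 20), already sorted.
counts = [0] * 12 + [1] * 4 + [2] * 2 + [5, 40]


def nearest_rank_percentile(sorted_values, pct):
    """Smallest observed value with at least pct% of the data at or below it."""
    n = len(sorted_values)
    rank = max((pct * n + 99) // 100, 1)  # integer ceiling of pct * n / 100
    return sorted_values[rank - 1]


def interpolated_percentile(sorted_values, pct):
    """Linear interpolation between the two closest order statistics."""
    n = len(sorted_values)
    position = pct * (n - 1)              # position scaled by 100 to stay in integers
    lo = position // 100
    frac = (position % 100) / 100
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


p95_rank = nearest_rank_percentile(counts, 95)
p95_interp = interpolated_percentile(counts, 95)
share_at_or_below = sum(c <= p95_rank for c in counts) / len(counts)

print("mean  :", statistics.fmean(counts))
print("median:", statistics.median(counts))
print("mode  :", Counter(counts).most_common(1)[0][0])
print("p95 (nearest rank) :", p95_rank)
print("p95 (interpolated) :", p95_interp)
print("share of users <= nearest-rank p95:", share_at_or_below)
```

On this sample the two conventions disagree (5 versus about 6.75), which is why a p95 for a count metric should state which definition it uses. With heavy ties, the share of users at or below the reported threshold can also exceed 95%.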

e) Data types and aggregation: The per-user value is an integer, but the mean across users is a real number. Explain pitfalls from storing as integer vs float at different aggregation levels (e.g., truncation, rounding bias, overflow) and how you’d ensure numeric stability when computing large-day aggregates.
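
Several of the pitfalls named in part (e) can be reproduced in a few lines. The segment totals below are invented, and `ctypes.c_int32` is used only to imitate a fixed-width 32-bit counter, since Python's own integers do not overflow. Treat this as a sketch of the failure modes, not a description of any particular storage system.

```python
import ctypes
import math

# Hypothetical per-segment aggregates: (total_comments, active_users).
segments = [(53, 20), (7, 10), (1, 3)]

total_comments = sum(c for c, _ in segments)  # keep sums as exact integers
total_users = sum(n for _, n in segments)

# Truncation: integer division silently drops the fractional part.
truncated_mean = total_comments // total_users
true_mean = total_comments / total_users      # divide once, at the end, in float

# Rounding bias: rounding each segment mean to an integer before combining.
rounded_means = [round(c / n) for c, n in segments]
mean_of_rounded = sum(m * n for m, (_, n) in zip(rounded_means, segments)) / total_users

# Overflow: a 32-bit signed counter wraps around past 2**31 - 1.
int32_max = 2**31 - 1
wrapped = ctypes.c_int32(int32_max + 1).value

# Float accumulation: adding many floats one by one accumulates rounding error;
# math.fsum tracks the lost low-order bits.
values = [0.1] * 10
naive_total = 0.0
for v in values:
    naive_total += v
stable_total = math.fsum(values)

print("truncated mean :", truncated_mean)
print("true mean      :", true_mean)
print("mean of rounded segment means:", mean_of_rounded)
print("int32 max + 1 wraps to:", wrapped)
print("naive float sum:", naive_total, "| fsum:", stable_total)
```

A common mitigation, assumed here and not taken from the question, is to carry integer sums and counts in 64-bit or wider types through every aggregation level and convert to floating point only in the final division.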

f) Estimation: Suppose the per-user variance is overdispersed (Var > Mean). Write the approximate standard error of the sample mean and discuss when you’d prefer robust estimators (trimmed mean, Winsorization) or variance reduction techniques (CUPED with a prior-day covariate).
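
For part (f), the approximate standard error of the mean is s/√n, and a CUPED-style adjustment subtracts θ·(X − X̄) from each user's value, with θ = Cov(X, Y)/Var(X) and X a prior-day covariate. The sketch below applies both to simulated data; the data-generating process (a latent per-user `activity` level driving both days) and all names are assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(7)
n = 5_000

prior_day, today = [], []
for _ in range(n):
    # Hypothetical latent activity level shared across both days for one user.
    activity = 0.1 + rng.expovariate(0.5)
    prior_day.append(int(rng.expovariate(1 / activity)))  # covariate X
    today.append(int(rng.expovariate(1 / activity)))      # metric Y


def standard_error(values):
    """Approximate standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


x_bar = statistics.fmean(prior_day)
y_bar = statistics.fmean(today)

# theta = Cov(X, Y) / Var(X), both with the n - 1 denominator.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prior_day, today)) / (n - 1)
theta = cov_xy / statistics.variance(prior_day)

# CUPED-adjusted metric: same mean as Y, lower variance when X and Y are correlated.
adjusted = [y - theta * (x - x_bar) for x, y in zip(prior_day, today)]

se_raw = standard_error(today)
se_cuped = standard_error(adjusted)

print(f"mean (raw)   = {y_bar:.4f}  SE = {se_raw:.4f}")
print(f"mean (CUPED) = {statistics.fmean(adjusted):.4f}  SE = {se_cuped:.4f}")
print(f"variance reduction = {1 - (se_cuped / se_raw) ** 2:.1%}")
```

The reduction in variance is approximately the squared correlation between the covariate and the metric, so the benefit depends on how predictive prior-day behavior is. Robust estimators such as trimmed or Winsorized means address a different problem (sensitivity to extreme users) and change the estimand, which this sketch does not cover.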
