Total daily comments spiked by 20% on day T for a product with DAU = 2,000,000. Historically, users post ~5.5 comments each on a normal day; on day T the per-user mean is 6.6. Bots averaging 500 comments each that day are suspected. 1) Under a simple two-component mixture (humans unchanged at 5.5, bots at 500), estimate the number and fraction of bot accounts present on day T; show formulas and a numeric result. 2) If the human mean also drifted to 5.8 that day, re-estimate the bot count. 3) From the day-T DAU, two independent 1% simple random samples are drawn. Compute a 95% confidence interval for the difference of their sample means (Sample A mean minus Sample B mean): (a) assuming per-user comments are Poisson with mean 6.6; (b) assuming Negative Binomial with mean 6.6 and dispersion k = 2. State any approximations (e.g., CLT) and justify them. 4) If the true distribution is heavy-tailed with ~1% of humans having >50 comments, discuss when the normal approximation breaks and outline a bootstrap procedure for the CI. 5) Briefly critique the initial normality assumption and propose a quick diagnostic using only sample aggregates (n, mean, variance).

This question evaluates proficiency in mixture modeling for anomaly detection, parametric and nonparametric inference for mean differences, handling overdispersed count data, and bootstrap resampling under heavy-tailed distributions.

This is a medium-difficulty Statistics & Math question, commonly asked of Data Scientist candidates during onsite rounds at Meta.

Estimate bots and CI from DAU spike | Meta Interview Question

Mixture Spike and Mean-Difference Inference for Daily Comments

Context

A product has DAU (daily active users) = 2,000,000. On day T, total comments increased by 20%. Historically, the per-user mean is 5.5 comments; on day T, the mean is 6.6 (= 1.2 × 5.5). You suspect a small fraction of bot accounts that each posted around 500 comments on day T.

Tasks

Two-component mixture model: assume humans remained at 5.5 comments/user and bots posted 500 comments/user. Estimate the number and fraction of bot accounts on day T. Show formulas and a numeric result.
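A minimal sketch of the mixture arithmetic, assuming the 2,000,000 day-T DAU already includes the bot accounts and both component means are known exactly. With bot fraction p, the day-T mean satisfies (1 − p)·5.5 + p·500 = 6.6, so p = (6.6 − 5.5)/(500 − 5.5) = 1.1/494.5 ≈ 0.0022 and the bot count is B = N·p ≈ 4,449 (about 0.22% of DAU). The same number falls out of an excess-comments view: the spike adds 1.1 × 2,000,000 = 2.2M comments, and each bot contributes 494.5 more comments than a human. The helper name `estimate_bots` is illustrative, not from the source.

```python
def estimate_bots(dau, observed_mean, human_mean, bot_mean):
    """Solve (1 - p) * human_mean + p * bot_mean = observed_mean for the bot fraction p.

    Hypothetical helper. Assumes `dau` already counts the bot accounts and that
    both component means are known exactly.
    """
    p = (observed_mean - human_mean) / (bot_mean - human_mean)
    return p, p * dau  # (bot fraction, expected number of bot accounts)


p, bots = estimate_bots(2_000_000, 6.6, 5.5, 500.0)
print(f"bot fraction = {p:.4%}, bot accounts = {bots:,.0f}")
```

Passing `human_mean=5.8` reuses the same formula for the drifted-baseline case.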

Re-estimate if the human mean drifted up to 5.8 on day T (bots still at 500).
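A worked version of the re-estimate, using the same mass-balance equation with the human mean μ_h left free (N = 2,000,000 DAU including bots, observed mean m = 6.6, bot mean μ_b = 500, B bot accounts). The derivative line is an added sensitivity check, not part of the original task.

```latex
\begin{aligned}
(N - B)\,\mu_h + B\,\mu_b &= N m
  \quad\Longrightarrow\quad
  \hat p = \frac{B}{N} = \frac{m - \mu_h}{\mu_b - \mu_h} \\[4pt]
\mu_h = 5.8:\quad \hat p &= \frac{6.6 - 5.8}{500 - 5.8} = \frac{0.8}{494.2} \approx 0.00162,
  \qquad \hat B = N \hat p \approx 3{,}238 \\[4pt]
\frac{\partial B}{\partial \mu_h} &= \frac{N\,(m - \mu_b)}{(\mu_b - \mu_h)^2}
  \approx -4{,}035 \ \text{ at } \mu_h = 5.5
\end{aligned}
```

Because B is nearly linear in μ_h over this range, the +0.3 drift removes about 3 × 404 ≈ 1,211 estimated bots (4,449 → 3,238, roughly 27% fewer), so the bot count is quite sensitive to the assumed human baseline.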

From day-T DAU, two independent simple random samples (SRS) of size 1% are drawn (Sample A and Sample B). Compute a 95% confidence interval (CI) for the difference of sample means, mean(A) − mean(B): a) assuming per-user comments are Poisson with mean 6.6. b) assuming Negative Binomial with mean 6.6 and dispersion k = 2 (use Var = μ + μ²/k). State and justify approximations (e.g., CLT). You may comment on finite population correction (FPC) if relevant.
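A sketch of the normal-approximation CI, assuming each sample has n = 1% × 2,000,000 = 20,000 users and the two draws are independent, so Var(x̄_A − x̄_B) = 2σ²/n. Under Poisson, σ² = 6.6, giving SE ≈ 0.0257 and a 95% half-width ≈ 0.050; under the Negative Binomial, σ² = 6.6 + 6.6²/2 = 28.38, giving SE ≈ 0.0533 and a half-width ≈ 0.104. The CI is (x̄_A − x̄_B) ± half-width; since both samples come from the same population, the true difference is 0, which the interval should cover about 95% of the time. The CLT is well justified here: the sample mean's skewness is roughly γ₁/√n, about 0.39/141 ≈ 0.003 for Poisson(6.6) and 1.43/141 ≈ 0.01 for this NB. The FPC factor (N − n)/(N − 1) ≈ 0.99 shrinks the SE by only about 0.5%. The function name and its `population` argument are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def diff_of_means_halfwidth(var, n, population=None, z=Z_95):
    """Half-width of a normal-approximation CI for mean(A) - mean(B).

    Assumes two independent samples, each of size n, drawn from a population
    whose per-user variance is `var`. If `population` (N) is given, each
    sample-mean variance is multiplied by the FPC (N - n) / (N - 1).
    """
    fpc = 1.0 if population is None else (population - n) / (population - 1)
    se = math.sqrt(2.0 * var / n * fpc)  # Var(xbar_A - xbar_B) = 2 * var / n * fpc
    return z * se


N = 2_000_000
n = N // 100  # 1% simple random sample -> 20,000 users
mu, k = 6.6, 2.0

variances = {
    "Poisson": mu,                     # Var = mu
    "NegBin (k=2)": mu + mu ** 2 / k,  # Var = mu + mu^2 / k = 28.38
}
for label, var in variances.items():
    hw = diff_of_means_halfwidth(var, n)
    hw_fpc = diff_of_means_halfwidth(var, n, population=N)
    print(f"{label}: (xbar_A - xbar_B) ± {hw:.4f}  [with FPC: ± {hw_fpc:.4f}]")
```

Swapping in the observed sample variance for `var` turns this into the usual Welch-style interval, which does not depend on either parametric assumption.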

If the true distribution is heavy-tailed with ~1% of humans having >50 comments, discuss when the normal approximation can break and outline a nonparametric bootstrap procedure to obtain a CI for mean(A) − mean(B).
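When the normal approximation can break: the skewness of a sample mean shrinks only like γ₁/√n, and one commonly cited rule of thumb (Cochran's) asks for n > 25·γ₁². With 20,000 users per sample and ~1% heavy humans (about 200 per sample), the CLT is usually still adequate, but it degrades when a few extreme accounts dominate the sample variance (for example Pareto-like tails whose variance is infinite or barely finite), when samples are small (per-country or per-segment cuts), or when bots at ~500 comments are mixed in. A percentile bootstrap avoids the normal assumption: resample A and B independently with replacement at their original sizes, recompute mean(A) − mean(B), repeat many times (e.g. 2,000), and read off the 2.5th and 97.5th percentiles. The sketch below uses only the standard library; `bootstrap_diff_ci`, its defaults, and the synthetic data are assumptions for illustration, and for real 20,000-user samples a vectorized (e.g. NumPy) version would be far faster. Under strong skew, BCa or studentized bootstrap intervals are often preferred over the plain percentile method.

```python
import random
import statistics


def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(a) - mean(b) from two independent samples.

    Each replicate resamples a and b separately, with replacement, at their own
    sizes, so the independence of the two SRS draws is preserved.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(ra) - statistics.fmean(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    point = statistics.fmean(a) - statistics.fmean(b)
    return point, (lo, hi)


if __name__ == "__main__":
    gen = random.Random(42)

    def synthetic_user():
        # Hypothetical heavy-tailed user: ~1% post 51-400 comments, the rest post 0-12.
        return gen.randint(51, 400) if gen.random() < 0.01 else gen.randint(0, 12)

    sample_a = [synthetic_user() for _ in range(1_000)]
    sample_b = [synthetic_user() for _ in range(1_000)]
    point, (lo, hi) = bootstrap_diff_ci(sample_a, sample_b, n_boot=1_000)
    print(f"mean(A) - mean(B) = {point:.3f}, 95% percentile CI = ({lo:.3f}, {hi:.3f})")
```

Comparing this interval with the normal-approximation one, and inspecting the skewness of the sorted bootstrap differences, is a direct way to see whether the tail is distorting the CLT interval.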

Briefly critique the assumption of normality and propose a quick diagnostic that uses only sample aggregates (n, mean, variance).
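Critique: per-user comment counts are discrete, non-negative and right-skewed, and on day T possibly a bot/human mixture, so treating individual counts as normal is implausible. What the CI actually needs is approximate normality of the sample mean, which the CLT supplies when the variance is finite and n is large relative to the skewness. A quick check from (n, x̄, s²) alone: the index of dispersion D = s²/x̄ (≈ 1 under Poisson) with z = (D − 1)·√((n − 1)/2) from the χ² approximation; the method-of-moments NB dispersion k̂ = x̄²/(s² − x̄); and the mass a normal with the same mean and variance would put below zero. For the NB case (x̄ = 6.6, s² = 28.38) this gives D = 4.3, k̂ = 2, and about 11% of normal mass on negative counts. If the bot mixture were present (assuming Poisson humans and every bot at exactly 500), the between-component variance p(1 − p)(500 − 5.5)² with p ≈ 0.0022 is ≈ 543 on its own, pushing D to roughly 83. The helper `dispersion_diagnostic` is hypothetical.

```python
import math


def dispersion_diagnostic(n, mean, var):
    """Quick distributional checks using only n, the sample mean and the sample variance.

    Hypothetical helper for illustration; assumes mean > 0 and var > 0.
    """
    d = var / mean  # index of dispersion: about 1 under Poisson, >1 means overdispersion
    # Under Poisson, (n - 1) * var / mean is approximately chi-square(n - 1),
    # which for large n is roughly Normal(n - 1, 2(n - 1)); standardize D - 1.
    z = (d - 1.0) * math.sqrt((n - 1) / 2.0)
    # Method-of-moments NB dispersion from Var = mu + mu^2 / k.
    k_hat = mean ** 2 / (var - mean) if var > mean else math.inf
    # Probability that a normal with the same mean and variance falls below zero,
    # impossible for counts, so a large value flags a poor per-user normal fit.
    p_negative = 0.5 * (1.0 + math.erf(-mean / math.sqrt(2.0 * var)))
    return {"D": d, "z": z, "k_hat": k_hat, "p_negative": p_negative}


for label, var in [("Poisson", 6.6), ("NegBin k=2", 28.38)]:
    r = dispersion_diagnostic(20_000, 6.6, var)
    print(label, ", ".join(f"{key}={value:.4g}" for key, value in r.items()))
```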
