What difficulty level is this interview question?

This is a medium-difficulty Statistics & Math question, commonly asked during onsite rounds at TikTok.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at TikTok during technical interviews.

Model overdispersed counts; estimate treatment lift

Quick Overview

This question evaluates modeling and inference for overdispersed, zero‑inflated count data, including estimation of treatment lift (rate ratios), dispersion assessment, standard error quantification, cluster-robust inference, bootstrap resampling, and multiple-comparison correction.

Weekly posts per creator are overdispersed and zero‑inflated. In a creator‑level randomized test of a nudge:

Control: n_c=40,000 creators, total posts=72,000 (mean=1.8)
Treatment: n_t=40,000 creators, total posts=75,600 (mean=1.89)
Historical control variance per creator s_c^2≈6.5 (suggesting overdispersion).
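
As a quick check on the overdispersion claim: a Poisson model would force the variance to equal the mean, whereas here the variance-to-mean ratio is about 3.6. A method-of-moments sketch follows. It assumes the negative binomial variance form Var(Y) = μ + μ^2/k and that the historical variance still applies to the test period:

```latex
\frac{s_c^2}{\bar{y}_c} = \frac{6.5}{1.8} \approx 3.61,
\qquad
\hat{k} = \frac{\bar{y}_c^{2}}{s_c^{2} - \bar{y}_c}
        = \frac{1.8^{2}}{6.5 - 1.8}
        = \frac{3.24}{4.7} \approx 0.69
```

A small k (well below 1) indicates heavy overdispersion, consistent with many zero-post creators and a long right tail.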

Answer the following:

1. Choose an appropriate model (e.g., Negative Binomial with log link). Using Var(Y) = μ + μ^2/k, estimate k from the control statistics and compute the estimated log rate ratio, its standard error, and a 95% CI for the treatment lift.
2. If you instead used a Poisson model, quantify the expected underestimation of SE relative to the NB and discuss when that would inflate Type I error.
3. Outline a cluster‑robust approach if randomization had been by geo (state/clusters), and a nonparametric bootstrap you’d trust here. Be explicit about the resampling unit and how you’d construct the CI for the rate ratio.
4. Given meaningful heterogeneity by creator tenure, propose a pre‑specified analysis (e.g., stratified NB or interaction terms) and how you’d correct for multiple comparisons across 10 geos (e.g., BH‑FDR).
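
The arithmetic behind the parts above can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution. The function names (`nb_moment_k`, `log_rr_summary`, `bootstrap_rr_ci`, `bh_adjust`) are hypothetical. It assumes a single k shared by both arms, estimated by method of moments from the historical control variance, and uses a delta-method approximation on the arm means rather than a fitted NB GLM. The bootstrap needs per-unit counts, which the question does not supply.

```python
import math
import random

def nb_moment_k(mean, var):
    """Method-of-moments NB size k from Var(Y) = mu + mu^2 / k."""
    if var <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    return mean ** 2 / (var - mean)

def log_rr_summary(n_c, total_c, n_t, total_t, var_c, z=1.96):
    """Delta-method log rate ratio with NB and Poisson standard errors."""
    mu_c, mu_t = total_c / n_c, total_t / n_t
    k = nb_moment_k(mu_c, var_c)  # assumption: same k in both arms
    log_rr = math.log(mu_t / mu_c)
    # Var(log mean) ~ Var(Y) / (n * mu^2) = (1/mu + 1/k) / n under NB
    se_nb = math.sqrt((1 / mu_c + 1 / k) / n_c + (1 / mu_t + 1 / k) / n_t)
    # Poisson drops the 1/k term, leaving 1 / (total events) per arm
    se_pois = math.sqrt(1 / total_c + 1 / total_t)
    ci = (math.exp(log_rr - z * se_nb), math.exp(log_rr + z * se_nb))
    return {"k": k, "log_rr": log_rr, "se_nb": se_nb,
            "se_pois": se_pois, "rr_ci": ci}

def bootstrap_rr_ci(control, treatment, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the rate ratio.

    Each list holds one count per randomization unit (here, per creator);
    units are resampled with replacement within each arm.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        ratios.append((sum(t) / len(t)) / (sum(c) / len(c)))
    ratios.sort()
    lo = ratios[int(n_boot * alpha / 2)]
    hi = ratios[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Numbers taken from the question
res = log_rr_summary(40_000, 72_000, 40_000, 75_600, 6.5)
print(res)
```

With the question's inputs this gives k ≈ 0.69, a log rate ratio of about 0.0488 (a 5% lift), an NB standard error of about 0.0100, and a 95% CI for the rate ratio of roughly (1.030, 1.071). The Poisson standard error is about 0.0052, so the NB value is about 1.9 times larger. A Poisson analysis would therefore report intervals about half as wide as they should be and reject a true null far more often than the nominal 5%. If randomization were by geo, the resampling unit in `bootstrap_rr_ci` would need to be the geo cluster, carrying both its post total and its creator count, rather than the individual creator. `bh_adjust` would be applied to the 10 per-geo p-values from the pre-specified analysis.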
