Choose tests and solve distribution parameters
Company: Meta
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Onsite
You are comparing engagement between new and existing users from 2025-08-05 to 2025-09-01.
1) You observe per-user daily session counts (integer, skewed, with many zeros). Which test would you use to compare central tendency between cohorts and why: two-sample t-test, Welch's t-test, Mann–Whitney U, or a GLM-based approach? State assumptions and diagnostics you would run.
2) Suppose daily sessions per user is approximately Negative Binomial with mean μ = 2.40 and variance σ² = 6.96 for existing users. Parameterize NB in terms of (r, p) where E[X] = r(1−p)/p and Var[X] = r(1−p)/p². Solve for r and p, then compute P(X = 0).
3) For new users, you estimate μ = 1.85 and σ² = 4.20. Using a delta-method or GLM reasoning, give a 95% CI for the mean difference in sessions per user between cohorts given independent samples of size n_new = 5,000 and n_exist = 5,000. State any approximations.
4) You perform a Welch's t-test and obtain p = 0.04 with Cohen's d = 0.08. Interpret practical vs statistical significance, discuss multiple-testing control if you also segmented by 5 countries, and specify one robust effect-size metric for count data (e.g., ratio of means) and how to estimate its CI.
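The arithmetic behind parts 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the function names are hypothetical, the (r, p) values come from matching moments under the parameterization stated in the question, and the interval is a plain Wald (normal-approximation) interval that treats the quoted variances as known plug-in values and the users as independent.

```python
import math


def nb_params_from_moments(mean, var):
    """Moment-matched (r, p) for E[X] = r(1-p)/p and Var[X] = r(1-p)/p^2."""
    if var <= mean:
        raise ValueError("Negative Binomial requires variance > mean")
    p = mean / var            # because Var[X] / E[X] = 1 / p
    r = mean * p / (1.0 - p)  # equivalently mean**2 / (var - mean)
    return r, p


def nb_prob_zero(r, p):
    """P(X = 0) = p^r under the same parameterization."""
    return p ** r


def wald_ci_mean_diff(mean_a, var_a, n_a, mean_b, var_b, n_b, z=1.96):
    """Normal-approximation CI for mean_a - mean_b with independent samples."""
    diff = mean_a - mean_b
    se = math.sqrt(var_a / n_a + var_b / n_b)
    return diff - z * se, diff + z * se


# Part 2: existing users, mean 2.40 and variance 6.96
r, p = nb_params_from_moments(2.40, 6.96)
p_zero = nb_prob_zero(r, p)
print(f"r = {r:.4f}, p = {p:.4f}, P(X=0) = {p_zero:.4f}")

# Part 3: new minus existing, 5,000 users per cohort
ci_low, ci_high = wald_ci_mean_diff(1.85, 4.20, 5000, 2.40, 6.96, 5000)
print(f"95% CI for mean difference (new - existing): ({ci_low:.4f}, {ci_high:.4f})")
```

With n = 5,000 per cohort the central limit theorem makes the normal approximation reasonable despite the skew; z = 1.96 is used in place of a t quantile, which is an approximation that matters little at this sample size.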
Quick Answer: This question evaluates proficiency in statistical inference for skewed count data, covering test selection for central tendency, negative binomial parameter estimation and zero-probability calculation, construction of confidence intervals for mean differences, and interpretation of p-values versus effect sizes within the Statistics & Math domain for a Data Scientist role. It is commonly asked to assess both conceptual understanding (assumptions, diagnostics, statistical versus practical significance, and multiple-testing considerations) and practical application (parameter solving, delta-method/GLM rationale, and robust effect-size estimation) when analyzing real-world count data.
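For the robust effect-size and multiple-testing points, one possible sketch is a ratio of means with a delta-method interval built on the log scale, plus a Bonferroni-adjusted threshold. The function name is hypothetical, the inputs reuse the moments quoted in the question, and treating the country segmentation as exactly 5 tests is an assumption (counting the overall comparison as well would give 6).

```python
import math


def ratio_of_means_ci(mean_a, var_a, n_a, mean_b, var_b, n_b, z=1.96):
    """Delta-method CI for mean_a / mean_b, computed on the log scale.

    Var(log(sample mean)) is approximated by var / (n * mean^2).
    """
    log_ratio = math.log(mean_a / mean_b)
    se_log = math.sqrt(var_a / (n_a * mean_a ** 2) + var_b / (n_b * mean_b ** 2))
    return (
        math.exp(log_ratio),
        math.exp(log_ratio - z * se_log),
        math.exp(log_ratio + z * se_log),
    )


# Ratio of mean sessions, new users relative to existing users
ratio, ratio_low, ratio_high = ratio_of_means_ci(1.85, 4.20, 5000, 2.40, 6.96, 5000)
print(f"ratio = {ratio:.4f}, 95% CI = ({ratio_low:.4f}, {ratio_high:.4f})")

# Bonferroni threshold, assuming 5 country-level tests at family-wise alpha = 0.05
n_tests = 5
alpha_per_test = 0.05 / n_tests
observed_p = 0.04  # the Welch p-value quoted in the question
print(f"per-test alpha = {alpha_per_test:.3f}, "
      f"significant after correction: {observed_p < alpha_per_test}")
```

Working on the log scale keeps the interval positive and matches what a log-link count GLM would report as an exponentiated coefficient. Bonferroni is the most conservative choice here; Holm or Benjamini-Hochberg are common alternatives.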