Compute robust inference under skew and outliers
Company: Voleon Group
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Technical Screen
Two independent product variants A and B produce a continuous KPI (session revenue in USD) that is right-skewed with ~5% extreme outliers. From historical data you estimate: for A, mean μ_A ≈ 2.30 and sd σ_A ≈ 1.10; for B, mean μ_B ≈ 2.35 and sd σ_B ≈ 1.40. Planned sample sizes are n_A = 8,000 and n_B = 7,500, but you expect 10% of outcomes in B to be missing completely at random (MCAR). Tasks:
(1) Decide whether to use a Welch t-test on means, a Mann–Whitney test on distributions, or a trimmed-mean test; justify the choice based on robustness to skew/outliers and on which parameter you want to detect.
(2) Compute a 95% CI for (μ_B − μ_A) using Welch's t (show the Satterthwaite df) and also via a nonparametric bootstrap percentile CI with 10,000 resamples; state the pros and cons of each.
(3) Approximate the power to detect a true mean lift of +0.05 at α = 0.05 under heteroskedasticity (use the Welch variance), adjusting for the 10% MCAR in B.
(4) Propose an outlier-robust estimator (e.g., a 20% trimmed mean or Huber M-estimator) and compute a 98% CI for the trimmed-mean difference; explain how you would choose the trimming proportion.
(5) Suppose you track 20 related KPIs: describe and compute Benjamini–Hochberg FDR control at q = 0.10 given a sorted p-value list p_(1) ≤ ... ≤ p_(20), and contrast it with Bonferroni in terms of power and Type I error.
(6) If normality is doubtful, show how to use a variance-stabilizing transform (e.g., Y' = log1p(Y)) and back-transform the CI to an interpretable multiplicative effect, noting the bias correction needed when exponentiating.
Quick Answer: This question evaluates a data scientist's competency in robust statistical inference for A/B testing: handling skewed continuous outcomes, extreme outliers, heteroskedasticity, missingness, and multiplicity; choosing among test statistics; constructing confidence intervals (including bootstrap and transformed CIs); approximating power; and applying robust estimators such as trimmed means or M-estimators. It is a staple of statistics and experimental-design interviews because it probes both conceptual understanding (robustness, multiple-testing principles) and practical skill: selecting appropriate inference methods, computing intervals and power under realistic data issues, and interpreting variance-stabilizing transformations.
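For task (2), the Welch CI follows directly from the summary statistics in the prompt; a minimal Python sketch using scipy and the planned sample sizes (before any missingness):

```python
import numpy as np
from scipy import stats

# Summary statistics given in the prompt
mu_a, sd_a, n_a = 2.30, 1.10, 8000
mu_b, sd_b, n_b = 2.35, 1.40, 7500

va, vb = sd_a**2 / n_a, sd_b**2 / n_b  # per-arm variance of the sample mean
se = np.sqrt(va + vb)                  # Welch standard error of the difference
# Satterthwaite (Welch) degrees of freedom
df = (va + vb)**2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1))
t_crit = stats.t.ppf(0.975, df)
diff = mu_b - mu_a
ci = (diff - t_crit * se, diff + t_crit * se)  # ≈ (0.010, 0.090)
```

With samples this large, df ≈ 14,000 and the t critical value is essentially the normal 1.96; the interval excludes zero, but only barely relative to its width.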
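The bootstrap half of task (2) needs raw draws, which the prompt does not provide; the sketch below uses synthetic lognormal data purely as a stand-in (the distribution and its parameters are assumptions, not from the prompt):

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic right-skewed stand-ins for the real KPI draws (assumed, for illustration)
a = rng.lognormal(mean=0.60, sigma=0.50, size=8000)
b = rng.lognormal(mean=0.62, sigma=0.55, size=7500)

n_boot = 10_000
diffs = np.empty(n_boot)
for i in range(n_boot):
    # Resample each arm with replacement and record the mean difference
    ra = a[rng.integers(0, a.size, a.size)]
    rb = b[rng.integers(0, b.size, b.size)]
    diffs[i] = rb.mean() - ra.mean()

ci_lo, ci_hi = np.percentile(diffs, [2.5, 97.5])  # percentile bootstrap 95% CI
```

The percentile CI makes no normality assumption and adapts to skew, but it can undercover with very heavy tails; BCa intervals are a common refinement, at higher computational cost than the Welch formula.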
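For task (3), a standard normal-approximation power calculation with the Welch variance, shrinking n_B by the expected 10% MCAR loss:

```python
import numpy as np
from scipy.stats import norm

n_a = 8000
n_b = int(7500 * 0.9)        # MCAR: expected complete cases in B
var_a, var_b = 1.10**2, 1.40**2
se = np.sqrt(var_a / n_a + var_b / n_b)  # Welch SE under the reduced n_B

delta = 0.05                 # true mean lift to detect
z_alpha = norm.ppf(0.975)    # two-sided alpha = 0.05
power = norm.cdf(delta / se - z_alpha) + norm.cdf(-delta / se - z_alpha)  # ≈ 0.66
```

Because MCAR missingness only shrinks the sample (it does not bias the estimate), the sole adjustment is the smaller effective n_B, which inflates the SE and pulls power down to roughly two-thirds.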
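Task (4)'s trimmed-mean CI is conventionally done with Yuen's method: trimmed means, winsorized variances, and a Welch-style df. A sketch on synthetic data (the data-generating choice is an assumption; real KPI draws would replace it):

```python
import numpy as np
from scipy import stats

def yuen_ci(x, y, trim=0.20, conf=0.98):
    """Yuen's CI for the difference of trimmed means (trim = proportion cut per tail)."""
    def parts(v):
        n = v.size
        g = int(np.floor(trim * n))
        h = n - 2 * g  # observations remaining after trimming
        # Winsorized sample variance drives the trimmed mean's standard error
        sw2 = np.var(np.asarray(stats.mstats.winsorize(v, limits=(trim, trim))), ddof=1)
        d = (n - 1) * sw2 / (h * (h - 1))
        return stats.trim_mean(v, trim), d, h
    tm_x, d_x, h_x = parts(x)
    tm_y, d_y, h_y = parts(y)
    se = np.sqrt(d_x + d_y)
    df = (d_x + d_y)**2 / (d_x**2 / (h_x - 1) + d_y**2 / (h_y - 1))  # Welch-style
    t = stats.t.ppf(1 - (1 - conf) / 2, df)
    diff = tm_y - tm_x
    return diff - t * se, diff + t * se

rng = np.random.default_rng(7)
x = rng.lognormal(0.60, 0.50, 8000)  # synthetic stand-ins (assumption)
y = rng.lognormal(0.62, 0.55, 7500)
lo, hi = yuen_ci(x, y)
```

On the trimming proportion: it should be at least the expected contamination rate (~5% here, so 10–20% per tail is safe), trading robustness against efficiency loss under clean data; a sensitivity check across several trim levels is a good answer in an interview.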
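For task (5), BH at q = 0.10 compares each sorted p_(i) against i·q/m and rejects everything up to the largest i that passes. A sketch with a hypothetical p-value list (the numbers are made up for illustration, not from the prompt):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """BH step-up: reject H_(1..k) where k = max{i : p_(i) <= i*q/m}."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Hypothetical sorted p-values for the 20 KPIs (illustrative only)
p = [0.001, 0.004, 0.008, 0.012, 0.020, 0.031, 0.045, 0.060, 0.074, 0.110,
     0.150, 0.200, 0.260, 0.320, 0.400, 0.480, 0.560, 0.650, 0.780, 0.910]
bh_rejects = benjamini_hochberg(p, q=0.10).sum()       # 5 rejections
bonf_rejects = (np.asarray(p) <= 0.05 / 20).sum()       # 1 rejection
```

The contrast the question is after: Bonferroni controls the familywise error rate and so is very conservative (here only p = 0.001 survives the 0.0025 cutoff), while BH controls the expected proportion of false discoveries and retains substantially more power (five rejections on the same list).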
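For task (6), a sketch of the log1p route on synthetic data (the distributional choice is an assumption): work on Y' = log1p(Y), form a normal-theory CI for the mean difference, then exponentiate. The naive back-transform targets a ratio of geometric means of 1 + Y; recovering an arithmetic-mean effect needs a bias correction, e.g. exp(σ²/2) under lognormality or Duan's smearing estimator more generally:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic skewed KPI draws (assumed stand-ins for real data)
a = rng.lognormal(0.60, 0.50, 8000)
b = rng.lognormal(0.62, 0.55, 7500)

la, lb = np.log1p(a), np.log1p(b)
diff = lb.mean() - la.mean()
se = np.sqrt(la.var(ddof=1) / la.size + lb.var(ddof=1) / lb.size)
z = stats.norm.ppf(0.975)
# Back-transformed CI: multiplicative effect on the (1 + Y) scale,
# interpretable as a ratio of geometric means of 1 + Y
ratio = np.exp(diff)
ratio_ci = (np.exp(diff - z * se), np.exp(diff + z * se))
# Duan smearing factors correct back toward the arithmetic-mean scale
smear_a = np.mean(np.exp(la - la.mean()))
smear_b = np.mean(np.exp(lb - lb.mean()))
corrected_ratio = ratio * smear_b / smear_a
```

The CI endpoints back-transform cleanly because exp is monotone; it is only the point estimate of a mean effect that needs the smearing correction, since E[exp(Y')] > exp(E[Y']) by Jensen's inequality.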