Compute robust inference under skew and outliers
Company: Voleon Group
Role: Data Scientist
Category: Statistics & Math
Difficulty: hard
Interview Round: Technical Screen
Two independent product variants A and B produce a continuous KPI (session revenue in USD) that is right-skewed with ~5% extreme outliers. From historical data you estimate: for A, mean μ_A ≈ 2.30 and sd σ_A ≈ 1.10; for B, mean μ_B ≈ 2.35 and sd σ_B ≈ 1.40. Planned sample sizes are n_A = 8,000 and n_B = 7,500, but you expect 10% of outcomes in B to be missing completely at random (MCAR). Tasks:
(1) Decide whether to use a Welch t-test on means, a Mann–Whitney test on distributions, or a trimmed-mean test; justify the choice based on robustness to skew/outliers and on which parameter you want to detect.
(2) Compute a 95% CI for (μ_B − μ_A) using Welch's t (show the Satterthwaite df) and also via a nonparametric bootstrap percentile CI with 10,000 resamples; state the pros and cons of each.
(3) Approximate the power to detect a true mean lift of +0.05 at α = 0.05 under heteroskedasticity (use the Welch variance), adjusting for the 10% MCAR in B.
(4) Propose an outlier-robust estimator (e.g., a 20% trimmed mean or Huber M-estimator) and compute a 98% CI for the trimmed-mean difference; explain how you would choose the trimming proportion.
(5) Suppose you track 20 related KPIs: describe and compute Benjamini–Hochberg FDR control at q = 0.10 given a sorted p-value list p_(1) ≤ ... ≤ p_(20), and contrast it with Bonferroni in terms of power and Type I error.
(6) If normality is doubtful, show how to use a variance-stabilizing transform (e.g., Y' = log1p(Y)) and back-transform the CI to an interpretable multiplicative effect, noting the bias correction needed when exponentiating.
Quick Answer: This question evaluates a data scientist's competency in robust statistical inference for A/B testing: handling skewed continuous outcomes, extreme outliers, heteroskedasticity, missingness, and multiplicity; choosing among test statistics; constructing confidence intervals (including bootstrap and transformed CIs); approximating power; and applying robust estimators such as trimmed means or M-estimators. It is a staple of statistics and experimental-design interviews because it probes both conceptual understanding (robustness, multiple-testing principles) and practical skill: selecting appropriate inference methods, computing intervals and power under realistic data issues, and interpreting variance-stabilizing transformations.
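For task (2), the Welch CI follows directly from the summary statistics in the prompt; a minimal Python sketch using scipy and the planned sample sizes (before any missingness):

```python
import numpy as np
from scipy import stats

# Summary statistics given in the prompt
mu_a, sd_a, n_a = 2.30, 1.10, 8000
mu_b, sd_b, n_b = 2.35, 1.40, 7500

va, vb = sd_a**2 / n_a, sd_b**2 / n_b  # per-arm variance of the sample mean
se = np.sqrt(va + vb)                  # Welch standard error of the difference
# Satterthwaite (Welch) degrees of freedom
df = (va + vb)**2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1))
t_crit = stats.t.ppf(0.975, df)
diff = mu_b - mu_a
ci = (diff - t_crit * se, diff + t_crit * se)  # ≈ (0.010, 0.090)
```

With samples this large, df ≈ 14,000 and the t critical value is essentially the normal 1.96; the interval excludes zero, but only barely relative to its width.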
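The bootstrap half of task (2) needs raw draws, which the prompt does not provide; the sketch below uses synthetic lognormal data purely as a stand-in (the distribution and its parameters are assumptions, not from the prompt):

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic right-skewed stand-ins for the real KPI draws (assumed, for illustration)
a = rng.lognormal(mean=0.60, sigma=0.50, size=8000)
b = rng.lognormal(mean=0.62, sigma=0.55, size=7500)

n_boot = 10_000
diffs = np.empty(n_boot)
for i in range(n_boot):
    # Resample each arm with replacement and record the mean difference
    ra = a[rng.integers(0, a.size, a.size)]
    rb = b[rng.integers(0, b.size, b.size)]
    diffs[i] = rb.mean() - ra.mean()

ci_lo, ci_hi = np.percentile(diffs, [2.5, 97.5])  # percentile bootstrap 95% CI
```

The percentile CI makes no normality assumption and adapts to skew, but it can undercover with very heavy tails; BCa intervals are a common refinement, at higher computational cost than the Welch formula.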
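For task (3), a standard normal-approximation power calculation with the Welch variance, shrinking n_B by the expected 10% MCAR loss:

```python
import numpy as np
from scipy.stats import norm

n_a = 8000
n_b = int(7500 * 0.9)        # MCAR: expected complete cases in B
var_a, var_b = 1.10**2, 1.40**2
se = np.sqrt(var_a / n_a + var_b / n_b)  # Welch SE under the reduced n_B

delta = 0.05                 # true mean lift to detect
z_alpha = norm.ppf(0.975)    # two-sided alpha = 0.05
power = norm.cdf(delta / se - z_alpha) + norm.cdf(-delta / se - z_alpha)  # ≈ 0.66
```

Because MCAR missingness only shrinks the sample (it does not bias the estimate), the sole adjustment is the smaller effective n_B, which inflates the SE and pulls power down to roughly two-thirds.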
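Task (4)'s trimmed-mean CI is conventionally done with Yuen's method: trimmed means, winsorized variances, and a Welch-style df. A sketch on synthetic data (the data-generating choice is an assumption; real KPI draws would replace it):

```python
import numpy as np
from scipy import stats

def yuen_ci(x, y, trim=0.20, conf=0.98):
    """Yuen's CI for the difference of trimmed means (trim = proportion cut per tail)."""
    def parts(v):
        n = v.size
        g = int(np.floor(trim * n))
        h = n - 2 * g  # observations remaining after trimming
        # Winsorized sample variance drives the trimmed mean's standard error
        sw2 = np.var(np.asarray(stats.mstats.winsorize(v, limits=(trim, trim))), ddof=1)
        d = (n - 1) * sw2 / (h * (h - 1))
        return stats.trim_mean(v, trim), d, h
    tm_x, d_x, h_x = parts(x)
    tm_y, d_y, h_y = parts(y)
    se = np.sqrt(d_x + d_y)
    df = (d_x + d_y)**2 / (d_x**2 / (h_x - 1) + d_y**2 / (h_y - 1))  # Welch-style
    t = stats.t.ppf(1 - (1 - conf) / 2, df)
    diff = tm_y - tm_x
    return diff - t * se, diff + t * se

rng = np.random.default_rng(7)
x = rng.lognormal(0.60, 0.50, 8000)  # synthetic stand-ins (assumption)
y = rng.lognormal(0.62, 0.55, 7500)
lo, hi = yuen_ci(x, y)
```

On the trimming proportion: it should be at least the expected contamination rate (~5% here, so 10–20% per tail is safe), trading robustness against efficiency loss under clean data; a sensitivity check across several trim levels is a good answer in an interview.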
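For task (5), BH at q = 0.10 compares each sorted p_(i) against i·q/m and rejects everything up to the largest i that passes. A sketch with a hypothetical p-value list (the numbers are made up for illustration, not from the prompt):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """BH step-up: reject H_(1..k) where k = max{i : p_(i) <= i*q/m}."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Hypothetical sorted p-values for the 20 KPIs (illustrative only)
p = [0.001, 0.004, 0.008, 0.012, 0.020, 0.031, 0.045, 0.060, 0.074, 0.110,
     0.150, 0.200, 0.260, 0.320, 0.400, 0.480, 0.560, 0.650, 0.780, 0.910]
bh_rejects = benjamini_hochberg(p, q=0.10).sum()       # 5 rejections
bonf_rejects = (np.asarray(p) <= 0.05 / 20).sum()       # 1 rejection
```

The contrast the question is after: Bonferroni controls the familywise error rate and so is very conservative (here only p = 0.001 survives the 0.0025 cutoff), while BH controls the expected proportion of false discoveries and retains substantially more power (five rejections on the same list).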
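For task (6), a sketch of the log1p route on synthetic data (the distributional choice is an assumption): work on Y' = log1p(Y), form a normal-theory CI for the mean difference, then exponentiate. The naive back-transform targets a ratio of geometric means of 1 + Y; recovering an arithmetic-mean effect needs a bias correction, e.g. exp(σ²/2) under lognormality or Duan's smearing estimator more generally:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic skewed KPI draws (assumed stand-ins for real data)
a = rng.lognormal(0.60, 0.50, 8000)
b = rng.lognormal(0.62, 0.55, 7500)

la, lb = np.log1p(a), np.log1p(b)
diff = lb.mean() - la.mean()
se = np.sqrt(la.var(ddof=1) / la.size + lb.var(ddof=1) / lb.size)
z = stats.norm.ppf(0.975)
# Back-transformed CI: multiplicative effect on the (1 + Y) scale,
# interpretable as a ratio of geometric means of 1 + Y
ratio = np.exp(diff)
ratio_ci = (np.exp(diff - z * se), np.exp(diff + z * se))
# Duan smearing factors correct back toward the arithmetic-mean scale
smear_a = np.mean(np.exp(la - la.mean()))
smear_b = np.mean(np.exp(lb - lb.mean()))
corrected_ratio = ratio * smear_b / smear_a
```

The CI endpoints back-transform cleanly because exp is monotone; it is only the point estimate of a mean effect that needs the smearing correction, since E[exp(Y')] > exp(E[Y']) by Jensen's inequality.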