Power, MDE, And Multiple Testing
Asked of: Data Scientist

What's being tested
Ability to design and interpret online experiments: compute the sample size needed for a target MDE, reason about the tradeoffs among α, power, and sample size, and handle multiple comparisons correctly to control false positives.
Core knowledge
- Power = 1 − β; typical targets are 80%–90% power with α = 0.05. State whether the test is one- or two-sided, since that changes the critical value.
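- Two-sample sample-size formula (N per arm): N ≈ (z_{1−α/2} + z_{1−β})^2 * (σ_1^2 + σ_2^2) / Δ^2, where Δ is the MDE; at a fixed N, solve the same relation for Δ.
- For proportions, approximate each arm's variance by the pooled p̄(1−p̄), so σ_1^2 + σ_2^2 ≈ 2p̄(1−p̄); state explicitly whether Δ is an absolute or a relative lift.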
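- Bonferroni: test each of k hypotheses at α_family / k to control the FWER; overly conservative when the tests are strongly correlated.
- Benjamini–Hochberg controls the FDR (under independent or positively dependent tests) and has more power than FWER control when some false discoveries are tolerable.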
- Sequential peeking requires alpha-spending (Pocock/O’Brien–Fleming) or group-sequential correction.
- Define test family up-front; correlated metrics, denominators, and unit of analysis matter.
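To make the Bonferroni vs Benjamini–Hochberg contrast concrete, here is a minimal sketch in plain Python. The function names and the example p-values are illustrative assumptions, not taken from any library or real experiment; in practice a vetted statistics package would normally be used instead.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / k (controls the FWER)."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up BH procedure: find the largest rank r with
    p_(r) <= (r / k) * alpha and reject the r smallest p-values."""
    k = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(k), key=lambda i: p_values[i])

    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / k * alpha:
            max_rank = rank

    reject = [False] * k
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values for five metrics in one predefined family.
p_values = [0.001, 0.012, 0.021, 0.035, 0.30]

print("Bonferroni:", bonferroni(p_values))
print("BH:        ", benjamini_hochberg(p_values))
```

On these made-up inputs Bonferroni rejects only the smallest p-value, while BH rejects the four smallest, which illustrates the power gained by accepting FDR control in place of FWER control.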
Worked example (compute MDE for conversion rate A/B test)
Start by stating the metric precisely: per-user conversion rate, baseline p0, and whether the lift is absolute or relative. Choose α (and whether the test is one- or two-sided) and the target power. Approximate the per-arm variance by p̄(1−p̄) with p̄ ≈ p0 for small effects, then plug into the sample-size formula to solve for N per arm. Finally, discuss practical constraints: if the required N exceeds the available traffic, accept a larger MDE, lower the power target, extend the test duration, or change the metric aggregation (e.g., per-session vs per-user), keeping the analysis consistent with the randomization unit. Mention multiple-testing adjustments if this test is one of many (tighten α or plan for BH).
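The calculation described above can be sketched with the Python standard library alone. The function name, its parameters, and the example baseline (10% conversion, 5% relative lift) are assumptions chosen for illustration; the variance uses the small-effect approximation p̄ ≈ p0.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p0, lift, relative=True, alpha=0.05,
                        power=0.80, two_sided=True):
    """Approximate N per arm for a two-sample test of proportions.

    Uses N ≈ (z_alpha + z_beta)^2 * 2 * p0 * (1 - p0) / delta^2,
    i.e. the pooled variance with p̄ ≈ p0 (small-effect approximation).
    """
    # Convert a relative lift into an absolute difference if needed.
    delta = p0 * lift if relative else lift

    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)

    variance_sum = 2 * p0 * (1 - p0)  # σ_1^2 + σ_2^2 under p̄ ≈ p0
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / delta ** 2)


# Hypothetical inputs: 10% baseline, 5% relative lift (0.5 pp absolute).
n = sample_size_per_arm(p0=0.10, lift=0.05)
print(f"Users needed per arm: {n}")
```

With these assumed inputs the result is on the order of 56,000 users per arm. Halving the MDE roughly quadruples N, because N scales with 1/Δ^2, which is why the MDE is usually the first thing to revisit when traffic is short.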
A common pitfall
A tempting but wrong approach is to fix a tiny relative lift (e.g., 1%) as the MDE without checking variance or traffic, and then treat a non-significant result as evidence that the change had no effect. That ignores underpowered designs, mismatched denominators, and multiple comparisons. Also avoid blindly applying Bonferroni across correlated metrics, which costs a lot of power; instead, predefine test families or use FDR control when appropriate.
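A quick power check before launch makes this pitfall visible. The sketch below uses the same normal approximation as the sample-size formula; the function name and the traffic figures are hypothetical, and the negligible probability of rejecting in the wrong direction is ignored.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p0, relative_lift, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test on proportions.

    Assumes equal arm sizes and per-arm variance p0 * (1 - p0);
    the far-tail rejection probability is ignored.
    """
    delta = p0 * relative_lift                # absolute effect
    se = sqrt(2 * p0 * (1 - p0) / n_per_arm)  # std. error of the difference
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta / se - z_crit)


# Hypothetical scenario: 5% baseline, 1% relative lift.
for n in (100_000, 1_000_000, 3_500_000):
    print(f"n per arm = {n:>9,}: power ≈ {approx_power(0.05, 0.01, n):.2f}")
```

Under these assumptions, 100,000 users per arm gives power well below 10%, so a non-significant result says almost nothing about whether a 1% lift exists; it takes several million users per arm to reach 80%.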
Further reading
- Kohavi, Tang, and Xu, Trustworthy Online Controlled Experiments (Cambridge University Press, 2020): practical guidance for online A/B testing.
- Benjamini & Hochberg, “Controlling the False Discovery Rate” (1995): FDR procedure fundamentals.
Related concepts
- Statistical Power Analysis (Statistics & Math)
- Statistical Inference, Hypothesis Testing, And Power
- Statistical Inference, Hypothesis Tests, And Power
- Hypothesis Testing, Power, And Confidence Intervals
- Statistical Inference, Power, And Metric Uncertainty (Statistics & Math)
- Experiment Diagnostics, Power And Robust Inference (Statistics & Math)