Power, MDE, And Multiple Testing — Tech Interview Concept

What's being tested
Ability to design and interpret online experiments: choosing a sample size for a target minimum detectable effect (MDE), reasoning about the tradeoffs among α, power, and sample size, and handling multiple comparisons correctly to control false positives.

Core knowledge

Power = 1 − β; typically target 80%–90% power; α commonly 0.05 (one- or two-sided matters).
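Two-sample sample-size formula (solve for Δ instead to get the MDE at a fixed N): N per arm ≈ (z_{1−α/2} + z_{1−β})^2 * (σ1^2 + σ2^2) / Δ^2.
For proportions, σ1^2 + σ2^2 ≈ 2 * p̄(1−p̄), where p̄ is the pooled rate; state explicitly whether Δ is an absolute or a relative lift.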
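Bonferroni: test each of k hypotheses at α_family / k to control FWER; conservative, especially when tests are positively correlated.
Benjamini–Hochberg controls FDR (under independence or positive dependence); more power than FWER control when some false discoveries are tolerable.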
Peeking at results repeatedly inflates the type I error; use a group-sequential design with an alpha-spending function (Pocock- or O’Brien–Fleming-type boundaries).
Define the test family up front; correlated metrics, the choice of denominator, and the unit of analysis (which should match the unit of randomization) all affect variance and error rates.
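
The Bonferroni and Benjamini–Hochberg items above can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation; the function names and the example p-values are made up for the sketch.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / k (controls FWER at alpha)."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


def benjamini_hochberg_reject(p_values, q=0.05):
    """BH step-up: find the largest rank r with p_(r) <= r * q / m,
    then reject the r smallest p-values (targets FDR at level q)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            largest_rank = rank
    reject = [False] * m
    for idx in order[:largest_rank]:
        reject[idx] = True
    return reject


# Illustrative p-values for five metrics (not from a real experiment).
p_values = [0.001, 0.012, 0.020, 0.030, 0.200]
print(bonferroni_reject(p_values))
print(benjamini_hochberg_reject(p_values))
```

On these illustrative p-values, Bonferroni rejects only the first hypothesis (per-test threshold 0.05 / 5 = 0.01), while BH rejects the first four, which shows the power difference in the list above.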

Worked example (compute MDE for conversion rate A/B test)
Start by stating the metric precisely: per-user conversion rate, baseline p0, and whether the lift is absolute or relative. Choose α (and state whether the test is two-sided) and the target power. Approximate the per-arm variance by the pooled variance p̄(1−p̄), taking p̄ ≈ p0 for small effects, then plug it into the sample-size formula and solve for N per arm. Finally, discuss practical constraints: if the required N exceeds available traffic, accept a larger MDE, lower the power target, extend the test duration, or change the metric aggregation (e.g., per-session vs per-user). If this test is one of many, mention the multiple-testing adjustment (a smaller per-test α, or a planned BH procedure).
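
As a hedged sketch of the calculation described above, the following Python computes N per arm and the inverse MDE using the normal approximation with p̄ ≈ p0. The baseline (10%), relative lift (5%), α, and power are illustrative assumptions, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p0, abs_lift, alpha=0.05, power=0.80):
    """N per arm for a two-sided two-proportion test, with p_bar ~= p0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = 2 * p0 * (1 - p0)  # sigma1^2 + sigma2^2 under pooling
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / abs_lift ** 2)


def mde_absolute(p0, n_per_arm, alpha=0.05, power=0.80):
    """Smallest absolute lift detectable with n_per_arm users per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 * p0 * (1 - p0) / n_per_arm)


p0 = 0.10                  # assumed baseline conversion rate
abs_lift = p0 * 0.05       # 5% relative lift = 0.5 percentage points
n = sample_size_per_arm(p0, abs_lift)
print(n, mde_absolute(p0, n))
```

Under these assumptions the required N comes out at roughly 56,500 users per arm. Halving the target lift would roughly quadruple that, since N scales with 1 / Δ^2.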

A common pitfall
A tempting but wrong approach is to target a tiny relative lift (e.g., 1%) without checking variance or traffic, and then read a non-significant result as evidence of no effect. That overlooks that the design was underpowered, and it often comes with mismatched denominators and uncorrected multiple comparisons. Also avoid blindly applying Bonferroni across correlated metrics, because it costs a lot of power; instead predefine families or use FDR control when appropriate.
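
To make the underpowered-design point concrete, here is a rough power check under the same normal approximation. The baseline, traffic, and lifts are illustrative assumptions, the function name is hypothetical, and the negligible probability of rejecting in the wrong direction is ignored.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p0, rel_lift, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    delta = p0 * rel_lift                         # absolute lift
    se = sqrt(2 * p0 * (1 - p0) / n_per_arm)      # SE of the difference
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / se - z_alpha)


# Assumed: 10% baseline, 50,000 users per arm.
power_1pct = approx_power(0.10, 0.01, 50_000)
power_5pct = approx_power(0.10, 0.05, 50_000)
print(power_1pct, power_5pct)
```

With these assumed numbers, power for a 1% relative lift is below 10%, so a non-significant result says very little, whereas a 5% relative lift would be detected about three times out of four.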

Further reading

Kohavi, Tang, and Xu, Trustworthy Online Controlled Experiments (Cambridge University Press, 2020). Practical guidance for online A/B testing.
Benjamini & Hochberg, “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing” (1995). FDR procedure fundamentals.

Related concepts