A/B Testing: p-values, Power, and Error Rates with Multiple Comparisons
Context
You are reviewing the results of an online A/B experiment. Stakeholders question whether your findings are statistically valid, especially because you track several metrics and may have more than two variants.
Task
- Define the following in the context of A/B testing:
  - p-value
  - Type I error
  - Type II error
  - Statistical power
- Explain why tracking multiple metrics and/or testing multiple variants inflates false positives and requires corrections (e.g., Bonferroni).
- Demonstrate with a concrete numerical example how family-wise error rate (FWER) grows with the number of tests and how Bonferroni controls it.
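
As a starting point for the numerical example, here is a minimal Python sketch. It rests on assumptions that are not part of the task statement: the tests are independent, every null hypothesis is true, and the per-test significance level is alpha = 0.05. Under those assumptions the FWER for m tests is 1 - (1 - alpha)^m. The function names `fwer` and `bonferroni_alpha` are illustrative, not taken from any library.

```python
def fwer(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m tests.

    Assumes the m tests are independent and all null hypotheses are true.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m


alpha = 0.05  # assumed per-test significance level
print(f"{'tests':>5}  {'FWER uncorrected':>16}  {'per-test alpha':>14}  {'FWER Bonferroni':>15}")
for m in (1, 3, 5, 10, 20):
    corrected = bonferroni_alpha(alpha, m)
    # Uncorrected: every test runs at alpha. Corrected: every test runs at alpha / m.
    print(f"{m:>5}  {fwer(alpha, m):>16.4f}  {corrected:>14.4f}  {fwer(corrected, m):>15.4f}")
```

With 10 tests (for example, 5 metrics across 2 treatment variants), the uncorrected FWER under these assumptions is about 0.40, while testing each hypothesis at 0.05 / 10 = 0.005 brings it back to about 0.049. Bonferroni guarantees FWER <= alpha even when the tests are correlated, but in that case it is conservative and costs statistical power.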