Explain P-Value, Confidence Interval, and Multiple Testing Adjustments
You are running online A/B experiments to evaluate a new product launch. Assume randomized assignment and a binary primary metric such as conversion unless the interviewer states otherwise.
Constraints & Assumptions
- Use practical A/B testing examples, not only textbook definitions.
- Distinguish statistical significance from practical significance.
- Include the assumptions behind each test and adjustment method.
- Explain common pitfalls clearly.
Clarifying Questions to Ask
- Is the test one-sided or two-sided?
- Is the primary metric binary, continuous, count-based, or ratio-based?
- How many metrics, variants, and pairwise comparisons are being tested?
- Are users independent, or are there clusters or repeated measurements?
Part 1 - P-Value and Confidence Interval
Define the p-value and confidence interval, and explain their relationship.
What This Part Should Cover
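- P-value as the probability, assuming the null hypothesis is true, of observing data at least as extreme as what was actually observed.
- Confidence interval as a range produced by a procedure whose long-run coverage equals the stated confidence level.
- Relationship between a two-sided test at level alpha and whether the (1 - alpha) confidence interval excludes the null value.
- Common misinterpretations, such as reading the p-value as the probability that the null hypothesis is true.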
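As a minimal sketch of the points above, the following Python computes a two-sided p-value and a Wald confidence interval for the difference in conversion rates between control (A) and treatment (B). The function name `two_proportion_ztest` and the example counts are hypothetical and chosen only for illustration. The sketch assumes independent users and samples large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test and Wald CI for the difference in rates (B - A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error for the test: under H0 both arms share one rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval: no assumption that rates match.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return diff, z, p_value, ci


# Hypothetical experiment: 10.0% vs 11.0% conversion, 10,000 users per arm.
diff, z, p_value, ci = two_proportion_ztest(1000, 10_000, 1100, 10_000)
print(f"lift={diff:.4f}  z={z:.2f}  p={p_value:.4f}  95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

When the test and the interval use the same standard error, the 95% interval excludes zero exactly when the two-sided p-value is below 0.05. Here the test uses a pooled standard error and the interval an unpooled one, so the two can disagree slightly right at the boundary.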
Part 2 - Multiple Testing Adjustments
How do you adjust for multiple testing? Contrast Bonferroni and Tukey's HSD, and note when you would use each.
What This Part Should Cover
- Family-wise error rate and why multiple comparisons inflate false positives.
- Bonferroni as a simple, conservative correction across a set of planned tests.
- Tukey's HSD for all pairwise comparisons of group means, typically following an ANOVA.
- False discovery rate methods when many exploratory metrics are involved.
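The sketch below illustrates three of these ideas in plain Python: how the family-wise error rate grows with the number of independent tests, the Bonferroni rule, and the Benjamini-Hochberg step-up procedure as one common false discovery rate method. The function names and the p-values are hypothetical. Tukey's HSD is not reproduced because it relies on the studentized range distribution, which is usually taken from a statistics library.

```python
def family_wise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive across m independent true nulls."""
    return 1 - (1 - alpha) ** m


def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= max_rank
    return reject


# Hypothetical p-values for five metrics in one experiment.
p_values = [0.001, 0.012, 0.025, 0.045, 0.20]
print(f"FWER with 5 uncorrected tests: {family_wise_error_rate(5):.3f}")
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

With these inputs Bonferroni rejects only the first hypothesis, while Benjamini-Hochberg rejects the first three. That reflects the trade-off: Bonferroni controls the chance of any false positive, whereas FDR methods tolerate a bounded share of false discoveries in exchange for more power.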
Part 3 - Type I and Type II Errors
Explain Type I and Type II errors with concrete A/B testing examples.
What This Part Should Cover
- Type I error (false positive) as launching a feature that has no real lift.
- Type II error (false negative) as failing to detect, and so not launching, a real improvement.
- Role of alpha, power, sample size, variance, and minimum detectable effect.
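To make the trade-off among alpha, power, variance, and minimum detectable effect concrete, here is a rough sketch of the standard normal-approximation sample size formula for comparing two proportions. The function name `sample_size_per_arm` and the example baseline are assumptions for illustration. It assumes a two-sided test, equal allocation, and independent users.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users per arm needed to detect an absolute lift of mde_abs.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / mde_abs^2
    """
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I error
    z_power = NormalDist().inv_cdf(power)          # controls Type II error
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / mde_abs ** 2)


# Hypothetical 10% baseline conversion rate.
for mde in (0.02, 0.01, 0.005):
    print(f"MDE={mde:.3f}: about {sample_size_per_arm(0.10, mde):,} users per arm")
```

Because the effect size enters squared in the denominator, halving the minimum detectable effect roughly quadruples the required sample, which is why the detectable effect is usually fixed before the experiment starts.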
Part 4 - Z-Test Versus T-Test
When would you use a Z-test versus a t-test?
What This Part Should Cover
- Z-test for large samples or known variance, common for large-scale binary metrics via normal approximation.
- T-test for continuous metrics with unknown variance, especially smaller samples.
- Assumptions and robust alternatives.
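The difference between the two tests shows up mainly in small samples. The sketch below runs Welch's unequal-variance t-test on a small continuous metric and compares the t-based p-value with the one a normal reference distribution would give for the same statistic. To stay within the standard library, the t-distribution tail is computed by Simpson's rule on the density, which is a stand-in for what a statistics library would normally provide. All function names and the sample data are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import NormalDist, mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution, computed in log space for stability."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of |T| <= |t|
    return max(0.0, 1 - central_mass)


def welch_t_test(a, b):
    """Welch's t-test for mean(b) - mean(a) with Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t_stat = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df, t_two_sided_p(t_stat, df)


# Hypothetical revenue per user for six users in each arm.
control = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5]
treatment = [13.2, 12.8, 11.9, 14.1, 12.5, 13.7]
t_stat, df, p_t = welch_t_test(control, treatment)
p_z = 2 * (1 - NormalDist().cdf(abs(t_stat)))
print(f"t={t_stat:.2f}  df={df:.1f}  p (t dist)={p_t:.4f}  p (normal)={p_z:.4f}")
```

For a given statistic the t-based p-value is larger than the normal one because the t distribution has heavier tails. The gap shrinks as the degrees of freedom grow, which is why the two tests are nearly interchangeable at typical A/B test scale.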
Part 5 - CLT Versus LLN
Compare the Central Limit Theorem with the Law of Large Numbers and explain practical implications for experiment analysis.
What This Part Should Cover
- LLN as sample averages converging to their expected values as the sample size grows.
- CLT as standardized sample averages becoming approximately normal as the sample size grows.
- How these justify metric estimation (LLN) and confidence intervals (CLT) in large experiments.
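A small simulation can make the distinction tangible. The sketch below draws Bernoulli conversions with a hypothetical true rate of 10% and checks two things: that the sample mean lands close to the true rate at large n (LLN), and that standardized sample means fall within plus or minus 1.96 about 95% of the time (CLT). The seed, sample sizes, and helper name `sample_mean` are arbitrary choices for illustration.

```python
import random
from math import sqrt


def sample_mean(n, p, rng):
    """Mean of n simulated Bernoulli(p) conversions."""
    return sum(rng.random() < p for _ in range(n)) / n


rng = random.Random(42)  # fixed seed so the run is reproducible
p = 0.10                 # hypothetical true conversion rate

# LLN: the estimation error of the sample mean tends to shrink as n grows.
lln_errors = {n: abs(sample_mean(n, p, rng) - p) for n in (100, 10_000, 200_000)}
for n, err in lln_errors.items():
    print(f"n={n:>7,}: |sample mean - p| = {err:.5f}")

# CLT: standardized sample means behave approximately like N(0, 1).
n, reps = 1_000, 2_000
se = sqrt(p * (1 - p) / n)
z_scores = [(sample_mean(n, p, rng) - p) / se for _ in range(reps)]
coverage = sum(abs(z) <= 1.96 for z in z_scores) / reps
print(f"share of standardized means within +/-1.96: {coverage:.3f}")
```

The LLN part says the point estimate can be trusted at scale. The CLT part is what licenses attaching a normal-based interval or z-test to that estimate, even though each individual observation is binary rather than normal.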
What a Strong Answer Covers
A strong answer gives accurate definitions, links inference concepts to A/B testing decisions, controls false positives across multiple comparisons, and explains when approximations are valid.
Follow-up Questions
- How would you handle many secondary metrics?
- What if the p-value is significant but the effect size is tiny?
- How would clustering or repeated users change the analysis?