A/B Test: Two-Proportion Z-Test for Success Rates
Scenario
You ran an A/B test comparing two large language models (LLMs):
- Model A: 700 successes out of 1000 trials (p_A = 0.70)
- Model B: 800 successes out of 1000 trials (p_B = 0.80)
Task
- State the hypotheses to test whether Model B is better than Model A at α = 0.05.
- Compute the two-proportion z-statistic (using the pooled standard error) and the corresponding p-value.
- Decide if Model B is significantly better at α = 0.05.
- Compute the 95% confidence interval for the lift (assume lift = p_B − p_A, the absolute difference in success rates).
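For the confidence-interval task, one common approach is a Wald interval built on the unpooled standard error. The task does not prescribe a method, so that choice is an assumption here, and the function name `lift_confidence_interval` is purely illustrative. A minimal sketch:

```python
from math import sqrt
from statistics import NormalDist


def lift_confidence_interval(x_a, n_a, x_b, n_b, confidence=0.95):
    """Wald interval for lift = p_B - p_A (assumed method: unpooled SE)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Unpooled standard error: each arm contributes its own variance term.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lift = p_b - p_a
    return lift - z_crit * se, lift + z_crit * se


ci_low, ci_high = lift_confidence_interval(700, 1000, 800, 1000)
print(f"95% CI for lift: ({ci_low:.4f}, {ci_high:.4f})")
```

With the scenario's counts, the interval comes out at roughly 0.062 to 0.138, which excludes zero.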
Hint (pooled z-test):
- z = (p_B − p_A) / sqrt(p × (1 − p) × (1/n_A + 1/n_B)), where p is the pooled proportion p = (x_A + x_B)/(n_A + n_B).
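The hinted formula can be sketched in Python as follows. This assumes a one-sided test (H0: p_B ≤ p_A against H1: p_B > p_A), which is one reading of "Model B is better"; the function name `pooled_z_test` is illustrative, not part of the exercise.

```python
from math import erfc, sqrt


def pooled_z_test(x_a, n_a, x_b, n_b):
    """Pooled two-proportion z-test; returns (z, one-sided p-value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    # Pooled proportion under H0 (both arms share one success rate).
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Upper-tail probability P(Z >= z); erfc keeps precision for large z.
    # Assumption: one-sided alternative p_B > p_A.
    p_value = 0.5 * erfc(z / sqrt(2))
    return z, p_value


z, p_value = pooled_z_test(700, 1000, 800, 1000)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.2e}")
print("Reject H0 at alpha = 0.05" if p_value < 0.05 else "Fail to reject H0")
```

For the scenario's counts, the pooled proportion is 0.75 and z is about 5.16, so the one-sided p-value falls far below 0.05.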