Paired Comparison of Two Classifiers via McNemar's Test
You evaluated two classifiers on the same 10,000 labeled examples and summarized the paired outcomes in a 2×2 table:
- Both correct: n11 = 8,740
- Both wrong: n00 = 740
- A correct / B wrong: n10 = 300
- A wrong / B correct: n01 = 220
Let b = n10 (A correct, B wrong) and c = n01 (A wrong, B correct).
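As a quick sanity check on the table, the four cells can be summed and the marginal accuracies derived from them. The sketch below uses only the counts given above; the variable names `acc_a`, `acc_b`, and `diff` are illustrative and not part of the original problem.

```python
# Paired outcome counts copied from the 2x2 table above.
n11 = 8740  # both correct
n00 = 740   # both wrong
n10 = 300   # A correct, B wrong
n01 = 220   # A wrong, B correct

n = n11 + n00 + n10 + n01  # total number of paired examples
b, c = n10, n01            # discordant counts used by McNemar's test

# Marginal accuracies: each classifier is correct in two of the four cells.
acc_a = (n11 + n10) / n
acc_b = (n11 + n01) / n

# The paired accuracy difference depends only on the discordant cells.
diff = (b - c) / n

print(f"n={n}  acc_A={acc_a:.4f}  acc_B={acc_b:.4f}  A-B={diff:.4f}")
```

Note that the concordant cells (n11, n00) cancel in the difference, which is why the test conditions on b + c alone.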
Answer the following:
- Using McNemar's test with continuity correction, test H0: the error rates are equal for A and B. Compute and show the intermediate numbers (b, c, |b − c|, b + c), the test statistic, and the p-value.
- Compute the exact binomial two-sided p-value for the same H0 by conditioning on b + c. Explain when you would prefer the exact test over the asymptotic McNemar test.
- Provide a 95% confidence interval for the paired accuracy difference (A − B). State which method you use and why.
- Discuss the assumptions behind McNemar's test, when it is inappropriate, and how you would adjust for multiple testing if comparing A against 10 other models.
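One way to check hand calculations for the questions above is a short standard-library sketch. It is not a reference solution and rests on several assumptions that the problem leaves open: the two-sided exact p-value is taken as twice the smaller binomial tail (capped at 1), the confidence interval is a simple Wald interval for paired proportions, and the multiple-testing adjustment shown is Bonferroni. The function names (`mcnemar_cc`, `mcnemar_exact`, `wald_ci_paired`) are hypothetical helpers, not library APIs.

```python
import math

b, c, n = 300, 220, 10_000  # discordant counts and total pairs from the table


def mcnemar_cc(b, c):
    """Continuity-corrected McNemar statistic and its chi-square(1) p-value."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x/2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def mcnemar_exact(b, c):
    """Two-sided exact p-value: under H0, b ~ Binomial(b + c, 0.5)."""
    m, k = b + c, min(b, c)
    tail = sum(math.comb(m, i) for i in range(k + 1))  # exact integer sum
    return min(1.0, 2 * tail / 2 ** m)


def wald_ci_paired(b, c, n, z=1.959963984540054):
    """Wald interval for (b - c) / n; z defaults to the 97.5% normal quantile."""
    diff = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return diff - z * se, diff + z * se


stat, p_asym = mcnemar_cc(b, c)
p_exact = mcnemar_exact(b, c)
ci_lo, ci_hi = wald_ci_paired(b, c, n)
alpha_bonf = 0.05 / 10  # Bonferroni threshold for 10 pairwise comparisons

print(f"|b-c|={abs(b - c)}  b+c={b + c}  chi2_cc={stat:.4f}  p={p_asym:.2e}")
print(f"exact two-sided p={p_exact:.2e}")
print(f"95% Wald CI for A-B: ({ci_lo:.5f}, {ci_hi:.5f})")
print(f"significant at Bonferroni alpha={alpha_bonf}: {p_exact < alpha_bonf}")
```

With b + c = 520 the asymptotic and exact p-values should agree closely; the exact test matters mainly when the number of discordant pairs is small (a common rule of thumb is b + c below about 25). A Holm step-down procedure would be a less conservative alternative to the Bonferroni threshold used here.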