This question evaluates proficiency in statistical inference for proportions, covering hypothesis testing, confidence intervals, power and sample-size calculations, multiple-testing correction, and comparison of frequentist versus Bayesian approaches within the Statistics & Math domain for data scientist roles.

Two search models, A and B, were each used once by 100 distinct users (one query per user, so 100 queries per model). Success is defined per query by your composite metric (success=1, failure=0). Model A had 90 successes and Model B had 85. Using a two-sided test at alpha=0.05:
1) State H0 and H1, choose the appropriate test (pooled two-proportion z-test), compute the test statistic and p-value, and conclude whether the data support the claim that A outperforms B.
2) Compute a 95% confidence interval for pA−pB and interpret it in terms of practical significance.
3) What per-arm sample size is needed to detect a +5 percentage-point uplift (from a baseline of 85% to 90%) with 80% power at alpha=0.05? Show the formulas and inputs.
4) If you test these two models simultaneously across 10 independent intents, apply a Bonferroni correction and state whether your conclusion changes.
5) Briefly explain when you would prefer an exact test or a Bayesian comparison, and what you would report in each case.
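
As a rough check on parts 1 to 4, the arithmetic can be sketched in Python using only the standard library. This is one possible working, not a reference solution: the function names (`pooled_z_test`, `wald_ci`, `per_arm_sample_size`) are hypothetical, the interval in part 2 is assumed to be the unpooled Wald interval, and the sample-size formula is the common normal-approximation version that uses the pooled variance under H0 and the unpooled variance under H1.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def pooled_z_test(x_a, n_a, x_b, n_b):
    """Two-sided pooled two-proportion z-test. Returns (z, p_value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)  # common proportion under H0: pA = pB
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def wald_ci(x_a, n_a, x_b, n_b, level=0.95):
    """Unpooled Wald interval for pA - pB (an assumed choice of interval)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    diff = p_a - p_b
    return diff - z_crit * se, diff + z_crit * se


def per_arm_sample_size(p_base, p_alt, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided test, normal approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    p_bar = (p_base + p_alt) / 2
    # Pooled variance term under H0, unpooled variance term under H1.
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(p_base * (1 - p_base) + p_alt * (1 - p_alt))
    return ceil((null_term + alt_term) ** 2 / (p_alt - p_base) ** 2)


if __name__ == "__main__":
    alpha, n_tests = 0.05, 10

    z, p_value = pooled_z_test(90, 100, 85, 100)
    print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")

    low, high = wald_ci(90, 100, 85, 100)
    print(f"95% CI for pA - pB: ({low:.3f}, {high:.3f})")

    print(f"per-arm n: {per_arm_sample_size(0.85, 0.90)}")

    # Bonferroni: compare each p-value against alpha divided by the test count.
    bonferroni_alpha = alpha / n_tests
    print(f"reject at alpha={alpha}: {p_value < alpha}")
    print(f"reject at Bonferroni alpha={bonferroni_alpha}: {p_value < bonferroni_alpha}")
```

With these inputs the sketch should give z of about 1.07 and a two-sided p-value of about 0.29, so H0 is not rejected at alpha=0.05, and the Bonferroni threshold of 0.005 only makes rejection harder, leaving the conclusion unchanged. The interval comes out near (−0.04, 0.14), which includes zero, and the required sample size is in the high 600s per arm, far more than the 100 users per model observed here. A simpler sample-size formula that uses only the unpooled variance gives a slightly smaller number, so it is worth stating which formula you used.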