This question evaluates a data scientist's competency in statistical experimental design and inference: sample size calculation for proportions, two-sided hypothesis tests and confidence intervals, problems caused by drift and seasonality, control of Type I error under peeking, and precision estimation with the delta method and the bootstrap. It is commonly asked in the Statistics & Math domain to assess understanding of A/B test power and error control, and it tests both conceptual understanding and practical application of these methods.
Context: You are evaluating a new classifier that aims to reduce the spam rate (proportion of emails incorrectly flagged as spam) from 2.0% to 1.8%, a 10% relative reduction. You will run a two-arm randomized bucket test with equal allocation and analyze it with a two-sided z-test for proportions.
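A minimal sketch of the two calculations this setup calls for, using only the Python standard library. It assumes a significance level of alpha = 0.05 and 80% power (the task list is not shown here, so these are illustrative defaults, not values from the question). The function names are hypothetical. The sample size uses the common normal-approximation formula (pooled variance under the null, unpooled under the alternative); the test uses a pooled standard error for the z statistic and an unpooled Wald interval for the difference in rates.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm n for a two-sided two-proportion z-test, equal allocation.

    Normal approximation: pooled variance under H0, unpooled under H1.
    alpha and power defaults are illustrative assumptions.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = _STD_NORMAL.inv_cdf(power)           # quantile for desired power
    p_bar = (p_control + p_treatment) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar))
    se_alt = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    delta = p_control - p_treatment
    return ceil((z_alpha * se_null + z_beta * se_alt) ** 2 / delta ** 2)


def two_proportion_z_test(x_control: int, n_control: int,
                          x_treatment: int, n_treatment: int,
                          alpha: float = 0.05):
    """Two-sided z-test (pooled SE) and Wald CI (unpooled SE) for p_c - p_t.

    Returns (z, p_value, (ci_low, ci_high)).
    """
    p_c = x_control / n_control
    p_t = x_treatment / n_treatment
    diff = p_c - p_t

    # Pooled proportion is used for the test statistic under H0: p_c == p_t.
    p_pool = (x_control + x_treatment) / (n_control + n_treatment)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    z = diff / se_pool
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se_diff = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z, p_value, (diff - z_crit * se_diff, diff + z_crit * se_diff)


if __name__ == "__main__":
    n = sample_size_per_arm(0.020, 0.018)
    print(f"required emails per arm: {n}")
    # Hypothetical counts for illustration only (not real experiment data).
    z, p, ci = two_proportion_z_test(1460, 73_150, 1320, 73_150)
    print(f"z = {z:.2f}, p = {p:.4f}, "
          f"95% CI for p_c - p_t = ({ci[0]:.5f}, {ci[1]:.5f})")
```

Under these assumed defaults, detecting the 2.0% to 1.8% change requires roughly 73,000 emails per arm. The small absolute difference (0.2 percentage points) appears squared in the denominator, which is why the requirement is so large.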
Tasks: