Quantify improvement and compute required sample size

Q: Quantify improvement and compute required sample size

This question evaluates a data scientist's competency in statistical experimental design and inference. It covers sample size calculation for proportions, two-sided hypothesis testing and confidence intervals, validity issues from drift and seasonality, control of Type I error under peeking, and confidence intervals for precision via the delta method and bootstrap. It is commonly asked in the Statistics & Math domain to assess understanding of A/B test power and error control, and it tests both conceptual understanding and practical application.

Question

A/B Test on Spam Rate: Sample Size, Inference, and Practical Pitfalls

Context: You are evaluating a new classifier that aims to reduce the spam rate (proportion of emails incorrectly flagged as spam) from 2.0% to 1.8% — a 10% relative reduction. You will run a two-arm randomized bucket test (equal allocation) and use a two-sided z-test for proportions.

Tasks:

1. Sample size: Derive the per-arm sample size needed for α = 0.05 and power = 0.80 to detect a drop from 2.0% to 1.8%. Show the formula and clearly state your variance assumptions (pooled vs. unpooled).
2. Inference on observed data: If you observe control = 2.1% on 500,000 emails and treatment = 1.85% on 500,000 emails, compute the 95% confidence interval (CI) for the absolute difference (treatment − control) and the corresponding two-sided p-value.
3. Validity under drift/seasonality: Discuss the impact of class-imbalance drift and traffic seasonality on the test’s validity. Propose stratification/blocking to mitigate these issues.
4. Peeking and early stopping: If you peek daily and stop when p < 0.05, explain why Type I error inflates and outline a correction (e.g., alpha spending or group-sequential bounds).
5. Precision at fixed recall: If precision is business-critical, explain how to estimate a CI for precision at a fixed recall using the delta method or bootstrap, and when each is appropriate.
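
The sample-size and inference tasks can be checked numerically. The sketch below is one possible approach, not a reference solution: it assumes the standard normal-approximation formulas for two proportions, with n per arm = (z_{α/2} + z_β)² · [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)² in the unpooled case, an unpooled standard error for the CI, and a pooled standard error for the z-test. The function names `sample_size_per_arm` and `diff_ci_and_pvalue` are illustrative and do not come from the question.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p1, p2, alpha=0.05, power=0.80, pooled=False):
    """Per-arm n for a two-sided z-test of two proportions (equal allocation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_b = NormalDist().inv_cdf(power)          # quantile corresponding to the power
    var_alt = p1 * (1 - p1) + p2 * (1 - p2)    # unpooled variance under H1
    if pooled:
        # Assumption: pooled variance under H0, unpooled variance under H1.
        p_bar = (p1 + p2) / 2
        numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                     + z_b * math.sqrt(var_alt)) ** 2
    else:
        numerator = (z_a + z_b) ** 2 * var_alt
    return math.ceil(numerator / (p1 - p2) ** 2)


def diff_ci_and_pvalue(p_c, n_c, p_t, n_t, conf=0.95):
    """CI (unpooled SE) and two-sided p-value (pooled SE) for treatment - control."""
    diff = p_t - p_c
    se_unpooled = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    # Pooled proportion under H0: both arms share one rate.
    p_pool = (p_c * n_c + p_t * n_t) / (n_c + n_t)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    z = diff / se_pooled
    # erfc avoids the underflow that 2 * (1 - cdf(|z|)) hits for large |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, ci, z, p_value
```

Under these assumptions, `sample_size_per_arm(0.020, 0.018)` gives roughly 73,100 emails per arm, and the pooled variant differs only slightly because the two rates are so close. `diff_ci_and_pvalue(0.021, 500_000, 0.0185, 500_000)` gives a difference of −0.25 percentage points with a 95% CI of about (−0.30, −0.20) percentage points, z near −9, and a p-value far below 0.05.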
