PracHub

Quantify improvement and compute required sample size

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in statistical experimental design and inference: sample size calculation for a difference in proportions, two-sided hypothesis tests and confidence intervals, threats to validity from drift and seasonality, control of Type I error under repeated peeking, and confidence-interval estimation for classifier precision via the delta method or bootstrap. It is commonly asked in the Statistics & Math domain to assess understanding of A/B test power and error control, and it tests both conceptual understanding and practical application.


Company: Amazon

Role: Data Scientist

Category: Statistics & Math

Difficulty: hard

Interview Round: Technical Screen

You claim a new classifier reduces the spam rate from 2.0% to 1.8% (a 10% relative drop). For a two-arm randomized bucket test with alpha=0.05 and power=0.80, two-sided z-test on proportions: 1) Derive the per-arm sample size and show the formula you use (state any pooled/unequal-variance assumptions). 2) If you observe control=2.1% on 500k emails and treatment=1.85% on 500k, compute the 95% confidence interval for the absolute difference and the corresponding p-value. 3) Discuss the impact of class imbalance drift and traffic seasonality; propose stratification or blocking to preserve validity. 4) If you peek daily and stop early when p<0.05, explain why Type I error inflates and outline a correction (e.g., alpha spending or group-sequential bounds). 5) If precision is business-critical, explain how you’d estimate a CI for precision at a fixed recall using the delta method or bootstrap, and when each is appropriate.


Oct 13, 2025, 9:49 PM

A/B Test on Spam Rate: Sample Size, Inference, and Practical Pitfalls

Context: You are evaluating a new classifier that aims to reduce the spam rate (proportion of emails incorrectly flagged as spam) from 2.0% to 1.8% — a 10% relative reduction. You will run a two-arm randomized bucket test (equal allocation) and use a two-sided z-test for proportions.

Tasks:

  1. Sample size: Derive the per-arm sample size needed for α = 0.05 and power = 0.80 to detect a drop from 2.0% to 1.8%. Show the formula and clearly state your variance assumptions (pooled vs. unpooled).
  2. Inference on observed data: If you observe control = 2.1% on 500,000 emails and treatment = 1.85% on 500,000 emails, compute the 95% confidence interval (CI) for the absolute difference (treatment − control) and the corresponding two-sided p-value.
  3. Validity under drift/seasonality: Discuss the impact of class-imbalance drift and traffic seasonality on the test’s validity. Propose stratification/blocking to mitigate these issues.
  4. Peeking and early stopping: If you peek daily and stop when p < 0.05, explain why Type I error inflates and outline a correction (e.g., alpha spending or group-sequential bounds).
  5. Precision at fixed recall: If precision is business-critical, explain how to estimate a CI for precision at a fixed recall using the delta method or bootstrap, and when each is appropriate.
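
The arithmetic behind Tasks 1 and 2 can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not a definitive solution: the function names are hypothetical, and the sample size uses the normal approximation n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2 with unpooled variance by default (a pooled-null variant is included for comparison). The confidence interval uses the unpooled standard error, while the p-value uses the pooled standard error under the null hypothesis of no difference.

```python
from math import ceil, erfc, sqrt
from statistics import NormalDist


def per_arm_sample_size(p_control, p_treatment, alpha=0.05, power=0.80, pooled=False):
    """Per-arm n for a two-sided z-test on two proportions (equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = abs(p_control - p_treatment)
    # Variance of the difference under the alternative (per observation pair).
    var_alt = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    if pooled:
        # Assumption: the null variance uses the average of the two rates.
        p_bar = (p_control + p_treatment) / 2
        numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar)) + z_beta * sqrt(var_alt)) ** 2
    else:
        numerator = (z_alpha + z_beta) ** 2 * var_alt
    return ceil(numerator / delta ** 2)


def diff_ci_and_p_value(p_control, n_control, p_treatment, n_treatment, conf=0.95):
    """CI for (treatment - control) and the two-sided z-test p-value."""
    diff = p_treatment - p_control
    se_unpooled = sqrt(
        p_control * (1 - p_control) / n_control
        + p_treatment * (1 - p_treatment) / n_treatment
    )
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    p_pool = (p_control * n_control + p_treatment * n_treatment) / (n_control + n_treatment)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_treatment))
    z = diff / se_pooled
    # erfc keeps precision in the far tail, where 1 - cdf(|z|) would round to 0.
    p_value = erfc(abs(z) / sqrt(2))
    return ci, z, p_value


n_unpooled = per_arm_sample_size(0.020, 0.018)
n_pooled = per_arm_sample_size(0.020, 0.018, pooled=True)
ci, z, p_value = diff_ci_and_p_value(0.021, 500_000, 0.0185, 500_000)

print("per-arm n (unpooled, pooled):", n_unpooled, n_pooled)
print("95% CI for treatment - control:", ci)
print("z:", z, "p-value:", p_value)
```

With the task's inputs, both variance conventions give roughly 73,000 emails per arm. The observed difference of -0.25 percentage points has a 95% CI of about (-0.30, -0.20) percentage points and a z statistic near -9, so the p-value is far below 0.05.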

Solution
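
For Task 4, a small A/A simulation illustrates why stopping at the first p < 0.05 inflates Type I error. This is a sketch under stated assumptions rather than a full group-sequential design: the function name is hypothetical, the test is checked at five equally spaced looks (a stand-in for daily peeks with equal daily traffic), and the z statistic at look k is modeled under the null as a sum of k independent standard normal increments divided by sqrt(k). The constant 2.413 is the tabulated Pocock boundary for five looks at two-sided alpha = 0.05; an alpha-spending function would replace it when looks are unequally spaced.

```python
import random
from math import sqrt


def false_positive_rate(z_threshold, n_looks=5, n_sims=20_000, seed=0):
    """Share of null (A/A) experiments that reject at any interim look."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            # Each look adds one equal-information increment under H0.
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / sqrt(look)
            if abs(z) > z_threshold:
                rejections += 1
                break  # stop early at the first "significant" look
    return rejections / n_sims


naive_rate = false_positive_rate(1.96)    # reject whenever p < 0.05
pocock_rate = false_positive_rate(2.413)  # constant Pocock boundary, 5 looks

print("naive peeking false positive rate:", naive_rate)
print("Pocock-bounded false positive rate:", pocock_rate)
```

Each look is another chance to cross the threshold by chance, so the naive rule's overall false positive rate lands well above the nominal 5%, while the stricter per-look boundary brings it back to about 5%.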

