PracHub

Design a robust email A/B test

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a data scientist's competency in experimental design, including randomization and stratification, statistical power and sample-size estimation, primary and guardrail metric definition, sequential monitoring, and data-quality diagnostics for large-scale email A/B tests.

  • hard
  • Uber
  • Analytics & Experimentation
  • Data Scientist


Company: Uber

Role: Data Scientist

Category: Analytics & Experimentation

Difficulty: hard

Interview Round: Technical Screen



Oct 13, 2025, 9:49 PM

A/B Test Design: New Email Subject Line for Weekly Campaign

You manage a weekly email campaign to 10 million users. Baseline unique click-through rate (CTR) is 3.0% and unsubscribe rate is 0.08% per send. Marketing proposes a new subject line expected to increase CTR by +6% relative (≈ +0.18 percentage points absolute lift, from 3.00% to 3.18%).

Assume a single weekly send, independent Bernoulli outcomes at the user–send level, and that users can receive resends to non-openers and may be eligible for other campaigns in the same week unless controlled.

Design the experiment end-to-end:

  1. Randomization
    • What is the randomization unit and what stratification blocks would you use (e.g., locale, device, engagement tier)?
    • How do you prevent contamination from resends and cross-campaign overlap in the same week?
  2. Power
    • Compute per-arm sample size for α = 0.05 (two-sided) and 80% power for detecting a +0.18 pp absolute lift on CTR from a 3.0% baseline.
    • State assumptions and show the formula you would use.
  3. Metrics
    • Choose a single primary success metric and at least two guardrail metrics (e.g., unsubscribe rate, spam complaints).
    • Define each precisely (numerator/denominator, measurement window) and justify the choice over alternatives like open rate.
  4. Sequential Monitoring
    • Leadership wants daily peeks and the ability to stop early for harm.
    • Propose a valid plan (e.g., alpha-spending or group-sequential boundaries) that controls Type I error. Specify the monitoring schedule and stopping/continuation rules.
  5. Mid-Experiment Checks (at ~48 hours)
    • What diagnostics would you run to detect randomization failure, instrumentation delays, or traffic mix shifts (e.g., weekend effects)?
    • How would you correct issues without biasing estimates?
  6. Results Handling
    • If an interim look shows negative CTR lift but higher opens, enumerate at least three plausible causes and the next decision (continue, stop-for-harm, or redesign).
    • Explain how you would handle intention-to-treat vs per-protocol and what you would report to stakeholders.
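
For the power question in part 2, one common choice is the normal-approximation formula for comparing two independent proportions, with a pooled variance under the null and an unpooled variance under the alternative. The sketch below is a minimal illustration under the stated assumptions (independent Bernoulli outcomes, equal allocation, one observation per user); the function name `per_arm_sample_size` is hypothetical and other formulas give slightly different answers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def per_arm_sample_size(p_control, p_treatment, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-proportion z-test (normal approximation).

    Assumes independent Bernoulli outcomes and equal allocation between arms.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    delta = abs(p_treatment - p_control)           # absolute lift to detect
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null

    # Standard deviation terms under the null and under the alternative.
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))

    return ceil(((z_alpha * null_sd + z_beta * alt_sd) / delta) ** 2)


# Baseline CTR 3.00%, target CTR 3.18% (+0.18 pp absolute lift).
n = per_arm_sample_size(0.030, 0.0318)
print(f"Required users per arm: {n:,}")
```

With these inputs the result is roughly 145,000 users per arm, a small fraction of the 10M audience, which leaves room for a large holdout or for sizing the test on a rarer guardrail such as unsubscribes. If outcomes within a user are correlated (for example, because of resends), this figure would understate the required sample.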

