Design Effective A/B Tests for Onboarding
A consumer subscription app is launching a redesigned onboarding flow for newly registered users. The goal is to increase activation, defined for this prompt as starting to play any title within 7 days of signup unless your organization uses a different definition.
Constraints & Assumptions
-
Users should be randomized at signup and assigned persistently.
-
Define exposure, eligibility, and activation precisely.
-
Include primary metrics, guardrails, power, runtime, monitoring, and decision rules.
-
Handle early mixed results carefully.
Clarifying Questions to Ask
-
What is the current baseline activation rate?
-
What onboarding steps changed?
-
Are users exposed across multiple devices?
-
What support, retention, or subscription outcomes could be affected?
Part 1 - Experiment Setup
State the hypothesis, unit of randomization, and treatment/control definitions.
What This Part Should Cover
-
New-user eligibility, sticky user-level randomization, exposure definition, treatment and control flows, and hypothesis.
Part 2 - Metrics
What primary, secondary, and guardrail metrics would you use?
What This Part Should Cover
-
Activation within 7 days, onboarding completion, first play, subscription conversion, retention, and engagement.
-
Guardrails such as support tickets, cancellations, latency, crashes, user complaints, and long-term retention.
Part 3 - Power and Monitoring
How would you calculate sample size/runtime and monitor the test?
What This Part Should Cover
-
Baseline rate, MDE, alpha, power, traffic volume, experiment duration, sequential monitoring rules, SRM, and instrumentation checks.
Part 4 - Decision Framework
What would you do if early results show activation uplift but support tickets increase?
What This Part Should Cover
-
Predefined guardrail thresholds, severity analysis, segment diagnostics, practical significance, iteration, holdout, or ramp decision.
What a Strong Answer Covers
A strong answer defines the activation metric and exposure cleanly, designs a powered user-level experiment, monitors guardrails, and makes decisions based on both user value and operational risk.
Follow-up Questions
-
What if activation rises but 30-day retention falls?
-
How would you avoid peeking bias?
-
How would you analyze treatment effects by user segment?