A/B Test Plan: Redesigned User Signup Flow
Context and Data
You are analyzing an A/B experiment for a redesigned user signup flow. The dataset has one row per user, per session, or per aggregation unit, with the following columns: variant (A/B), sessions, signups, activation_d7, p95_latency_ms, support_tickets, refund_rate, revenue_d30.
Assumptions to make explicit and verify:
- Randomization unit: user_id (sticky assignment). If only session-level logs exist, cluster by user_id.
- Definitions:
  - sessions: count of sessions exposed to the variant.
  - signups: count of completed signups.
  - activation_d7: count of users who activated within 7 days of signup (define “activation” precisely for your product).
  - p95_latency_ms: 95th percentile request latency during signup flow (computed from request-level logs).
  - support_tickets: count of tickets attributable to the signup/onboarding experience.
  - refund_rate: refunds per activated or paying user (clarify denominator; use consistent unit across variants).
  - revenue_d30: total revenue within 30 days from users exposed (define whether revenue is per user, per activated user, or all exposed; prefer per-user for inference).
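If only session-level logs are available, the roll-up to the randomization unit could look like the minimal sketch below. The row layout (`user_id`, `variant`, `signed_up`) and the function name `aggregate_by_user` are assumptions made for illustration; they are not part of the dataset described above. The sketch also flags users who appear in more than one variant, which would violate sticky assignment.

```python
from collections import defaultdict


def aggregate_by_user(session_rows):
    """Roll session-level rows up to one record per user_id.

    Each row is assumed to be a dict with the hypothetical keys
    user_id, variant, and signed_up. Returns (clean, contaminated):
    clean maps user_id to a per-user record, and contaminated is the
    set of user_ids that were seen in more than one variant.
    """
    variants_seen = defaultdict(set)
    per_user = {}
    for row in session_rows:
        uid = row["user_id"]
        variants_seen[uid].add(row["variant"])
        rec = per_user.setdefault(
            uid, {"variant": row["variant"], "sessions": 0, "signed_up": False}
        )
        rec["sessions"] += 1
        # A user counts as signed up if any of their sessions completed signup.
        rec["signed_up"] = rec["signed_up"] or bool(row["signed_up"])

    contaminated = {uid for uid, seen in variants_seen.items() if len(seen) > 1}
    # Exclude users exposed to both variants from the analysis set.
    clean = {uid: rec for uid, rec in per_user.items() if uid not in contaminated}
    return clean, contaminated


# Illustrative rows only; not real experiment data.
rows = [
    {"user_id": "u1", "variant": "A", "signed_up": False},
    {"user_id": "u1", "variant": "A", "signed_up": True},
    {"user_id": "u2", "variant": "B", "signed_up": False},
    {"user_id": "u3", "variant": "A", "signed_up": False},
    {"user_id": "u3", "variant": "B", "signed_up": True},
]
clean, contaminated = aggregate_by_user(rows)
```

Dropping contaminated users is one possible policy. If the contaminated share is large or differs by variant, that is itself a bucketing-integrity finding to report rather than silently filter.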
Goal: Decide whether to ship the redesign using a principled testing plan with data quality checks, metric selection, estimation, power, temporal effects, and decision criteria.
Tasks
Describe exactly how you would:
- Validate data quality (SRM test, bucketing integrity, exposure logs vs analytics counts).
- Choose primary and guardrail metrics and justify them.
- Compute effects with confidence intervals (including ratio metrics and non-parametric options if skewed), and apply variance reduction (e.g., CUPED) if appropriate.
- Check power/min detectable effect and whether the observed duration met the pre-registered stopping rule.
- Evaluate novelty and learning effects (time-sliced and cohort views).
- Make the ship/no-ship call with a concrete decision framework that balances activation gains vs increased latency/support tickets.
- List at least three additional insights to extract beyond the ship decision (e.g., segment heterogeneity by traffic source/device, step-drop analysis, form-field sensitivity).
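
As a starting point for the data-quality and effect-estimation tasks, the sketch below shows one way to run an SRM check and to build a confidence interval for a difference in a ratio metric such as signups per session. It uses only the Python standard library. The function names, the 50/50 expected split, and the normal approximation are assumptions for illustration; the SRM check is a chi-square goodness-of-fit test with one degree of freedom, and the ratio variance uses the delta method with each user treated as one independent unit.

```python
import math
from statistics import NormalDist


def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit p-value (1 df) for sample ratio mismatch."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def ratio_and_variance(numer, denom):
    """Ratio of sums and its delta-method variance from per-user values."""
    n = len(numer)
    mean_y = sum(numer) / n
    mean_x = sum(denom) / n
    ratio = mean_y / mean_x
    var_y = sum((y - mean_y) ** 2 for y in numer) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in denom) / (n - 1)
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(denom, numer)
    ) / (n - 1)
    variance = (var_y - 2 * ratio * cov_xy + ratio ** 2 * var_x) / (
        n * mean_x ** 2
    )
    return ratio, variance


def ratio_diff_ci(numer_a, denom_a, numer_b, denom_b, alpha=0.05):
    """Return (B minus A difference in ratios, lower bound, upper bound)."""
    r_a, v_a = ratio_and_variance(numer_a, denom_a)
    r_b, v_b = ratio_and_variance(numer_b, denom_b)
    diff = r_b - r_a
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(v_a + v_b)
    return diff, diff - half_width, diff + half_width


# Toy per-user values (signups, sessions); illustrative only.
p_srm = srm_p_value(5200, 4800)
diff, lower, upper = ratio_diff_ci([0, 1, 0, 1], [1, 2, 1, 2],
                                   [1, 1, 0, 1], [1, 2, 1, 2])
```

A very small SRM p-value (many teams use a strict alert threshold such as 0.001, which is an assumption here and not part of the plan) suggests the assignment or logging is broken, in which case effect estimates should not be trusted until the cause is found. For heavily skewed metrics such as revenue_d30, a bootstrap over users would be a reasonable non-parametric alternative to the normal approximation used here.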