Experiment Metric Design, Guardrails, and Power for a 14-Day A/B Test
Context
You are testing a newly launched, guest-facing booking feature in a global, two-sided travel marketplace. Randomization occurs at the user level. Users become "exposed" when they first encounter an eligible surface during a 14-day enrollment window. You want a primary metric that maps to long-term value and a set of guardrails to protect user experience and platform health.
Tasks
- Define a single primary metric that best maps to long-term value. For this metric, specify:
  - Exact formula, units, numerator/denominator definitions
  - Event deduplication rules
  - Windowing (e.g., enrollment window vs. attribution window)
  - Handling of outliers (e.g., winsorization) and bots/fraud filters
  - Currency normalization and timezone alignment
  - Choice and justification of ratio-of-sums vs. sum-of-ratios
  - Sensitivity to late/backfilled events
- Define at least three guardrail metrics with:
  - Exact formulas, windows, units, dedup rules (if applicable)
  - Outlier handling and bot filtering
- Describe how you would detect and correct a silent logging change mid-experiment.
- Compute statistical power and MDE for a 14-day test given historical variance; show the formula and a worked example with reasonable assumptions.
- Explain a sequential monitoring strategy and alpha spending approach for interim looks.
- Propose canary thresholds for guardrails (e.g., crash rate, p95 latency, complaint rate) and describe what you would do if the primary improves but a guardrail slightly regresses.
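
As a starting point for the power/MDE and alpha-spending tasks, the following is a minimal sketch rather than a reference answer. It assumes a two-sided, equal-allocation difference-in-means test on a per-user metric with a normal approximation, and it uses a Lan-DeMets O'Brien-Fleming-type spending function as one possible choice for interim looks. The function names and all numeric inputs (standard deviation of 120, 200,000 users per arm) are hypothetical placeholders, not values from the scenario.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal


def mde_two_sample(sigma, n_per_arm, alpha=0.05, power=0.80):
    """Absolute MDE for a two-sided test of a difference in means.

    MDE = (z_{1-alpha/2} + z_{power}) * sqrt(2 * sigma^2 / n_per_arm)
    Assumes equal allocation and equal variance in both arms.
    """
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 * sigma ** 2 / n_per_arm)


def n_per_arm_for_mde(sigma, delta, alpha=0.05, power=0.80):
    """Users per arm needed to detect an absolute effect `delta`."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction.

    Lan-DeMets O'Brien-Fleming-type spending function:
    alpha*(t) = 2 - 2 * Phi(z_{1-alpha/2} / sqrt(t)), for 0 < t <= 1.
    """
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must be in (0, 1]")
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _Z.cdf(z_alpha / sqrt(info_fraction))


if __name__ == "__main__":
    # Hypothetical inputs: historical per-user standard deviation and
    # the number of users expected to be exposed per arm over 14 days.
    sigma, n = 120.0, 200_000
    mde = mde_two_sample(sigma, n)
    print(f"absolute MDE: {mde:.3f} (same units as the metric)")
    print(f"users per arm to detect that MDE: {n_per_arm_for_mde(sigma, mde)}")
    for t in (0.25, 0.5, 0.75, 1.0):  # interim looks by information fraction
        print(f"t={t:.2f}: cumulative alpha spent = {obf_alpha_spent(t):.5f}")
```

With this spending function, very little alpha is spent at early looks, so the final analysis keeps close to the full nominal level. If the primary metric is a ratio or is heavily skewed, the variance input would need to come from a delta-method or winsorized estimate instead of a raw standard deviation.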