Multi‑Outcome Experiment: Decision Framework, Multiplicity, Pre‑registration, and Communication
Context
You ran an A/B test with:
- Primary metric: conversion
- Secondary metric: ARPU
- Guardrails: p95 latency and crash rate
Observed effects (two‑sided p‑values):
- Conversion: +1.1 percentage points, p = 0.04
- ARPU: −3%, p = 0.08
- p95 latency: +2%, p = 0.20
- Crash rate: +0.15 percentage points, p = 0.03
Assume independent randomization, standard frequentist testing, and that the guardrails are safety metrics, so any increase in them counts as a degradation.
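Because the guardrails only matter in the harmful direction, the reported two-sided p-values can be read one-sidedly. The sketch below is a minimal illustration, assuming the underlying test statistic is symmetric (a z- or t-test); the helper name `one_sided_p` and its parameters are hypothetical and not part of the original setup.

```python
def one_sided_p(two_sided_p, observed_effect, harmful_if_positive=True):
    """Convert a two-sided p-value to a one-sided p-value for 'harm'.

    Assumes a symmetric test statistic (z or t), so half of the two-sided
    p-value lies in each tail. This is an illustrative helper, not a
    method prescribed by the problem statement.
    """
    if harmful_if_positive:
        in_harmful_direction = observed_effect > 0
    else:
        in_harmful_direction = observed_effect < 0
    half = two_sided_p / 2
    # If the observed effect points the harmful way, the one-sided p-value
    # is the tail mass on that side; otherwise it is the complement.
    return half if in_harmful_direction else 1 - half


# Guardrail values taken from the observed effects listed above.
crash_p = one_sided_p(0.03, observed_effect=0.15)   # crash rate, +0.15 pp
latency_p = one_sided_p(0.20, observed_effect=2.0)  # p95 latency, +2%
print(f"crash rate one-sided p = {crash_p:.3f}")
print(f"p95 latency one-sided p = {latency_p:.3f}")
```

Whether guardrails are tested one-sided or two-sided is a design choice that should be fixed before the experiment starts, not after seeing the data.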
Tasks
(a) Propose a principled decision framework: specify hypothesis hierarchy, whether/how to gate on guardrails, and acceptable Type I error rates.
(b) Choose and justify a multiple‑testing control (e.g., Holm or Hochberg for the family‑wise error rate, Benjamini–Hochberg for the false discovery rate). Apply it to the given results and note which conclusions change.
(c) Explain how to pre‑register metrics, families, and stopping rules to avoid p‑hacking.
(d) Draft a ≤120‑word message to the PM with the decision and trade‑offs in plain language.
(e) If you must ship despite a guardrail hit, propose a mitigation and follow‑up plan.
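For task (b), the three named procedures can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes all four metrics are placed in a single family and uses the two-sided p-values as given. A hierarchical or gatekeeping design would define the families differently and can reach different conclusions. The function names are illustrative.

```python
def holm(pvals):
    """Holm step-down adjusted p-values (controls FWER)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiplier shrinks from m down to 1; enforce monotonicity.
        running_max = max(running_max, (m - rank) * pvals[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


def hochberg(pvals):
    """Hochberg step-up adjusted p-values (FWER, assumes independence
    or positive dependence among the tests)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        # Largest p gets multiplier 1, smallest gets multiplier m.
        running_min = min(running_min, (rank + 1) * pvals[idx])
        adjusted[idx] = running_min
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls FDR)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        ascending_rank = m - rank  # 1-based rank among sorted p-values
        running_min = min(running_min, pvals[idx] * m / ascending_rank)
        adjusted[idx] = running_min
    return adjusted


# Observed two-sided p-values from the context above.
metrics = ["conversion", "ARPU", "p95 latency", "crash rate"]
pvals = [0.04, 0.08, 0.20, 0.03]

for name, fn in [("Holm", holm), ("Hochberg", hochberg),
                 ("BH", benjamini_hochberg)]:
    adj = fn(pvals)
    summary = ", ".join(f"{m}={a:.3f}" for m, a in zip(metrics, adj))
    print(f"{name}: {summary}")
```

Under this single-family assumption, no adjusted p-value falls below 0.05 with any of the three procedures, so both the nominal conversion win (p = 0.04) and the nominal crash-rate regression (p = 0.03) lose significance at that level. Benjamini–Hochberg would flag those two only at a looser threshold such as q = 0.10.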