You are launching a new video-ad format. Design an end-to-end A/B test to evaluate it against the current ad format. Be precise:
- Define exposure and eligibility (e.g., viewability ≥2 s with ≥50% of pixels in view; decide whether autoplay-muted impressions count as exposed). Choose the unit of randomization (user vs. session vs. geo) and justify the choice against interference risk.
- Choose a single primary metric (e.g., incremental conversions, brand-recall lift via survey, or on-site engaged sessions) and guardrails (latency, complaint rate, bounce rate). Specify attribution windows and any view-through rules.
- Power/MDE: state the baseline rate, expected effect size, variance, required sample size, and test duration; include a ramp plan and traffic allocation. Account for clustering/correlation (design effect) if randomizing by geo or user; a power sketch follows this list.
- Analysis plan: pre-register it; choose intent-to-treat vs. exposure-adjusted estimands; handle noncompliance and partial exposure; specify the statistical test, variance estimator, and multiple-testing control for secondary metrics (a Benjamini-Hochberg sketch follows this list).
- Operational risks: novelty/learning effects, day-of-week seasonality, creative heterogeneity, frequency caps, and lagged effects; define cooldown and observation windows.
- Decision rule and stop/extend criteria; specify a monitoring approach that avoids peeking bias, e.g., group-sequential testing with alpha-spending functions or Bayesian monitoring (an alpha-spending sketch follows this list).
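For the power/MDE bullet, a minimal sample-size sketch in Python using statsmodels, with a design-effect inflation for clustered randomization. The baseline rate, relative MDE, ICC, and cluster size below are illustrative assumptions, not recommendations:

```python
# Sample-size sketch for a two-proportion test with a cluster design effect.
# All numeric inputs are illustrative placeholders.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.020           # assumed control conversion rate
mde_rel = 0.05             # assumed minimum detectable relative lift (+5%)
treated = baseline * (1 + mde_rel)

# Cohen's h for the two proportions, then per-arm n at alpha=0.05, power=0.80.
h = proportion_effectsize(treated, baseline)
n_per_arm = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.80,
                                         ratio=1.0, alternative="two-sided")

# If randomizing by geo or user, inflate by the design effect:
# DEFF = 1 + (m - 1) * ICC, with m = average observations per cluster.
icc, m = 0.01, 50          # assumed intra-cluster correlation and cluster size
deff = 1 + (m - 1) * icc
print(f"per-arm n (independent): {n_per_arm:,.0f}")
print(f"per-arm n (clustered, DEFF={deff:.2f}): {n_per_arm * deff:,.0f}")
```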
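For the secondary-metric multiple-testing control in the analysis plan, a Benjamini-Hochberg sketch; the metric names and p-values are hypothetical:

```python
# Benjamini-Hochberg FDR control for secondary metrics.
# Per-metric p-values below are hypothetical.
from statsmodels.stats.multitest import multipletests

secondary = {"ctr": 0.012, "dwell_time": 0.048,
             "view_through_rate": 0.200, "bounce_rate": 0.670}

reject, p_adj, _, _ = multipletests(list(secondary.values()),
                                    alpha=0.05, method="fdr_bh")
for (name, p_raw), p, r in zip(secondary.items(), p_adj, reject):
    print(f"{name}: raw p={p_raw:.3f}, BH-adjusted p={p:.3f}, reject={r}")
```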
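For the monitoring approach, a sketch of an O'Brien-Fleming-type alpha-spending schedule (Lan-DeMets form). Exact group-sequential boundaries require the joint distribution of the test statistics across looks (e.g., R's gsDesign); the per-look z-boundaries here are computed conservatively from the incremental alpha and ignore cross-look correlation:

```python
# O'Brien-Fleming-type alpha-spending schedule; boundaries are conservative.
import numpy as np
from scipy.stats import norm

alpha = 0.05                                # overall two-sided alpha
t = np.array([0.2, 0.4, 0.6, 0.8, 1.0])     # information fractions per look

# OBF-type spending function: f(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))
spent = 2 * (1 - norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t)))
incremental = np.diff(spent, prepend=0.0)   # alpha spent at each look

for k, (tk, a_inc) in enumerate(zip(t, incremental), start=1):
    z_bound = norm.ppf(1 - a_inc / 2)       # conservative per-look boundary
    print(f"look {k}: t={tk:.1f}, cumulative alpha={spent[k-1]:.4f}, "
          f"z-boundary≈{z_bound:.2f}")
```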
Follow-up: If the primary metric shows no improvement, detail the next steps in order:
(a) verify exposure/compliance and logging, including a sample-ratio-mismatch check (sketched below);
(b) reassess power/MDE and under-randomization;
(c) check metric alignment and upstream funnel movement (e.g., view-through rate, CTR, dwell);
(d) run pre-registered heterogeneity checks with multiple-testing correction;
(e) assess lagged effects via an extended observation window;
(f) test dosage/frequency or creative variants in a new pre-registered experiment;
(g) evaluate whether a smaller but precise effect is still ROI-positive;
(h) pivot/iterate or deprecate, based on the pre-specified decision rule.
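For step (a), a minimal sample-ratio-mismatch (SRM) check, assuming an intended 50/50 split; the assignment counts are hypothetical:

```python
# SRM check: a very small p-value signals broken randomization or logging
# rather than a true treatment effect. Counts below are hypothetical.
from scipy.stats import chisquare

control_n, treatment_n = 501_742, 498_094   # assumed observed assignments
expected_split = (0.5, 0.5)                 # intended allocation

total = control_n + treatment_n
stat, p = chisquare(f_obs=[control_n, treatment_n],
                    f_exp=[total * s for s in expected_split])
print(f"chi-square={stat:.2f}, p={p:.4f}")
if p < 0.001:
    print("SRM detected: audit exposure logging and assignment before "
          "interpreting the primary metric.")
```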