Experiment Design: Raising a 7‑Day Frequency Cap from 3→4 Impressions
Context
A large video ad campaign plans to raise the per‑user rolling 7‑day frequency cap from 3 to 4 impressions. The goal is to estimate the causal impact on conversions while accounting for clustering and potential interference (auctions, cross‑device households, overlapping advertisers, pacing).
-
Population: US users eligible for the campaign; ~4,000,000 eligible users per day during test.
-
Randomization candidates: user_id, household_id, or geo cell.
-
Average household size among eligible users: m = 1.3.
-
Household-level ICC for the primary metric: 0.02.
-
Primary metric: 7‑day conversion rate per unique exposed user (any purchase within 7 days of first exposure), baseline p0 = 2.00%.
-
Guardrails: daily unique reach, average session watch time, complaint rate per 1,000 impressions.
-
Traffic allocation: 50% Treatment (cap = 4), 50% Control (cap = 3), duration 28 days, with 4 equally spaced interim looks (including final).
-
CUPED: pre‑period 7‑day metric available with R^2 = 0.35.
-
Interference risks: shared auctions across campaigns, overlapping advertisers, cross‑device households, pacing controls.
Tasks
-
Choose the randomization unit and justify it with a causal diagram. Specify where interference could occur and how your choice mitigates it. Propose cross‑campaign holdouts or ghost‑bids if needed.
-
Define precise metric formulas (numerators/denominators, exposure semantics, attribution window, de‑duplication across devices) and the data to log to compute them unambiguously.
-
Compute the minimum per‑arm sample size (unique users) to detect an absolute lift from 2.00% to 2.10% (Δ = +0.10 pp) with α = 0.05 (two‑sided) and 1−β = 0.80 using a two‑proportion z‑test. Adjust for clustering via VIF = 1 + (m−1)·ICC, then adjust variance for CUPED by multiplying by (1−R^2). Show the final effective sample size and discuss whether 28 days of traffic suffices.
-
Specify sequential monitoring using O’Brien–Fleming boundaries for 4 looks: give approximate nominal α at each look and describe the decision rules.
-
List at least three diagnostic checks (e.g., covariate balance on pre‑period exposures, saturation by user quantile, auction pressure) and the exact plots you would produce. Explain how you would interpret each to decide whether to ship the higher cap.