Experiment Design: Compare Two Ranking Models (M1 vs M0) for $5 Promotions
Context
You have two models, M0 (current) and M1 (new), that rank users for a daily $5 promotion. The goal is to run a controlled experiment to decide whether M1 produces higher business value than M0 while keeping spend and contact frequency comparable across arms.
Assume:
- Users can be eligible on a given day if they meet business rules and are not recently contacted (cool-down enforced).
- Promotions are sent daily to a limited subset (top-K) within each arm based on that arm’s model ranking.
- A $5 promo cost is incurred only upon redemption; the business value of a redemption is proportional to GMV (or margin).
Task
Design a controlled experiment that meets the following requirements:
- Randomization and Allocation
  - Randomize users into two arms (M0 and M1).
  - Within each arm, rank eligible users by that arm’s model and send to the arm-specific top-K per day so both arms have equal expected spend and contact frequency.
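One common way to randomize users into two arms is deterministic hashing of the user ID, so a user always lands in the same arm across days. The sketch below assumes string user IDs and a hypothetical experiment salt (`promo-m0-vs-m1`); neither comes from the brief.

```python
import hashlib


def assign_arm(user_id: str, salt: str = "promo-m0-vs-m1") -> str:
    """Deterministically map a user to 'M0' or 'M1' with a 50/50 split.

    The salt is a hypothetical experiment key; changing it reshuffles users.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits as an integer bucket in [0, 2**32).
    bucket = int(digest[:8], 16)
    return "M1" if bucket % 2 else "M0"


# Illustrative check on synthetic IDs: the split should be close to 50/50.
arms = [assign_arm(f"user-{i}") for i in range(10_000)]
share_m1 = arms.count("M1") / len(arms)
```

Because assignment depends only on the user ID and the salt, it can be recomputed at analysis time and compared against the logged assignment.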
- Primary Metric and Guardrails
  - Define the primary metric as incremental profit per eligible user: redemption uplift × (expected GMV or margin per redemption − $5), since the $5 cost is incurred only on redemption. Add guardrails on opt-outs, uninstalls, and support contacts.
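One reading of this metric, consistent with the $5 cost being incurred only on redemption, is to compute profit per eligible user in each arm and take the difference. The sketch below uses a hypothetical `ArmTotals` record and made-up totals purely for illustration.

```python
from dataclasses import dataclass

PROMO_COST = 5.0  # dollars, charged only when a promo is redeemed


@dataclass
class ArmTotals:
    """Hypothetical per-arm aggregates over the analysis window."""
    eligible_users: int     # all eligible users in the arm (ITT denominator)
    redemptions: int        # number of redeemed promos
    redeemed_margin: float  # total margin (or GMV-derived value) on redemptions


def profit_per_eligible_user(t: ArmTotals) -> float:
    """Net value of redemptions minus promo cost, spread over eligible users."""
    return (t.redeemed_margin - PROMO_COST * t.redemptions) / t.eligible_users


def incremental_profit(m1: ArmTotals, m0: ArmTotals) -> float:
    """Primary metric: M1 profit per eligible user minus M0's."""
    return profit_per_eligible_user(m1) - profit_per_eligible_user(m0)


# Made-up numbers, not results from any experiment.
m0 = ArmTotals(eligible_users=10_000, redemptions=300, redeemed_margin=3_600.0)
m1 = ArmTotals(eligible_users=10_000, redemptions=350, redeemed_margin=4_200.0)
delta = incremental_profit(m1, m0)
```

Dividing by all eligible users, not only contacted ones, keeps the metric aligned with an intent-to-treat comparison.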
- Triggering and Logging
  - Use triggered analysis: include user-days that are eligible and not recently contacted.
  - Use intent-to-treat (ITT) as the estimand.
  - Log exposure, assignment, eligibility, and redemption events.
- Power and Sample Size
  - Compute required sample size given: baseline redemption = 3%, MDE = +0.5 percentage points (absolute), α = 0.05 two-sided, power = 0.8.
  - Show the formula, include variance inflation for day-level clustering, and plan for delayed outcomes.
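As a rough sketch of the calculation, the standard two-proportion formula is n per arm = (z_{1−α/2} + z_{1−β})² × [p0(1 − p0) + p1(1 − p1)] / δ², with p1 = p0 + δ. Clustering can be approximated with the Kish design effect 1 + (m − 1)ρ. The cluster size `m = 14` and intra-cluster correlation `rho = 0.01` below are placeholder assumptions, not values from the brief; they would need to be estimated from historical data.

```python
import math
from statistics import NormalDist


def n_per_arm(p0: float, mde: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Unpooled two-proportion sample size per arm (before any inflation)."""
    p1 = p0 + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p0 * (1 - p0) + p1 * (1 - p1)
    return (z_alpha + z_beta) ** 2 * variance_sum / mde ** 2


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation from correlated observations."""
    return 1 + (cluster_size - 1) * icc


base_n = n_per_arm(p0=0.03, mde=0.005)
# Placeholder clustering inputs (assumed, not from the brief).
inflated_n = math.ceil(base_n * design_effect(cluster_size=14, icc=0.01))
```

With the stated inputs, `base_n` comes out at roughly 19,700 units per arm before inflation. For delayed outcomes, the run length would also need to cover the redemption window after the last send.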
- Analysis Plan
  - Include SRM checks, CUPED using pre-period outcomes, and sequential monitoring with an alpha-spending plan.
  - Report point estimates and confidence intervals with robust (clustered) variance.
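Two of these checks are easy to sketch with the standard library. The SRM check below is a chi-square goodness-of-fit test with one degree of freedom, and the CUPED step subtracts the part of the outcome explained by a pre-period covariate. The function names and the 50/50 expected split are assumptions for illustration.

```python
import math
from statistics import fmean


def srm_p_value(n_m0: int, n_m1: int, expected_share_m1: float = 0.5) -> float:
    """Chi-square (1 df) p-value for sample ratio mismatch between two arms."""
    total = n_m0 + n_m1
    exp_m1 = total * expected_share_m1
    exp_m0 = total - exp_m1
    chi2 = (n_m0 - exp_m0) ** 2 / exp_m0 + (n_m1 - exp_m1) ** 2 / exp_m1
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def cuped_adjust(y: list[float], x: list[float]) -> list[float]:
    """Return y - theta * (x - mean(x)), with theta = cov(x, y) / var(x).

    x is the pre-period outcome; theta should be estimated on data pooled
    across both arms so the adjustment does not bias the comparison.
    """
    y_bar, x_bar = fmean(y), fmean(x)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    theta = cov / var if var > 0 else 0.0
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]
```

A very small SRM p-value (a threshold such as 0.001 is often used) suggests the assignment or logging pipeline should be investigated before reading any metric. The CUPED adjustment leaves the mean of `y` unchanged and reduces variance in proportion to how well the pre-period covariate predicts the outcome.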
- Interference and Saturation
  - Address user-to-user interference and channel saturation limits; if present, propose cluster randomization or switchback and explain trade-offs.
- Budget Control and Operational Details
  - Keep budgets equal despite drift (e.g., per-arm thresholding and rebalancing), handle throttling and suppression lists, and ensure no spillover between arms.
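A minimal sketch of per-arm selection, assuming each arm holds a mapping from its own eligible user IDs to that arm's model score and a shared suppression set. Applying the same `k` in both arms keeps contact volume equal even if the two score distributions drift apart. The function name and inputs are hypothetical.

```python
import heapq


def select_top_k(scores: dict[str, float], k: int, suppressed: set[str]) -> list[str]:
    """Pick the k highest-scoring users in one arm, skipping suppressed users.

    `scores` must contain only users assigned to this arm and scored by this
    arm's model, so no user is ever ranked by the other arm's model.
    """
    candidates = ((score, uid) for uid, score in scores.items() if uid not in suppressed)
    return [uid for _, uid in heapq.nlargest(k, candidates)]


# Illustrative inputs: user "a" is on the suppression list.
scores_m1 = {"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.2}
sends_m1 = select_top_k(scores_m1, k=2, suppressed={"a"})
```

Selecting by rank within each arm, instead of a shared score threshold, is one way to hold send counts equal when M0 and M1 scores are on different scales.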