A/B Test Design: Co‑Branded Gym Credit Card Offer
Context: You will A/B test a 3‑month free gym membership offer shown on the application landing page (Variant B) against no offer (Control A). You will run a 14‑day 50/50 experiment. Traffic is ~200,000 sessions/day. Use the inputs below and design a rigorous, risk‑aware experiment and analysis.
Assumptions you may explicitly make when needed:
- "Approval rate" refers to approvals per visitor/session unless otherwise specified.
- Revenue/loss inputs are 90‑day averages unless you scale them.
- The offer is paid by the partner (cost = $0) unless you introduce a placeholder term.
Inputs:
- Baseline approvals per visitor: 8% (apply→approval rate = 8%)
- Average initial credit line (CL): $1,000
- 90‑day charge‑off rate (PD_90): 1.2%
- Loss severity (LGD): 60%
- Average 90‑day revenue per approved account: $120 (interest + interchange)
- Acquisition bonus + onboarding costs per approved: $40
- Risk constraint: Predicted default probability (PD) of the approved pool must not increase by >10% relative to control.
- Test: 14 days, 50/50 split, ~200k sessions/day total.
Tasks:
- Define a single primary metric as risk‑adjusted profit per visitor (RAPV). Write an exact formula using the inputs above. Specify at least two guardrail metrics (with thresholds) covering risk/fraud/compliance. (One possible formulation is sketched after this list.)
- Convert a 5% relative lift hypothesis on approval rate into an expected lift on RAPV. State the assumptions needed to avoid Simpson's paradox across traffic sources and day‑of‑week. (See the arithmetic sketch after this list.)
- Compute or set up the required sample size per variant for 80% power at α = 0.05 to detect the expected RAPV lift. Justify your variance estimation approach (delta method vs. bootstrap) and whether you'll use CUPED or pre‑period covariates. (A power‑calculation sketch follows this list.)
- Specify the randomization unit and identity resolution needed to prevent contamination (cookies, logged‑in IDs, device graph), and how you'll treat repeat applicants, bots, and duplicate identities.
- Detail your sequential testing plan (e.g., group‑sequential or alpha spending) to allow interim safety stops without inflating Type I error; define exact stop/go/ramp criteria. (An alpha‑spending sketch follows this list.)
- Show how you will monitor risk mix shift (e.g., PD by score bands) and enforce the 10% PD guardrail while avoiding conditioning on post‑treatment variables; propose a stratified analysis and a heterogeneity readout by acquisition channel and geography. (A stratified‑readout sketch follows this list.)
- Outline data quality checks (event schema, missingness, timeouts), instrumentation events, and backfill/late‑arrival handling.
- If Variant B increases approvals by 6% but raises PD by 9% and reduces the average credit line by 5%, decide whether to ship, using a 12‑month NPV sensitivity analysis (state your discount rate and churn assumptions). (An NPV sketch follows this list.)
- List two follow‑on experiments to isolate the mechanism (e.g., offer placement vs. wording), and one off‑policy evaluation you'd run using historical scores.
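Illustrative sketches for selected tasks follow. Each shows one possible approach under the stated inputs, not a prescribed answer; any quantity not listed in the inputs is called out as an assumption.

RAPV sketch: a minimal 90‑day formulation, assuming expected loss per approved account is PD_90 × LGD × CL and the offer itself costs $0 (partner‑paid).

```latex
\mathrm{RAPV} \;=\; r_{\mathrm{appr}} \times
  \bigl(\mathrm{Rev}_{90} \;-\; C_{\mathrm{acq}} \;-\; \mathrm{PD}_{90}\cdot\mathrm{LGD}\cdot\mathrm{CL}\bigr)
```

Baseline plug‑in: 0.08 × (120 − 40 − 0.012 × 0.60 × 1,000) = 0.08 × 72.8 ≈ $5.82 per visitor.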
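Lift‑conversion sketch: if the only treatment effect is on approval rate (per‑approved revenue, cost, PD, LGD, and CL unchanged) and effects are estimated within traffic‑source × day‑of‑week cells and aggregated with fixed weights (to avoid Simpson's paradox), RAPV scales linearly with approval rate, so a 5% relative lift on approvals maps to a 5% relative lift on RAPV:

```latex
\Delta\mathrm{RAPV} \;=\; 0.05 \times \mathrm{RAPV}_{\mathrm{ctrl}}
  \;=\; 0.05 \times 5.82 \;\approx\; \$0.29 \text{ per visitor}
  \quad (5.82 \rightarrow 6.12)
```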
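Sample‑size sketch, under simplifying assumptions: a two‑sided z‑test, with per‑approved value treated as the constant $72.8 so the RAPV lift reduces to an 8.0% vs. 8.4% comparison on approval rate. In practice the per‑approved value varies, so the variance should be re‑estimated from pre‑period data (delta method or bootstrap), which raises n.

```python
from scipy.stats import norm

alpha, power = 0.05, 0.80
p1, rel_lift = 0.08, 0.05            # baseline approval rate, hypothesized relative lift
p2 = p1 * (1 + rel_lift)             # 0.084

z_a = norm.ppf(1 - alpha / 2)        # 1.96
z_b = norm.ppf(power)                # 0.84

# Two-proportion z-test, unpooled variance. With per-approved value held
# constant, the RAPV effect and its SD scale by the same factor, so this n
# also applies to the RAPV lift under that simplification.
n_per_arm = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
print(round(n_per_arm))              # ~74,000 per arm

# Available traffic: ~100k sessions/day/arm * 14 days = 1.4M sessions per arm,
# so the test is comfortably powered for this proxy; CUPED / pre-period
# covariates mainly help the noisier revenue and loss components of RAPV.
```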
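Alpha‑spending sketch: a conservative version of Lan–DeMets spending with an O'Brien–Fleming‑type spending function and looks at days 7 and 14. The per‑look thresholds below allocate each alpha increment Bonferroni‑style, which is slightly conservative because it ignores the positive correlation between looks; exact group‑sequential boundaries would account for that correlation (e.g., via R's gsDesign).

```python
from scipy.stats import norm

alpha = 0.05
looks = [0.5, 1.0]   # information fractions at day 7 and day 14

def obf_spend(t, alpha=alpha):
    """O'Brien-Fleming-type spending function: alpha*(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t))."""
    return 2 - 2 * norm.cdf(norm.ppf(1 - alpha / 2) / t ** 0.5)

spent = 0.0
for t in looks:
    inc = obf_spend(t) - spent              # alpha newly spent at this look
    spent = obf_spend(t)
    z_crit = norm.ppf(1 - inc / 2)          # conservative per-look |z| threshold
    print(f"look at t={t:.2f}: spend {inc:.4f}, |z| threshold {z_crit:.2f}")

# Safety (guardrail) stops are separate and asymmetric: halt enrollment at any
# look if the PD guardrail (> +10% relative) or a fraud/compliance guardrail is
# breached, regardless of whether the efficacy boundary has been crossed.
```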
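Stratified‑readout sketch, assuming a hypothetical approved‑accounts table with columns arm, score_band, and pd_pred. The score band is taken at application time (a pre‑treatment covariate of the applicant), while selection into the approved pool is itself post‑treatment, so the raw comparison is the guardrail and the mix‑standardized comparison is a diagnostic of where any increase comes from.

```python
import pandas as pd

# Hypothetical approved-accounts table: one row per approved account with
# 'arm' ("A"/"B"), 'score_band' (bureau band at application), and
# 'pd_pred' (model-predicted default probability at decision time).
def pd_guardrail_readout(approved: pd.DataFrame) -> dict:
    # Mean predicted PD by arm x score band.
    by_band = (approved.groupby(["arm", "score_band"])["pd_pred"]
                       .mean().unstack("arm"))

    # Control band mix, used to standardize B to A's mix so the comparison
    # reflects within-band risk rather than a shift in who gets approved.
    ctrl_mix = (approved.loc[approved["arm"] == "A", "score_band"]
                .value_counts(normalize=True).reindex(by_band.index).fillna(0))

    pd_a = (by_band["A"] * ctrl_mix).sum()
    pd_b_std = (by_band["B"] * ctrl_mix).sum()            # mix-standardized
    pd_b_raw = approved.loc[approved["arm"] == "B", "pd_pred"].mean()

    return {
        "pd_control": pd_a,
        "pd_variant_raw": pd_b_raw,                       # includes mix shift
        "pd_variant_standardized": pd_b_std,              # within-band effect only
        "relative_increase_raw": pd_b_raw / pd_a - 1,     # guardrail: must be <= 0.10
    }
```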
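NPV sketch for the ship/no‑ship scenario (+6% approvals, +9% PD, −5% CL). The 12‑month revenue run‑rate, PD annualization, monthly churn, and discount rate below are explicit placeholder assumptions, not given in the inputs, and should be replaced with portfolio actuals before any decision.

```python
# 12-month NPV per visitor, control vs. Variant B.
# ASSUMPTIONS (placeholders): revenue accrues at the 90-day run-rate
# ($120 / 3 months -> $40/month); losses scale with an annualized PD of
# roughly 4 * PD_90 (crude placeholder) spread evenly over 12 months;
# monthly account churn 2%; annual discount rate 10%; acquisition cost at month 0.
DISC_ANNUAL, CHURN_M = 0.10, 0.02
REV_M, ACQ = 120 / 3, 40
disc_m = (1 + DISC_ANNUAL) ** (1 / 12) - 1

def npv_per_visitor(appr_rate, pd_90, cl):
    pd_12m = min(4 * pd_90, 1.0)               # placeholder annualization
    loss_m = pd_12m * 0.60 * cl / 12           # LGD = 60%, spread over 12 months
    npv_per_account = -ACQ + sum(
        (REV_M - loss_m) * (1 - CHURN_M) ** m / (1 + disc_m) ** m
        for m in range(1, 13)
    )
    return appr_rate * npv_per_account

base = npv_per_visitor(0.08, 0.012, 1000)
variant = npv_per_visitor(0.08 * 1.06, 0.012 * 1.09, 1000 * 0.95)
print(base, variant, variant - base)
# Under these placeholders the variant comes out ahead on 12-month NPV per
# visitor; the ship decision then hinges on the +9% PD sitting close to the
# 10% guardrail and on the NPV lift keeping its sign across a sensitivity grid
# over churn, discount rate, and PD annualization.
```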