Design a flu-shot A/B/n campaign experiment
Company: CVS Health
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Technical Screen
You are the analytics owner for a Fall 2025 pharmacy campaign to increase in-store flu vaccinations using SMS and Email. Design and evaluate the experiment end-to-end.
Context
- Target customers: adults with pharmacy loyalty IDs, across CA/NY/TX. Outcome = received flu shot at your pharmacies within Sept–Nov 2025.
- Channels: SMS, Email; both have per-send costs (SMS=$0.02, Email=$0.001). Deliverability varies (SMS 92%, Email 98%).
Tasks
1) Experimental design
- Choose between an A/B test (one channel vs. control) and a 2x2 factorial (SMS on/off x Email on/off). Justify your choice, considering potential interaction, interference, and send-cost constraints.
- Define eligibility, exclusion (e.g., opt-outs, prior vaccination), randomization unit (person vs household), and stratification variables (age band, state, past visits).
- Specify primary metric (absolute lift in vaccination rate) and guardrail metrics (opt-outs, complaints, no-show appointments, capacity).
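A minimal sketch of stratified, household-level assignment to the four cells of a 2x2 factorial may help make the design concrete. The field names (`household_id`, `stratum`), the arm labels, and the seed are hypothetical; a stratum is assumed to be a tuple such as (state, age band, past-visit bucket).

```python
import random
from collections import defaultdict

# Hypothetical labels for the four cells of the 2x2 factorial.
ARMS = ["control", "sms_only", "email_only", "sms_and_email"]

def assign_households(households, seed=2025):
    """Assign each household to one arm, balanced within each stratum.

    households: list of dicts with 'household_id' and 'stratum' keys
    (both names are assumptions made for this sketch).
    Returns a dict {household_id: arm}.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for h in households:
        by_stratum[h["stratum"]].append(h["household_id"])

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])   # fixed order so the seed reproduces
        rng.shuffle(ids)
        offset = rng.randrange(len(ARMS))   # random start so leftovers are not always control
        for i, hid in enumerate(ids):
            assignment[hid] = ARMS[(i + offset) % len(ARMS)]
    return assignment
```

Every loyalty ID in a household would then inherit the household's arm, which keeps two adults at the same address from landing in different cells and limits within-household interference.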
2) Sample size and power (show formulas and numeric results)
- Baseline vaccination rate assumed 8.0%; minimum detectable effect (MDE) 1.5 percentage points; two-sided alpha=0.05, power=0.80. Compute per-arm sample size ignoring clustering, then discuss inflation for household clustering with ICC=0.01 and average household size=1.3.
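One way to sketch the calculation is the standard normal-approximation formula for comparing two proportions, followed by the design effect DEFF = 1 + (m - 1) x ICC. The function names are illustrative, and the pooled-variance form used here is only one of several common variants, so results may differ by a few units from other calculators.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p_control, mde, alpha=0.05, power=0.80):
    """Per-arm n for a two-sided two-proportion z-test (normal approximation)."""
    p_treat = p_control + mde
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))
    ) ** 2
    return ceil(numerator / mde ** 2)

def design_effect(avg_cluster_size, icc):
    """Variance inflation from randomizing clusters (households)."""
    return 1 + (avg_cluster_size - 1) * icc

n_simple = n_per_arm(p_control=0.08, mde=0.015)
deff = design_effect(avg_cluster_size=1.3, icc=0.01)   # 1 + 0.3 * 0.01
n_clustered = ceil(n_simple * deff)
print(n_simple, round(deff, 4), n_clustered)
```

With the stated inputs this comes to roughly 5,600 people per arm, and because DEFF is about 1.003 the household adjustment adds only a handful of people. In a 2x2 factorial, "per arm" applies to each of the four cells if the cells are compared pairwise.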
3) Compliance and analysis
- Only a fraction of customers assigned to treatment are actually exposed (deliverability + opens). Define the Intention-To-Treat (ITT) estimator and compute Treatment-On-The-Treated (TOT) using random assignment as an instrument for exposure, with deliverability as the compliance rate. Show the relationship TOT ≈ ITT / compliance, and state assumptions.
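The ITT-to-TOT relationship can be sketched as a Wald (ratio) estimator. The sketch assumes one-sided noncompliance (no control customer receives a message), that assignment affects vaccination only through message delivery (exclusion restriction), and no interference between units. The illustrative rates reuse the planning values from this question (8.0% baseline, 1.5pp lift, 92% SMS deliverability); they are not results.

```python
def itt(rate_assigned, rate_control):
    """Intention-to-treat effect: difference in outcome rates by assignment."""
    return rate_assigned - rate_control

def tot(itt_effect, compliance_assigned, compliance_control=0.0):
    """Wald estimator: ITT divided by the difference in exposure rates."""
    first_stage = compliance_assigned - compliance_control
    if first_stage <= 0:
        raise ValueError("assignment must increase exposure")
    return itt_effect / first_stage

itt_sms = itt(rate_assigned=0.095, rate_control=0.080)   # 1.5pp, planning value
tot_sms = tot(itt_sms, compliance_assigned=0.92)          # SMS deliverability
print(f"ITT = {itt_sms:.4f}, TOT = {tot_sms:.4f}")
```

If opens rather than delivery define exposure, the compliance rate is lower and the implied TOT is larger, but the exclusion restriction also becomes harder to defend, since a delivered but unopened message can still act as a reminder.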
4) Attribution and measurement
- Customers can receive both channels in a factorial design. Propose instrumentation (unique links/codes, message timestamps) and analysis to attribute incremental impact to each channel (e.g., factorial contrasts, hierarchical models). Explain why self-reported "came because of SMS/Email" is biased and how to use it, if at all.
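The factorial contrasts mentioned above can be sketched directly from the four cell-level vaccination rates. The rates in the example are hypothetical placeholders chosen only to show the arithmetic, and the `(sms, email)` key layout is an assumption of this sketch.

```python
def factorial_contrasts(rates):
    """Main effects and interaction from 2x2 cell rates.

    rates: dict keyed by (sms, email) with values in {0, 1},
    e.g. rates[(1, 0)] is the vaccination rate in the SMS-only cell.
    """
    y00, y10 = rates[(0, 0)], rates[(1, 0)]
    y01, y11 = rates[(0, 1)], rates[(1, 1)]
    return {
        # Average effect of SMS across both levels of Email.
        "sms_main": ((y10 - y00) + (y11 - y01)) / 2,
        # Average effect of Email across both levels of SMS.
        "email_main": ((y01 - y00) + (y11 - y10)) / 2,
        # How much the Email effect changes when SMS is also sent.
        "interaction": (y11 - y10) - (y01 - y00),
    }

# Hypothetical cell rates, for illustration only.
example = factorial_contrasts({(0, 0): 0.080, (1, 0): 0.095,
                               (0, 1): 0.088, (1, 1): 0.100})
print(example)
```

A negative interaction would suggest the channels partly substitute for each other, which matters for the budget decision because the second message then buys less than its standalone effect implies.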
5) Edge case
- In one market, 100 customers in the SMS arm were vaccinated, but only 50 report they came because of a message. With control vaccination rate = 7.5% and SMS-arm rate = 9.0%, compute ITT lift and discuss why self-reports do not change the causal estimate. What operational changes would you test next if the lift is below the 1.5pp target?
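A short sketch of the edge-case arithmetic: the ITT lift uses only the two arm-level rates, so the 50 self-reports never enter the calculation. The SMS arm size is implied by 100 vaccinations at 9.0%. The control arm size is not given, so the equal-size control arm below is purely an assumption made to illustrate how wide the interval would be in a single market.

```python
from math import sqrt
from statistics import NormalDist

def lift_with_ci(p_treat, n_treat, p_control, n_control, alpha=0.05):
    """Absolute lift and a Wald confidence interval for a difference in proportions."""
    lift = p_treat - p_control
    se = sqrt(p_treat * (1 - p_treat) / n_treat
              + p_control * (1 - p_control) / n_control)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return lift, lift - z * se, lift + z * se

n_sms = round(100 / 0.09)   # arm size implied by 100 vaccinated at 9.0%
n_control = n_sms           # ASSUMPTION: control size is not stated in the question

lift, lo, hi = lift_with_ci(0.09, n_sms, 0.075, n_control)
print(f"ITT lift = {lift:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Under that assumed arm size the interval spans zero, which illustrates why a single market of this scale cannot confirm or rule out the 1.5pp target on its own.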
6) Reporting
- Define how you would monitor during rollout (sequential testing controls), finalize results (confidence intervals, CUPED if pre-period data exist), and recommend a scaled policy under a fixed budget.
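For the CUPED step, a minimal sketch is shown below. It assumes a single pre-period covariate per customer (for example, a prior-season vaccination indicator or a past-visit count; the choice of covariate is an assumption) and estimates theta on data pooled across arms, so that the adjustment does not depend on assignment.

```python
def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes and theta = cov(x, y) / var(x).

    y: outcomes (e.g. 0/1 vaccinated), pooled across all arms.
    x: pre-period covariate for the same customers, in the same order.
    """
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("y and x must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    if var_x == 0:
        raise ValueError("covariate has zero variance")
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    theta = cov_xy / var_x
    # Subtracting theta * (x - mean_x) keeps the overall mean of y unchanged.
    adjusted = [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]
    return adjusted, theta
```

The treatment effect is then the difference in mean adjusted outcome between arms. The variance reduction is roughly the squared correlation between the covariate and the outcome, so a weakly predictive covariate yields little gain.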
Quick Answer: This question evaluates a data scientist's competency in experimental design, causal inference, sample-size and power calculations, compliance-adjusted effect estimation, attribution instrumentation, and reporting for multi-channel marketing experiments.