Design A/B Test to Measure PayPal Cashback Value
Scenario
PayPal plans to offer a targeted cashback incentive for purchases at Walmart. You need to design an A/B test that convincingly demonstrates the value this cashback creates for Walmart (not just for PayPal), while respecting practical constraints of time and data.
Task
Structure an end-to-end experiment to measure the value for Walmart, including:
- Experiment design: unit of randomization, eligibility, treatment/control definition, stratification, exposure/compliance, instrumentation, and interference controls.
- Metric taxonomy: define primary, secondary, and guardrail metrics that reflect value to Walmart.
- Power analysis and sample-size math: show how you would set the minimum detectable effect (MDE), estimate variance, compute sample size, and adjust for clustering or variance reduction.
- Statistical testing plan: explain p-value interpretation, multiple-testing control, and whether and how you would use sequential testing.
- If results miss the MDE: outline steps to salvage learning and decision-making.
- If power is insufficient and the timeline cannot be extended: propose alternatives such as variance reduction (e.g., CUPED/CUBED), pre-post designs, or geo/synthetic-control approaches that allow a credible read within the fixed time.
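A minimal sketch of the sample-size arithmetic, using only the Python standard library. It assumes a two-sided, two-sample z-test on per-user Walmart spend with equal allocation. The inputs in the usage block are hypothetical placeholders, not Walmart data: mean $60 and SD $120 of 28-day spend per eligible user, a 2% relative MDE, households of 1.5 users with ICC 0.3, and a pre-period correlation of 0.6. The function names are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(sigma: float, mde_abs: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per arm for a two-sided, two-sample z-test on a difference in means,
    assuming equal allocation and a common standard deviation sigma."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde_abs ** 2)


def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish design effect: variance inflation from randomizing clusters
    (e.g. households) while analysing user-level outcomes."""
    return 1 + (avg_cluster_size - 1) * icc


def cuped_adjusted_n(n: int, rho: float) -> int:
    """CUPED with a pre-period covariate correlated rho with the outcome
    scales the variance, and hence the required n, by (1 - rho**2)."""
    return ceil(n * (1 - rho ** 2))


if __name__ == "__main__":
    # Hypothetical inputs: 28-day Walmart spend per eligible user.
    mean_spend, sd_spend = 60.0, 120.0
    mde = 0.02 * mean_spend  # 2% relative lift expressed in dollars
    base = n_per_arm(sd_spend, mde)
    clustered = ceil(base * design_effect(avg_cluster_size=1.5, icc=0.3))
    with_cuped = cuped_adjusted_n(clustered, rho=0.6)
    print(f"per arm: base={base}, clustered={clustered}, with CUPED={with_cuped}")
```

With these placeholder inputs the base requirement is roughly 157,000 users per arm. Household clustering raises it by 15%, and CUPED at a correlation of 0.6 removes 36% of the remaining variance. Treating the two adjustments as multiplicative is an approximation.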
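For the secondary metrics, one common way to control the family-wise error rate is the Holm step-down procedure. Benjamini-Hochberg is the usual alternative if false-discovery-rate control is acceptable. The sketch below is a plain-Python version with an illustrative function name. It assumes the primary metric is pre-registered and tested on its own at the full alpha, outside this family.

```python
def holm_reject(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down procedure: controls the family-wise error rate across
    m hypotheses. Returns a reject flag per hypothesis, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for four secondary metrics.
    print(holm_reject([0.01, 0.04, 0.03, 0.20]))  # → [True, False, False, False]
```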
Hints
- Include metric definitions and formulas.
- Show example numbers for sample size calculations.
- Discuss sequential testing and variance reduction (e.g., CUPED/CUBED) and mitigation plans for data/operational risks.
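One possible way to write the Walmart-side metric definitions, under assumptions the prompt leaves open: Y_i is user i's Walmart net sales in the test window, M_i is the corresponding gross margin in dollars, O_i is the order count, and arms T and C have n_T and n_C randomized users. Which metric is primary is itself an assumption to confirm with Walmart. In the ratio-metric line, n, the means, and the (co)variances are computed within one arm.

```latex
\begin{aligned}
\text{Primary (incremental net sales per user):}\quad
  & \hat{\Delta}_Y = \bar{Y}_T - \bar{Y}_C,
    \qquad \text{lift} = \hat{\Delta}_Y / \bar{Y}_C \\
\text{Secondary (incremental gross profit per user):}\quad
  & \hat{\Delta}_M = \bar{M}_T - \bar{M}_C \\
\text{Variance of a difference in means:}\quad
  & \widehat{\operatorname{Var}}(\hat{\Delta}_Y) = \frac{s_T^2}{n_T} + \frac{s_C^2}{n_C} \\
\text{Ratio metric, e.g. AOV} = \bar{Y}/\bar{O}\ \text{(delta method):}\quad
  & \operatorname{Var}\!\left(\frac{\bar{Y}}{\bar{O}}\right) \approx
    \frac{1}{n\,\mu_O^2}\left(\sigma_Y^2 - 2\frac{\mu_Y}{\mu_O}\,\sigma_{YO}
    + \frac{\mu_Y^2}{\mu_O^2}\,\sigma_O^2\right)
\end{aligned}
```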
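If the team wants interim looks, one standard option is a group-sequential design with a Lan-DeMets O'Brien-Fleming-type alpha-spending function. It spends very little alpha early and keeps the final look close to the fixed-horizon threshold. The sketch only computes the cumulative and incremental alpha allowed at each look, assuming four equally spaced looks (a hypothetical schedule). Turning those amounts into z-boundaries requires the joint distribution of the interim statistics, which numerical integration or a dedicated group-sequential package would handle.

```python
from statistics import NormalDist


def obf_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Lan-DeMets O'Brien-Fleming-type spending function: cumulative two-sided
    alpha spent at information fraction t in (0, 1]."""
    z = NormalDist()
    return 2 * (1 - z.cdf(z.inv_cdf(1 - alpha / 2) / t ** 0.5))


if __name__ == "__main__":
    looks = [0.25, 0.5, 0.75, 1.0]  # hypothetical, equally spaced information fractions
    previous = 0.0
    for t in looks:
        cumulative = obf_alpha_spent(t)
        print(f"t={t:.2f} cumulative={cumulative:.5f} incremental={cumulative - previous:.5f}")
        previous = cumulative
```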
Constraints & Assumptions
- Preserve the scope, facts, inputs, and requested outputs from the prompt above.
- If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
- Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.
Clarifying Questions to Ask
- Clarify the business objective, unit of analysis, time window, exposure definition, and primary metric.
- State assumptions about instrumentation, randomization, sample size, and data quality.
- Separate descriptive analysis from causal claims.
What a Strong Answer Covers
- A metric framework with primary, guardrail, and diagnostic metrics.
- A credible analysis or experiment design with clear assumptions and bias checks.
- SQL/statistical logic for segmentation, variance, confidence, and data validation where relevant.
- An actionable recommendation that explains trade-offs and next steps.
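Segment-level reads with confidence intervals can be sketched as below. This is a minimal standard-library version under stated assumptions: user-level randomization, one outcome value per user, and samples large enough for a normal approximation. The row layout (segment, is_treated, value) and the function names are hypothetical. In practice the per-segment counts, sums, and sums of squares would typically come from SQL, and segment cuts remain diagnostic unless pre-registered and corrected for multiple testing.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, fmean, variance


def diff_in_means_ci(treat: list[float], control: list[float], alpha: float = 0.05):
    """Treatment-minus-control difference in means with an unpooled (Welch-style)
    standard error, a normal-approximation CI, and a two-sided p-value."""
    diff = fmean(treat) - fmean(control)
    se = sqrt(variance(treat) / len(treat) + variance(control) / len(control))
    z = NormalDist()
    half_width = z.inv_cdf(1 - alpha / 2) * se
    p_value = 2 * (1 - z.cdf(abs(diff) / se))
    return diff, (diff - half_width, diff + half_width), p_value


def by_segment(rows, alpha: float = 0.05) -> dict:
    """rows: (segment, is_treated, value) tuples.
    Returns {segment: (diff, (ci_low, ci_high), p_value)}."""
    groups = defaultdict(lambda: ([], []))  # segment -> (treated values, control values)
    for segment, is_treated, value in rows:
        groups[segment][0 if is_treated else 1].append(value)
    return {seg: diff_in_means_ci(t, c, alpha) for seg, (t, c) in groups.items()}
```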
Follow-up Questions
- What sanity checks would you run before trusting the result?
- How would you handle novelty effects, seasonality, or selection bias?
- What decision would you make if metrics disagree?
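A sample-ratio-mismatch (SRM) check is usually the first sanity check. Below is a minimal sketch for a two-arm test with an assumed 50/50 split and a hypothetical function name. It relies on the fact that a chi-square statistic with one degree of freedom is the square of a standard normal, so only the standard library is needed. Many teams treat p < 0.001 as a red flag; that threshold is a convention, not a requirement.

```python
from math import sqrt
from statistics import NormalDist


def srm_p_value(n_treat: int, n_control: int, expected_treat_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 df) of observed arm sizes against the planned split."""
    total = n_treat + n_control
    exp_t = total * expected_treat_share
    exp_c = total - exp_t
    chi2 = (n_treat - exp_t) ** 2 / exp_t + (n_control - exp_c) ** 2 / exp_c
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = 2 * (1 - Phi(sqrt(x))).
    return 2 * (1 - NormalDist().cdf(sqrt(chi2)))
```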