A marketing team tests a new email campaign.
They run an experiment for two weeks in two cities (SF and NY) comparing Email A vs Email B.
They observe:
- In each city (and/or each week), B has a higher conversion rate than A.
- But when they combine all data, A has a higher overall conversion rate than B.
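To make the reversal concrete, here is a minimal sketch with made-up numbers. The counts, the `data` layout, and the helper names `per_city_rates` and `pooled_rates` are all assumptions for illustration; none of them come from the campaign described here. The sketch assumes SF has a much higher baseline conversion rate than NY and that Email A was sent mostly in SF while Email B was sent mostly in NY.

```python
# Hypothetical counts (illustrative only): city -> arm -> (sends, conversions).
# Assumption: SF converts far better than NY, and allocation is lopsided.
data = {
    "SF": {"A": (900, 180), "B": (100, 25)},
    "NY": {"A": (100, 5), "B": (900, 72)},
}


def per_city_rates(data):
    """Conversion rate of each arm within each city."""
    return {
        city: {arm: conv / sends for arm, (sends, conv) in arms.items()}
        for city, arms in data.items()
    }


def pooled_rates(data):
    """Conversion rate of each arm after summing counts over all cities."""
    totals = {}
    for arms in data.values():
        for arm, (sends, conv) in arms.items():
            s, c = totals.get(arm, (0, 0))
            totals[arm] = (s + sends, c + conv)
    return {arm: c / s for arm, (s, c) in totals.items()}


by_city = per_city_rates(data)
pooled = pooled_rates(data)

for city, rates in by_city.items():
    print(city, {arm: round(r, 3) for arm, r in rates.items()})
print("pooled", {arm: round(r, 3) for arm, r in pooled.items()})
```

With these assumed counts, B wins inside each city, yet A wins after pooling, because A's pooled rate is dominated by the high-baseline city and B's by the low-baseline one.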
Questions
- Explain how this can happen (Simpson’s paradox) and list the minimum conditions needed.
- How would you determine whether B is truly better than A?
- What metrics would you use (primary, diagnostic, guardrails), and what confounders would you worry about (e.g., city baseline differences, time-of-day/timezone effects, imbalance in allocation)?
- Can you compute a confidence interval (CI) for the treatment effect? If yes, how (conceptually and/or with formulas)?
- If the dataset is imbalanced across cities/weeks, what would you recommend operationally (reweighting, stratified analysis, rerun, blocking)?