This question tests understanding of Simpson's paradox, confounding, and causal reasoning. It checks whether the candidate can interpret subgroup versus aggregate statistics numerically and recognize when aggregated averages mislead.
Define Simpson's paradox and construct a concrete numeric example in which the success rate favors treatment within every subgroup but the aggregate rate favors control because the subgroups are unevenly sized across arms. Provide the algebra showing how each aggregate rate is a weighted average of subgroup rates and how differing weights reverse the trend. Relate this to confounding and causal graphs, and describe practical remedies (stratification, standardization, regression with appropriate adjustment). Explain when the aggregate result is the right estimand and when conditioning is required.
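
One possible shape for the numeric part of an answer is sketched below. The counts, the arm names (`treatment`, `control`), the stratum names (`mild`, `severe`), and the helper functions are all assumptions chosen for illustration, not part of the question. The sketch computes per-stratum rates, the aggregate rate as a weighted average of stratum rates, and a directly standardized rate that gives both arms the same pooled stratum weights.

```python
from fractions import Fraction

# Illustrative counts (assumed for this sketch): (successes, trials) per arm and stratum.
# Treatment is given mostly to severe cases, control mostly to mild cases.
counts = {
    "treatment": {"mild": (81, 87), "severe": (192, 263)},
    "control": {"mild": (234, 270), "severe": (55, 80)},
}


def stratum_rates(arm):
    """Success rate within each stratum for one arm."""
    return {s: Fraction(k, n) for s, (k, n) in counts[arm].items()}


def aggregate_rate(arm):
    """Pooled success rate: total successes over total trials.

    Equivalent to a weighted average of stratum rates, where the weights
    are that arm's own stratum shares (n_stratum / n_arm).
    """
    total_successes = sum(k for k, _ in counts[arm].values())
    total_trials = sum(n for _, n in counts[arm].values())
    return Fraction(total_successes, total_trials)


def standardized_rate(arm):
    """Direct standardization: reweight stratum rates by the pooled
    stratum distribution, so both arms share the same weights."""
    strata = counts[arm].keys()
    pooled = {s: sum(counts[a][s][1] for a in counts) for s in strata}
    grand_total = sum(pooled.values())
    rates = stratum_rates(arm)
    return sum(Fraction(pooled[s], grand_total) * rates[s] for s in strata)


if __name__ == "__main__":
    for arm in counts:
        per_stratum = {s: round(float(r), 3) for s, r in stratum_rates(arm).items()}
        print(arm, per_stratum,
              "aggregate:", round(float(aggregate_rate(arm)), 3),
              "standardized:", round(float(standardized_rate(arm)), 3))
```

With these assumed counts, treatment has the higher rate in both strata, control has the higher aggregate rate, and standardization restores the within-stratum ordering. Whether the standardized or the aggregate comparison is the right one still depends on the causal structure, for example whether severity is a confounder or lies on the causal path.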