A payments company launches a new fraud-screening feature that adds an extra risk check before approving certain transactions. Leadership wants to know whether the feature reduces fraud losses without causing too many false declines, manual reviews, or customer complaints.
However, a simple user-level A/B test may be inadequate here because:

- fraud labels arrive with a delay,
- transactions from the same merchant or customer are correlated,
- attackers may adapt to the treatment rules,
- some rollout decisions may be constrained by operations or regulation.
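As an illustration of the correlation point above: one common mitigation is to randomize at the merchant (cluster) level rather than the transaction level, so all of a merchant's traffic lands in one arm. A minimal sketch of deterministic hash-based assignment (the salt string and 50/50 split are placeholder choices, not part of the question):

```python
import hashlib

def assign_arm(merchant_id: str, salt: str = "fraud-check-v1") -> str:
    """Deterministically assign a merchant (the randomization unit) to an arm.

    Hashing the merchant ID keeps every transaction from that merchant in
    the same arm, which limits within-merchant interference. The salt is a
    hypothetical experiment name so different experiments re-randomize.
    """
    digest = hashlib.sha256(f"{salt}:{merchant_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return "treatment" if bucket < 50 else "control"

# The same merchant always maps to the same arm:
assert assign_arm("merchant_42") == assign_arm("merchant_42")
```

Note that with clustered assignment, the analysis must also use cluster-robust standard errors, since transactions within a merchant are not independent observations.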
How would you measure the business impact of the feature?
Your answer should discuss:
- the causal question and the recommended experimental or quasi-experimental design
- the unit of randomization and possible interference effects
- primary metrics and guardrail metrics
- how delayed fraud labels change the analysis window
- how to handle confounding from merchant mix, seasonality, or concurrent policy changes
- how to estimate sample size, minimum detectable effect, and practical significance
- how you would interpret trade-offs if fraud losses improve but approval rate worsens
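For the sample-size item, a minimal sketch using the standard two-proportion normal approximation (the baseline rate, MDE, alpha, and power values here are placeholders; a clustered design would further inflate this by a design effect):

```python
from math import sqrt
from statistics import NormalDist

def sample_size_per_arm(p_base: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-arm sample size to detect an absolute change of
    `mde_abs` in a baseline rate `p_base` (two-sided z-test, equal arms).
    Standard normal-approximation formula for comparing two proportions.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_b = NormalDist().inv_cdf(power)           # critical value for power
    p2 = p_base + mde_abs
    p_bar = (p_base + p2) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p_base * (1 - p_base) + p2 * (1 - p2))) ** 2
         / mde_abs ** 2)
    return int(n) + 1

# Placeholder scenario: 1% baseline fraud-flag rate, detect a 0.2pp change.
n = sample_size_per_arm(p_base=0.01, mde_abs=0.002)
```

A useful follow-up check is that halving the MDE roughly quadruples the required sample, which is why practical significance (the smallest effect worth acting on) should drive the MDE choice rather than the other way around.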