Suppose a product team wants to evaluate a new feature that is intended to improve user engagement and long-term retention, but a clean randomized A/B test is not feasible because of legal, engineering, or rollout constraints.
How would you evaluate whether the feature actually helps the business?
Please address all of the following:
- Define the business goal, unit of analysis, treatment, and primary success metric.
- Propose primary and guardrail metrics, for example: click-through rate, session depth, 7-day retention, latency, complaint rate, and revenue per active user. Explain the trade-offs among them.
- If randomization is impossible, compare several counterfactual or causal-inference approaches, such as difference-in-differences, synthetic control, matching, propensity scores, inverse probability weighting, regression adjustment, doubly robust estimation, instrumental variables, and regression discontinuity.
- For each method, explain the key assumptions, the likely sources of bias, and how you would validate those assumptions in practice.
- Explain how your approach would change for: an opt-in feature, a one-sided rollout, a staggered launch across regions, or a policy change affecting all users at once.
- If partial randomization is possible, explain whether you would prefer a switchback, a geo experiment, or a phased rollout, and how statistical power, the minimum detectable effect (MDE), and variance-reduction methods such as CUPED would factor into the design.
- Finally, suppose the core KPI suddenly drops on one specific day after launch. Walk through how you would determine whether the drop is caused by the feature itself, instrumentation issues, traffic-mix changes, outages, seasonality, or some other external factor.
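To make the difference-in-differences option concrete, here is a minimal sketch on hypothetical group-level retention numbers (all values invented for illustration); it relies on the parallel-trends assumption, i.e., that absent the feature, the treated group's metric would have moved like the control group's:

```python
# Difference-in-differences on hypothetical group-level means.
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """DiD estimate: change in the treated group minus change in control."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical 7-day retention rates before/after launch.
effect = diff_in_diff(
    treated_pre=0.40, treated_post=0.46,   # rollout region
    control_pre=0.41, control_post=0.43,   # comparable region, no feature
)
print(round(effect, 3))  # 0.04: estimated lift net of the shared trend
```

In practice one would estimate this in a regression with standard errors, and validate parallel trends by checking pre-period trends of the two groups.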
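For the CUPED point, a minimal sketch of the standard adjustment, assuming each user has a pre-experiment covariate x (e.g., pre-period sessions) correlated with the experiment metric y; the data below is invented:

```python
# CUPED: adjust the experiment metric y using a pre-experiment covariate x.
# theta = cov(x, y) / var(x);  y_adj = y - theta * (x - mean(x)).
# The adjustment preserves the mean of y but reduces its variance.
from statistics import mean, pvariance

def cuped_adjust(y, x):
    mx, my = mean(x), mean(y)
    cov_xy = mean((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    theta = cov_xy / pvariance(x)
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]

# Hypothetical per-user sessions: pre-period (x) and in-experiment (y).
x = [3, 8, 5, 10, 2, 7, 6, 9]
y = [4, 9, 6, 11, 3, 8, 6, 10]
y_adj = cuped_adjust(y, x)
print(pvariance(y_adj) < pvariance(y))  # True: lower variance, tighter CIs
```

Lower metric variance means a smaller MDE at the same sample size, which is why CUPED matters for short or low-traffic experiments.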
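For the KPI-drop scenario, one useful diagnostic separates a traffic-mix shift from a genuine within-segment drop. A sketch with hypothetical segments and numbers:

```python
# Decompose a day-over-day change in a rate KPI into a "mix" component
# (traffic shares shifted between segments) and a "within-segment"
# component (the metric itself moved inside segments).
def decompose(pre, post):
    """pre/post: {segment: (share_of_traffic, metric_value)}."""
    total_pre = sum(s * m for s, m in pre.values())
    total_post = sum(s * m for s, m in post.values())
    # Mix effect: apply post traffic shares to the pre metric values.
    mix = sum(post[k][0] * pre[k][1] for k in pre) - total_pre
    within = total_post - total_pre - mix
    return total_post - total_pre, mix, within

pre  = {"web": (0.6, 0.50), "app": (0.4, 0.30)}
post = {"web": (0.4, 0.50), "app": (0.6, 0.30)}  # metrics flat, mix shifted
delta, mix, within = decompose(pre, post)
print(round(delta, 3), round(mix, 3), round(within, 3))  # -0.04 -0.04 0.0
```

Here the headline KPI fell 4 points even though no segment changed, pointing at a traffic-mix cause rather than the feature; the same check, run alongside logging audits and outage timelines, narrows the candidate explanations quickly.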