Design Testing Without A/B Experiments
Company: Microsoft
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: medium
Interview Round: Technical Screen
Suppose a product team wants to evaluate a new feature intended to improve user engagement and long-term retention, but a clean randomized A/B test is infeasible because of legal, engineering, or rollout constraints.
How would you evaluate whether the feature actually helps the business?
Please address all of the following:
1. Define the business goal, unit of analysis, treatment, and primary success metric.
2. Propose primary and guardrail metrics, for example: click-through rate, session depth, 7-day retention, latency, complaint rate, and revenue per active user. Explain the trade-offs among them.
3. If randomization is impossible, compare several counterfactual or causal inference approaches such as difference-in-differences, synthetic control, matching, propensity scores, inverse probability weighting, regression adjustment, doubly robust estimation, instrumental variables, and regression discontinuity.
4. For each method, explain the key assumptions, likely sources of bias, and how you would validate the assumptions in practice.
5. Explain how your approach would change for: an opt-in feature, a one-sided rollout, a staggered launch across regions, or a policy change affecting all users at once.
6. If partial randomization is possible, explain whether you would prefer a switchback, geo experiment, or phased rollout, and how power, MDE, and variance reduction methods such as CUPED would matter.
7. Finally, suppose the core KPI suddenly drops on one specific day after launch. Walk through how you would determine whether the drop is caused by the feature, instrumentation issues, traffic-mix changes, outages, seasonality, or some other external factor.
Quick Answer: A strong response demonstrates mastery of causal inference, observational-study methods, metric definition, experimental-design trade-offs, and failure-mode analysis when randomized A/B testing is infeasible.
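To make the CUPED idea from item 6 concrete, here is a minimal sketch on synthetic data, assuming each user's pre-period metric is available as a covariate. The theta coefficient is the OLS slope of the post-period metric on the pre-period metric; subtracting the covariate term shrinks variance without biasing the mean treatment comparison.

```python
# Minimal CUPED sketch: use a pre-experiment covariate (each user's
# pre-period engagement) to reduce the variance of the post-period metric.
# The data-generating process below is an illustrative assumption.
import random

random.seed(0)
n = 1000
pre = [random.gauss(10, 2) for _ in range(n)]      # pre-period metric
post = [x + random.gauss(1, 1) for x in pre]       # correlated post-period metric

mean = lambda xs: sum(xs) / len(xs)
var = lambda xs: sum((x - mean(xs)) ** 2 for x in xs) / (len(xs) - 1)

mx, my = mean(pre), mean(post)
# theta = Cov(pre, post) / Var(pre): the variance-minimizing coefficient
theta = (sum((x - mx) * (y - my) for x, y in zip(pre, post))
         / sum((x - mx) ** 2 for x in pre))

# CUPED-adjusted metric: same mean as `post`, but much lower variance
cuped = [y - theta * (x - mx) for x, y in zip(pre, post)]

print(var(cuped) < var(post))  # True: variance is reduced
```

Lower metric variance translates directly into smaller required sample sizes for the same MDE, which is why CUPED matters for geo experiments and phased rollouts where effective sample size is limited.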