Design an evaluation when an A/B test is impossible
Company: Microsoft
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: easy
Interview Round: Technical Screen
## Scenario
You want to evaluate whether a product or model change (e.g., a new ranking strategy, pricing rule, or UI change) improves business outcomes.
However, **you cannot run a standard randomized A/B test** due to one or more constraints:
- Legal/compliance restrictions (cannot randomize users)
- Platform limitations (no experimentation framework)
- Strong network effects / interference (user outcomes affect each other)
- Rollout must be global (no holdout allowed)
- Treatment is self-selected (users opt in)
## Questions
1. **What metrics** would you choose?
- Propose a **primary metric** plus at least two **diagnostic** and two **guardrail** metrics (an illustrative metric suite follows below).
- Explain the tradeoffs (e.g., short-term vs. long-term signal, sensitivity vs. robustness).
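As a minimal illustration of such a metric suite for a hypothetical ranking change, consider the sketch below; every metric name is an assumption chosen for this example, not a prescribed standard.

```python
# Illustrative metric suite for a hypothetical ranking change.
# All metric names are assumptions chosen for this example.
METRIC_SUITE = {
    # Primary: tied to the business goal, sensitive enough to move in weeks.
    "primary": "successful_sessions / total_sessions",
    # Diagnostics: explain *why* the primary moved.
    "diagnostic": ["click_through_rate", "query_reformulation_rate"],
    # Guardrails: must not regress even if the primary improves.
    "guardrail": ["p95_page_load_latency_ms", "day7_retention_rate"],
}
```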
2. **How would you estimate the counterfactual** (what would have happened without the change)?
- Propose at least three causal-inference approaches, e.g., matching/weighting, difference-in-differences, synthetic control, regression discontinuity, instrumental variables, or uplift modeling (a difference-in-differences sketch follows below).
- For each approach, state its key **assumptions**, what data you would need, and how you would validate or pressure-test those assumptions.
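As a concrete anchor for this question, here is a minimal difference-in-differences sketch assuming a hypothetical market-week panel; the file and column names (`kpi`, `treated`, `post`, `market`, `week`) are illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per market-week. Column names are
# illustrative, not a real schema.
df = pd.read_csv("market_weekly_kpi.csv")

# Two-group, two-period DiD: the coefficient on treated:post is the
# estimated effect, valid only under the parallel-trends assumption.
# Standard errors are clustered by market because a market's errors
# are correlated over time.
model = smf.ols("kpi ~ treated + post + treated:post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["market"]}
)
print(model.summary().tables[1])  # inspect the treated:post row
```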
3. **How would you handle common pitfalls**? (A placebo-test sketch follows this list.)
- Confounding / selection bias
- Seasonality and time trends
- Delayed effects / novelty effects
- Spillovers / interference
- Missing data and metric instrumentation changes
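One way to pressure-test the confounding and seasonality pitfalls above is a placebo (fake-launch) test on pre-period data. A minimal sketch, reusing the hypothetical panel from the DiD example (`LAUNCH_WEEK` is an assumed constant):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same hypothetical market-week panel as the DiD sketch above.
df = pd.read_csv("market_weekly_kpi.csv")
LAUNCH_WEEK = 40  # illustrative launch week

# Placebo test: restrict to pre-launch data and pretend the launch
# happened 8 weeks early. A "significant" interaction at the fake
# date means treated and control markets were already diverging
# (confounding or differential seasonality), so the real DiD
# estimate should not be trusted.
pre = df[df["week"] < LAUNCH_WEEK].copy()
pre["fake_post"] = (pre["week"] >= LAUNCH_WEEK - 8).astype(int)
placebo = smf.ols(
    "kpi ~ treated + fake_post + treated:fake_post", data=pre
).fit(cov_type="cluster", cov_kwds={"groups": pre["market"]})
print(placebo.params["treated:fake_post"])  # expect ~0 and insignificant
```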
4. Product monitoring follow-up:
- Suppose after launch, the **core KPI drops sharply on a single day**. Outline a structured investigation plan to determine whether it’s (a) a real product issue, (b) a logging/pipeline issue, or (c) an external shock.
- Include which slices you would check first and which “sanity checks” you would run (a triage sketch follows).
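A minimal sketch of the first sanity checks, assuming a hypothetical event-level extract (all file, column, and slice names are illustrative): compare the drop day against the same weekday a week earlier, slicing both the KPI and raw logging volume.

```python
import pandas as pd

# Hypothetical event-level extract; names are assumptions.
events = pd.read_parquet("daily_events.parquet")
bad_day, baseline = "2024-05-14", "2024-05-07"  # drop day vs. same weekday prior

def slice_report(df: pd.DataFrame, dim: str) -> pd.DataFrame:
    """KPI and raw event volume per slice, drop day vs. baseline day."""
    return (df[df["date"].isin([bad_day, baseline])]
            .groupby([dim, "date"])
            .agg(kpi=("converted", "mean"),
                 volume=("converted", "size"))
            .unstack("date"))

# Heuristics: a volume collapse confined to one platform or app
# version points to a logging/pipeline break; a KPI move with flat
# volume across slices points to a real product issue; a move
# confined to one country suggests an external shock.
for dim in ("platform", "app_version", "country"):
    print(slice_report(events, dim))
```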
Quick Answer: Pick a sensitive, business-aligned primary metric backed by diagnostic and guardrail metrics; estimate the counterfactual with several observational methods (e.g., matching/weighting, difference-in-differences, synthetic control) and explicitly validate each method's assumptions; guard against confounding, seasonality, and interference with placebo and pre-trend checks; and triage any post-launch KPI anomaly by ruling out logging/pipeline breaks before concluding it is a real product issue or an external shock.