Scenario
You want to evaluate whether a product or model change (e.g., a new ranking strategy, pricing rule, or UI change) improves business outcomes.
However, you cannot run a standard randomized A/B test due to one or more constraints:
- Legal/compliance restrictions (you cannot randomize users)
- Platform limitations (no experimentation framework)
- Strong network effects / interference (user outcomes affect each other)
- The rollout must be global (no holdout allowed)
- The treatment is self-selected (users opt in)
Questions
- What metrics would you choose?
  - Propose a primary metric plus at least two diagnostic and two guardrail metrics.
  - Explain the tradeoffs (e.g., short-term vs. long-term, sensitivity vs. robustness).
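A strong answer also quantifies metric uncertainty rather than comparing point estimates. As one illustration (the data and metric are hypothetical), a bootstrap confidence interval for a per-user conversion rate shows how sensitive the primary metric is at a given sample size:

```python
# Sketch: bootstrap CI for a hypothetical primary metric (per-user conversion).
# The 12% conversion rate and 1,000-user sample are made-up illustration values.
import random

random.seed(0)
conversions = [1] * 120 + [0] * 880  # 12% conversion across 1,000 users

def bootstrap_ci(data, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI for the mean of a 0/1 outcome."""
    means = sorted(
        sum(random.choices(data, k=len(data))) / len(data)
        for _ in range(n_boot)
    )
    lo = means[int(alpha / 2 * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

lo, hi = bootstrap_ci(conversions)
print(f"point estimate 0.120, 95% CI roughly [{lo:.3f}, {hi:.3f}]")
```

The width of this interval is one concrete way to argue the sensitivity-vs-robustness tradeoff: a more sensitive metric may have a tighter interval but be easier to game or noisier under instrumentation changes.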
- How would you estimate the counterfactual (what would have happened without the change)?
  - Propose at least three causal inference approaches, e.g., matching/weighting, difference-in-differences, synthetic control, regression discontinuity, instrumental variables, or uplift modeling.
  - For each approach, state its key assumptions, the data you would need, and how you would validate or pressure-test those assumptions.
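To make one of these approaches concrete, here is a minimal difference-in-differences sketch on made-up numbers (all rates below are hypothetical; a real analysis would use observed per-group, per-period metric means and a regression with standard errors):

```python
# Illustrative difference-in-differences (DiD) estimator.
# Valid only under the parallel-trends assumption: absent the change,
# the treated group's metric would have moved like the control group's.

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """DiD effect = (treated change) - (control change)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical weekly conversion rates: treated market vs. comparable market
effect = did_estimate(treat_pre=0.100, treat_post=0.115,
                      ctrl_pre=0.098, ctrl_post=0.103)
print(round(effect, 3))  # 0.01 absolute lift, attributed under parallel trends
```

A standard pressure test is a placebo DiD on pre-period data only: running the same estimator on two pre-launch windows should return an effect near zero if parallel trends is plausible.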
- How would you handle common pitfalls?
  - Confounding / selection bias
  - Seasonality and time trends
  - Delayed effects / novelty effects
  - Spillovers / interference
  - Missing data and metric instrumentation changes
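For the seasonality pitfall specifically, one simple mitigation is to deseasonalize before comparing periods. A naive day-of-week adjustment (on hypothetical daily values with a weekend dip) might look like:

```python
# Sketch: naive day-of-week deseasonalization, to avoid mistaking weekly
# seasonality for a treatment effect. Daily values below are hypothetical.
from collections import defaultdict

def deseasonalize(values):
    """Divide each day's value by its day-of-week average (day index i % 7)."""
    by_dow = defaultdict(list)
    for i, v in enumerate(values):
        by_dow[i % 7].append(v)
    dow_avg = {d: sum(vs) / len(vs) for d, vs in by_dow.items()}
    return [v / dow_avg[i % 7] for i, v in enumerate(values)]

# Two weeks of a metric with a weekend dip (day indices 5 and 6)
daily = [100, 102, 101, 99, 100, 80, 78,
         101, 100, 102, 100, 101, 81, 79]
adjusted = deseasonalize(daily)
# After adjustment, all days sit near 1.0: weekends no longer read as drops.
```

In practice a candidate might instead mention established decomposition tools (e.g., STL) or fixed effects for calendar features; the point is to name seasonality as a confounder and show a control for it.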
- Product monitoring follow-up:
  - Suppose that after launch, the core KPI drops sharply on a single day. Outline a structured investigation plan to determine whether it is (a) a real product issue, (b) a logging/pipeline issue, or (c) an external shock.
  - Include which slices you would check first and which "sanity checks" you would run.
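One concrete sanity check worth expecting in an answer: compare the KPI's deviation against raw event-log volume over a trailing window. If both are anomalous on the same day, a logging/pipeline failure is the more likely culprit. A minimal sketch (all values and the z-score threshold are hypothetical):

```python
# Sketch: single-day anomaly triage. If the KPI and raw event volume both
# drop together, check logging/pipeline before product or external causes.
from statistics import mean, stdev

def zscore(history, today):
    """How many standard deviations today's value sits from the trailing mean."""
    mu, sigma = mean(history), stdev(history)
    return (today - mu) / sigma if sigma > 0 else 0.0

kpi_history = [0.101, 0.099, 0.100, 0.102, 0.098, 0.100, 0.101]   # last 7 days
log_history = [1.00e6, 0.98e6, 1.01e6, 1.02e6, 0.99e6, 1.00e6, 1.01e6]

kpi_z = zscore(kpi_history, today=0.080)    # sharp KPI drop today
log_z = zscore(log_history, today=0.70e6)   # event volume dropped too

if kpi_z < -3 and log_z < -3:
    print("KPI and event volume both anomalous -> check logging/pipeline first")
elif kpi_z < -3:
    print("KPI anomalous, volume normal -> investigate product or external shock")
```

Slicing the same check by platform, region, and app version then localizes the drop: a single-slice anomaly points at a release or regional shock, while a uniform drop across slices points at the pipeline.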