How would you measure causal impact?
Company: Upstart
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: medium
Interview Round: Technical Screen
Answer the following two analytics interview prompts.
1. **Causal impact without an experiment**
Describe a real or hypothetical product, model, or policy change where the business wants to measure impact, but a randomized experiment cannot be launched because of operational, legal, ethical, network-effect, or rollout constraints. Explain:
- the treatment, unit of analysis, target population, and primary success metric(s)
- why an experiment is infeasible
- which causal inference approach you would use (for example: difference-in-differences, synthetic control, matching, inverse propensity weighting, doubly robust estimation, interrupted time series, instrumental variables, regression discontinuity, or an ML-based counterfactual model)
- the assumptions required for identification
- potential sources of bias or confounding
- how you would validate the method and quantify uncertainty
- how you would separate short-term impact from long-term impact
- why you chose this approach instead of other seemingly simpler methods
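As one illustration of the approaches listed above, here is a minimal sketch of a 2x2 difference-in-differences estimate. The weekly rates are hypothetical numbers invented for the example, not data from any real rollout, and the function name `did_estimate` is an assumption. The estimate is only interpretable as causal if the parallel-trends assumption holds, which should be checked on pre-period data.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on group means.

    Returns (change in treated group) minus (change in control group).
    Valid as a causal estimate only under parallel trends.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly metric values (e.g. a conversion rate) for illustration only.
treated_pre = [0.30, 0.31, 0.29]
treated_post = [0.36, 0.35, 0.37]
control_pre = [0.28, 0.27, 0.29]
control_post = [0.30, 0.29, 0.31]

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {effect:.3f}")
```

In practice the same quantity is usually estimated with a regression that includes a treated-by-post interaction term, so that covariates and clustered standard errors can be added; the sketch above shows only the point estimate.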
2. **Three-variant experiment and forecasting future conversion**
You run a 3-arm experiment to maximize **CTP (purchase rate = purchases / visits)**. The observed results are:
- Variant A: 150 visits, 43 purchases
- Variant B: 200 visits, 48 purchases
- Variant C: 100 visits, 15 purchases
Answer the following:
- Which variant is currently winning?
- Show a reasonable by-hand statistical analysis using confidence intervals or hypothesis tests.
- How would your recommendation change if additional metrics also matter, such as revenue per visitor, average order value, refund rate, retention, or latency?
- If one variant is launched, how would you predict its future CTP in production, accounting for uncertainty and possible traffic or seasonality shifts?
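The by-hand analysis requested above can be sketched as follows, using only the counts given in the prompt. This is one reasonable approach under stated assumptions, not the only acceptable answer: it uses 95% Wald (normal-approximation) intervals and pooled two-proportion z-tests, and the helper names `wald_ci` and `two_prop_ztest` are made up for the example. With samples this small, Wilson intervals or an exact test would be a sensible cross-check.

```python
import math

# Observed counts from the prompt: (visits, purchases)
data = {"A": (150, 43), "B": (200, 48), "C": (100, 15)}


def wald_ci(purchases, visits, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = purchases / visits
    se = math.sqrt(p * (1 - p) / visits)
    return p - z * se, p + z * se


def two_prop_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


rates = {name: x / n for name, (n, x) in data.items()}
intervals = {name: wald_ci(x, n) for name, (n, x) in data.items()}

for name in data:
    lo, hi = intervals[name]
    print(f"Variant {name}: CTP = {rates[name]:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")

# Pairwise comparisons; with 3 tests, a Bonferroni threshold is 0.05 / 3.
z_ab, p_ab = two_prop_ztest(43, 150, 48, 200)
z_ac, p_ac = two_prop_ztest(43, 150, 15, 100)
z_bc, p_bc = two_prop_ztest(48, 200, 15, 100)

for label, z, p in [("A vs B", z_ab, p_ab), ("A vs C", z_ac, p_ac), ("B vs C", z_bc, p_bc)]:
    print(f"{label}: z = {z:.2f}, p = {p:.3f}")
```

On these counts, A has the highest observed CTP (about 28.7%, versus 24.0% for B and 15.0% for C). A is clearly separated from C (z of roughly 2.5), but A versus B is not statistically distinguishable (z of roughly 1.0), so "A is winning" is a statement about the point estimate rather than a confirmed result.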
Quick Answer: This question evaluates a data scientist's competency in causal inference, experimental design, and statistical analysis, covering techniques for estimating treatment effects without randomized experiments and for comparing outcomes across multiple experimental variants.