Compute DiD and validate parallel trends
Company: Roblox
Role: Data Scientist
Category: Analytics & Experimentation
Difficulty: hard
Interview Round: Take-home Project
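You are given panel data from an observational study or quasi-experiment as three equal-length arrays:
- `period[i]`: an integer time period label for observation `i`. Negative values are pre-treatment periods (e.g., -2, -1) and non-negative values are post-treatment periods (e.g., 0, 1).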
- `group[i]`: 1 if the unit is in the treatment group, 0 if in control.
- `outcome[i]`: numeric outcome.
Tasks:
1) **Difference-in-differences (DiD)**: compute the DiD estimate using:
- pre period = the **latest** pre period (`max{period < 0}`)
- post period = the **earliest** post period (`min{period >= 0}`)
Define:
- \(\bar{Y}_{T,post}\), \(\bar{Y}_{T,pre}\), \(\bar{Y}_{C,post}\), \(\bar{Y}_{C,pre}\) as the mean outcomes for Treatment/Control at the chosen post/pre periods.
- \(\text{DiD} = (\bar{Y}_{T,post}-\bar{Y}_{T,pre}) - (\bar{Y}_{C,post}-\bar{Y}_{C,pre})\).
2) **Parallel trends / pre-trend validation**: given a numeric `threshold`, validate that the treatment and control groups follow similar trends across the pre periods by checking:
- For each pre period \(t\), compute \(d_t = \bar{Y}_{T,t} - \bar{Y}_{C,t}\).
- The pre-trend is considered valid if `max(d_t) - min(d_t) <= threshold` across all pre periods. With a single pre period the spread is 0, so the check passes for any non-negative `threshold`.
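As a worked illustration of both tasks, using made-up group means (hypothetical numbers, not taken from any dataset): suppose \(\bar{Y}_{T,-2}=10\), \(\bar{Y}_{T,-1}=12\), \(\bar{Y}_{T,0}=17\) and \(\bar{Y}_{C,-2}=8\), \(\bar{Y}_{C,-1}=9\), \(\bar{Y}_{C,0}=11\). The latest pre period is \(-1\) and the earliest post period is \(0\), so

\[
\text{DiD} = (17 - 12) - (11 - 9) = 3, \qquad d_{-2} = 10 - 8 = 2, \quad d_{-1} = 12 - 9 = 3, \qquad \max_t d_t - \min_t d_t = 1.
\]

With `threshold = 1.5` the pre-trend check would pass; with `threshold = 0.5` it would fail.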
Output:
- Return the DiD estimate (float) and a boolean indicating whether the pre-trend validation passes.
Assumptions:
- There is at least one pre period and one post period.
- Means are computed over all rows matching the period/group filters.
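One possible way to put the two tasks together is sketched below in plain Python. The function name `did_with_pretrend_check` and the decision to raise `ValueError` when a required period/group cell has no rows are assumptions made for this sketch; the problem statement does not specify how missing cells should be handled.

```python
from collections import defaultdict


def did_with_pretrend_check(period, group, outcome, threshold):
    """Return (did_estimate, pretrend_ok) following the definitions above.

    Hypothetical helper: the name and the error handling are assumptions.
    """
    if not (len(period) == len(group) == len(outcome)):
        raise ValueError("period, group and outcome must have equal length")

    # Accumulate sum and count for every (period, group) cell in one pass.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, g, y in zip(period, group, outcome):
        sums[(p, g)] += y
        counts[(p, g)] += 1

    def cell_mean(p, g):
        # Mean over all rows matching this period/group filter.
        n = counts.get((p, g), 0)
        if n == 0:
            raise ValueError(f"no rows for period={p}, group={g}")
        return sums[(p, g)] / n

    pre_periods = sorted({p for p in period if p < 0})
    post_periods = sorted({p for p in period if p >= 0})
    if not pre_periods or not post_periods:
        raise ValueError("need at least one pre and one post period")

    pre = pre_periods[-1]    # latest pre period
    post = post_periods[0]   # earliest post period

    did = (cell_mean(post, 1) - cell_mean(pre, 1)) - (
        cell_mean(post, 0) - cell_mean(pre, 0)
    )

    # Treatment-minus-control gap d_t for each pre period.
    gaps = [cell_mean(t, 1) - cell_mean(t, 0) for t in pre_periods]
    pretrend_ok = (max(gaps) - min(gaps)) <= threshold

    return did, pretrend_ok


# Illustrative data only (two rows per period/group cell).
period = [-2, -2, -1, -1, 0, 0, -2, -2, -1, -1, 0, 0]
group = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
outcome = [9, 11, 11, 13, 16, 18, 7, 9, 8, 10, 10, 12]
print(did_with_pretrend_check(period, group, outcome, threshold=1.5))
# prints (3.0, True)
```

The sketch aggregates sums and counts in a single pass, so it runs in time linear in the number of rows. The `<=` comparison is applied to floating-point values as-is; if outcomes are not exactly representable, a small tolerance may be worth considering.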
Quick Answer: This question evaluates a candidate's skill in causal inference and data manipulation, specifically computing a difference-in-differences estimate from panel data and testing parallel trends via pre-period treatment-control mean comparisons.