How do I approach Analytics & Experimentation interview questions?

Analytics & Experimentation questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master analytics & experimentation interviews.

What difficulty level is this interview question?

This is a hard-difficulty Analytics & Experimentation question, commonly asked during Take-home Project rounds at Roblox.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Roblox during technical interviews.

Compute DiD and validate parallel trends | Roblox Interview Question

Q: Compute DiD and validate parallel trends

This question evaluates a candidate's skill in causal inference and data manipulation, specifically computing a difference-in-differences estimate from panel data and testing parallel trends via pre-period treatment-control mean comparisons.

You are given observational/experiment-like panel data as three equal-length arrays:

period[i]: an integer time period label for observation i (e.g., -2 and -1 are pre; 0 and 1 are post).
group[i]: 1 if the unit is in the treatment group, 0 if it is in control.
outcome[i]: numeric outcome.

Tasks:

Difference-in-differences (DiD): compute the DiD estimate using:
- pre period = the latest pre period, $\max\{\text{period} < 0\}$
- post period = the earliest post period, $\min\{\text{period} \ge 0\}$
Define:
- $\bar{Y}_{T,post}$, $\bar{Y}_{T,pre}$, $\bar{Y}_{C,post}$, $\bar{Y}_{C,pre}$ as the mean outcomes for Treatment (T) and Control (C) at the chosen post/pre periods.
- $\text{DiD} = (\bar{Y}_{T,post}-\bar{Y}_{T,pre}) - (\bar{Y}_{C,post}-\bar{Y}_{C,pre})$.
Parallel trends / pre-trend validation: given a numeric threshold, validate that treatment and control follow similar trends in the pre period by checking:
- For each pre period $t$, compute $d_t = \bar{Y}_{T,t} - \bar{Y}_{C,t}$.
- The pre-trend is considered valid if $\max_t d_t - \min_t d_t \le \text{threshold}$, with the max and min taken over all pre periods.
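
To make the two definitions concrete, here is a small worked example. The group means are made-up numbers chosen only for illustration; they are not part of the original question. Suppose the pre periods are $-2$ and $-1$, the earliest post period is $0$, and the means are $\bar{Y}_{T,-2}=10$, $\bar{Y}_{T,-1}=12$, $\bar{Y}_{T,0}=17$, $\bar{Y}_{C,-2}=8$, $\bar{Y}_{C,-1}=9$, $\bar{Y}_{C,0}=11$. The chosen pre period is $-1$ and the chosen post period is $0$, so:

$$\text{DiD} = (17 - 12) - (11 - 9) = 5 - 2 = 3$$

$$d_{-2} = 10 - 8 = 2, \qquad d_{-1} = 12 - 9 = 3, \qquad \max_t d_t - \min_t d_t = 3 - 2 = 1$$

With a threshold of 1.5 the pre-trend check passes (since $1 \le 1.5$); with a threshold of 0.5 it fails.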

Output:

Return the DiD estimate (float) and a boolean indicating whether the pre-trend validation passes.

Assumptions:

There is at least one pre period and one post period.
Means are computed over all rows matching the period/group filters.
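
One possible way to implement the specification above is sketched below in plain Python. The function name `did_with_pretrend` and the sample arrays are hypothetical, and the sketch additionally assumes that every pre period and the chosen post period contain at least one treatment row and one control row (otherwise a mean would be undefined and the code raises `ZeroDivisionError`).

```python
from collections import defaultdict


def did_with_pretrend(period, group, outcome, threshold):
    """Return (DiD estimate, pre-trend check passed) for the three input arrays."""
    # Accumulate sums and counts per (period, group) cell so each mean is one lookup.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, g, y in zip(period, group, outcome):
        sums[(p, g)] += y
        counts[(p, g)] += 1

    def mean(p, g):
        # Assumption: the cell is non-empty (see the lead-in).
        return sums[(p, g)] / counts[(p, g)]

    pre_periods = sorted({p for p in period if p < 0})
    post_periods = sorted({p for p in period if p >= 0})
    pre = pre_periods[-1]    # latest pre period
    post = post_periods[0]   # earliest post period

    did = (mean(post, 1) - mean(pre, 1)) - (mean(post, 0) - mean(pre, 0))

    # Treatment-minus-control gap in each pre period; the check is on its spread.
    gaps = [mean(t, 1) - mean(t, 0) for t in pre_periods]
    pretrend_ok = (max(gaps) - min(gaps)) <= threshold

    return float(did), pretrend_ok


# Hypothetical sample data: two pre periods (-2, -1) and two post periods (0, 1).
period = [-2, -2, -2, -2, -1, -1, -1, -1, 0, 0, 0, 0, 1, 1]
group = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
outcome = [9, 11, 7, 9, 11, 13, 8, 10, 16, 18, 10, 12, 20, 13]
print(did_with_pretrend(period, group, outcome, threshold=1.5))  # (3.0, True)
```

Note that rows from period 1 are ignored by the DiD estimate, because only the earliest post period is used. When there is a single pre period, the spread of the gaps is 0, so the check passes for any non-negative threshold.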
