How do I approach Analytics & Experimentation interview questions?

Analytics & Experimentation questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master analytics & experimentation interviews.

What difficulty level is this interview question?

This is a medium difficulty Analytics & Experimentation question, commonly asked during Technical Screen rounds at Roblox.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Roblox during technical interviews.

How to estimate a feature’s causal impact on time spent

Quick Overview

This question evaluates causal inference and time-series experimentation skills in the Analytics & Experimentation domain, specifically handling confounding, staggered rollouts, seasonality, and credible treatment-effect identification, at a mid-to-senior data scientist level.

You work on a Roblox-like game platform. A new product change ("feature") is rolled out and you want to estimate its causal impact on user engagement, measured as daily time spent (minutes per user-day).

However, the rollout is not fully randomized:

- The feature was enabled first for some platforms/regions and later for others.
- More engaged users may be more likely to receive/enable the feature earlier.
- Time spent has strong seasonality (day-of-week) and a general upward/downward trend.

You have event-level logs aggregated to a user-day table:

- user_id
- date (in UTC)
- minutes_spent
- feature_enabled (1 if the feature is enabled for that user on that date)
- User attributes (e.g., country, platform, account_age_days)

Task:

Describe how you would determine whether the feature affects minutes_spent, including how you would handle confounding.
If you choose Difference-in-Differences (DiD), specify:
- What are the treatment/control groups and pre/post windows?
- The key identifying assumption(s) and how you would check them.
- A regression specification you would run and what coefficient answers the question.
List common failure modes (e.g., violated assumptions, interference) and at least two alternative approaches if DiD is not credible.
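
As a point of reference for the regression-specification item above, one commonly used candidate is a two-way fixed effects model. This is only a sketch under assumptions, not a prescribed answer: the symbols \alpha_i (user fixed effect), \gamma_t (date fixed effect, which absorbs day-of-week seasonality and any trend common to all users), and \varepsilon_{it} are notation introduced here, and \beta is the coefficient that would answer the question if parallel trends holds.

```latex
\text{minutes\_spent}_{it} = \alpha_i + \gamma_t + \beta \,\text{feature\_enabled}_{it} + \varepsilon_{it}
```

Standard errors would typically be clustered at the level at which the rollout was assigned (for example platform-region). With a staggered rollout and effects that differ across cohorts or over time, this single \beta can be a misleading average, so an event-study version with leads and lags of treatment is often estimated alongside it.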

State any additional assumptions you need and what outputs (tables/plots) you would show to stakeholders.
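
To make the DiD idea in the task concrete, the simplest two-group, two-period version can be sketched as below. Everything here is illustrative: the rows are made-up numbers, and the `group` and `period` labels are hypothetical fields that would have to be derived from rollout dates by platform/region, since they are not columns in the table described above. Both dates fall on the same weekday, which is one simple way to keep day-of-week seasonality out of the comparison.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical user-day rows: (user_id, date, minutes_spent, group, period).
# "group" and "period" are assumed labels, not columns from the original table.
rows = [
    ("u1", "2024-01-01", 30.0, "treated", "pre"),
    ("u1", "2024-01-08", 36.0, "treated", "post"),
    ("u2", "2024-01-01", 50.0, "treated", "pre"),
    ("u2", "2024-01-08", 58.0, "treated", "post"),
    ("u3", "2024-01-01", 20.0, "control", "pre"),
    ("u3", "2024-01-08", 22.0, "control", "post"),
    ("u4", "2024-01-01", 40.0, "control", "pre"),
    ("u4", "2024-01-08", 44.0, "control", "post"),
]


def did_estimate(rows):
    """Return the 2x2 difference-in-differences estimate of mean minutes spent."""
    # Collect minutes into the four (group, period) cells.
    cells = defaultdict(list)
    for _user, _date, minutes, group, period in rows:
        cells[(group, period)].append(minutes)
    cell_mean = {key: mean(values) for key, values in cells.items()}

    # Change over time within each group.
    treated_change = cell_mean[("treated", "post")] - cell_mean[("treated", "pre")]
    control_change = cell_mean[("control", "post")] - cell_mean[("control", "pre")]

    # The control group's change stands in for what the treated group
    # would have done without the feature (the parallel-trends assumption).
    return treated_change - control_change


estimate = did_estimate(rows)
print(f"DiD estimate (minutes per user-day): {estimate}")
```

This point estimate is only as credible as the parallel-trends assumption behind it. A real analysis would also need uncertainty estimates, a pre-period trend check, and a design that accounts for the staggered rollout.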
