PracHub

Implement four DS coding tasks

Last updated: Apr 9, 2026

Quick Overview

This multi-part question evaluates a data scientist's competencies in statistical inference (sample size and z-test calculations), causal inference and parallel-trends validation (difference-in-differences), Bayesian probability updating, and interpretable supervised learning feature importance, all framed as coding tasks.


Company: Roblox

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Take-home Project


Oct 18, 2025, 12:00 AM

You are completing a CodeSignal-style assessment (Python or R). Implement solutions for the following four independent questions.

1) Two-sample z-test: required sample size

You are given:

  • `x`: numeric array of historical observations for the metric (use it to estimate the metric standard deviation `sigma`)
  • `alpha`: significance level (e.g., 0.05)
  • `power`: desired power (e.g., 0.8)
  • `effect_size`: the minimum detectable absolute difference in means, \(\Delta\)

Assumptions:

  • Two-sided two-sample z-test for a difference in means.
  • Treatment and control have equal sample size \(n\).
  • Use \(\hat\sigma = \text{std}(x)\) as the population standard deviation estimate.

Task:

  • Return the minimum integer per-group sample size `n` required to detect `effect_size` at level `alpha` with `power`.
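
A minimal sketch of one way to approach this task, using only the Python standard library. It applies the usual normal-approximation formula \(n = 2\hat\sigma^2 (z_{1-\alpha/2} + z_{\text{power}})^2 / \Delta^2\) and rounds up. The function name `required_sample_size` is hypothetical, and `std(x)` is assumed here to mean the sample standard deviation (n - 1 denominator); a population denominator would give a slightly different result.

```python
import math
from statistics import NormalDist, stdev


def required_sample_size(x, alpha, power, effect_size):
    """Per-group n for a two-sided, two-sample z-test with equal group sizes.

    Assumption: sigma is estimated with the sample standard deviation
    (n - 1 denominator); the prompt only says std(x).
    """
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")

    sigma = stdev(x)
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for desired power

    # Variance of the difference in means is 2 * sigma^2 / n; solve for n.
    n = 2 * (sigma * (z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    history = [1, 2, 3, 4, 5]
    print(required_sample_size(history, alpha=0.05, power=0.8, effect_size=1.0))
```

The formula ignores the negligible probability of rejecting in the wrong tail, which is the conventional approximation for two-sided tests.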

2) Difference-in-Differences (DiD) + parallel-trend validation

You are given three equal-length arrays:

  • `period[i]`: time indicator (contains at least a “pre” and a “post” period; may contain multiple pre periods)
  • `group[i]`: 0 = control, 1 = treatment
  • `outcome[i]`: numeric outcome

And a numeric `threshold` for trend validation.

Definitions:

  • Let \(\bar{Y}_{g,t}\) be the mean outcome for group \(g\in\{0,1\}\) in period \(t\).
  • The DiD estimate is:

\[ \text{DiD} = (\bar{Y}_{1,post}-\bar{Y}_{1,pre}) - (\bar{Y}_{0,post}-\bar{Y}_{0,pre}). \]

Parallel-trend / trend validation requirement:

  • If there are multiple pre periods, compute the group difference \(d_t = \bar{Y}_{1,t} - \bar{Y}_{0,t}\) for each pre period \(t\), sort pre periods by time, and validate:

\[ \max_t |d_{t} - d_{t-1}| \le \text{threshold}. \]

  • If there is only a single pre period, treat trend validation as passing.

Task:

  • Return (a) the DiD estimate and (b) whether the pre-trend validation passes under the `threshold`.
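
The definitions above could be sketched as follows. Several details are assumptions, because the prompt does not specify how periods are encoded: period values are taken to be sortable in chronological order, the latest period is treated as the only post period unless the hypothetical `post_start` argument says otherwise, and the pre-period mean pools every pre-period observation in a group. The function name `did_with_pretrend` is also hypothetical.

```python
from collections import defaultdict


def _mean(values):
    return sum(values) / len(values)


def did_with_pretrend(period, group, outcome, threshold, post_start=None):
    """Return (did_estimate, pretrend_passes).

    Assumptions (not fixed by the prompt): periods sort chronologically,
    periods >= post_start are "post", and the latest period is the only
    post period when post_start is omitted.
    """
    periods = sorted(set(period))
    if post_start is None:
        post_start = periods[-1]

    pooled = defaultdict(list)     # (group, is_post) -> outcomes
    by_period = defaultdict(list)  # (group, period)  -> outcomes
    for t, g, y in zip(period, group, outcome):
        pooled[(g, t >= post_start)].append(y)
        by_period[(g, t)].append(y)

    treated_change = _mean(pooled[(1, True)]) - _mean(pooled[(1, False)])
    control_change = _mean(pooled[(0, True)]) - _mean(pooled[(0, False)])
    did = treated_change - control_change

    # d_t = treated mean minus control mean for each pre period, in time order.
    pre_periods = [t for t in periods if t < post_start]
    gaps = [_mean(by_period[(1, t)]) - _mean(by_period[(0, t)]) for t in pre_periods]
    jumps = [abs(later - earlier) for earlier, later in zip(gaps, gaps[1:])]

    # With a single pre period there are no jumps, so validation passes.
    passes = max(jumps, default=0.0) <= threshold
    return did, passes


if __name__ == "__main__":
    period = [0, 0, 1, 1, 2, 2]
    group = [0, 1, 0, 1, 0, 1]
    outcome = [1, 2, 2, 3, 3, 6]
    print(did_with_pretrend(period, group, outcome, threshold=0.5))
```

If the assessment instead labels periods with strings such as "pre" and "post", the pre/post split would need to come from those labels rather than from sort order.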

3) Bayes’ rule posterior probability

You are given probabilities (as floats) describing an event \(A\) and evidence \(B\), such as:

  • `p_A` = \(P(A)\)
  • `p_B_given_A` = \(P(B\mid A)\)
  • `p_B_given_not_A` = \(P(B\mid \neg A)\)

Task:

  • Compute and return the posterior probability \(P(A\mid B)\).
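
A short sketch of the computation, expanding the evidence term with the law of total probability: \(P(A\mid B) = P(B\mid A)P(A) / [P(B\mid A)P(A) + P(B\mid \neg A)(1 - P(A))]\). The function name `posterior` is hypothetical, and raising an error when \(P(B) = 0\) is an assumption about how to treat that edge case.

```python
def posterior(p_A, p_B_given_A, p_B_given_not_A):
    """Return P(A | B) via Bayes' rule."""
    numerator = p_B_given_A * p_A
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
    evidence = numerator + p_B_given_not_A * (1 - p_A)
    if evidence == 0:
        # Assumption: conditioning on a zero-probability event is an error.
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return numerator / evidence


if __name__ == "__main__":
    # A rare event (1% prior) with a fairly accurate signal.
    print(posterior(p_A=0.01, p_B_given_A=0.9, p_B_given_not_A=0.05))
```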

4) Logistic regression: top-3 features

You are given:

  • `X`: a 2D array where each row corresponds to one feature and each column corresponds to one observation (shape: `num_features × num_samples`)
  • `y`: binary outcome array of length `num_samples` (values in {0,1})
  • `feature_names`: array of length `num_features`

Task:

  • Fit a logistic regression model to predict y from X (include an intercept).
  • Rank features by absolute value of their fitted coefficient (exclude the intercept).
  • Return the names of the top 3 features in descending order of importance.

Notes:

  • Handle ties deterministically (e.g., break ties by feature name ascending).
  • Assume inputs are well-formed and numeric.
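
A dependency-free sketch of the task follows. In practice a library solver (for example scikit-learn's `LogisticRegression`) would likely be used; here the fit is plain batch gradient descent so the example stays self-contained. The function name `top3_features`, the learning rate, the iteration count, and the small L2 penalty (added so the weights stay bounded on separable data) are all assumptions, not part of the prompt. Note that `X` is transposed first, because the prompt stores features in rows.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def top3_features(X, y, feature_names, lr=0.1, n_iter=2000, l2=1e-3):
    """Names of the 3 features with the largest |coefficient|.

    Assumptions: gradient descent with a small L2 penalty on the weights
    (not on the intercept); hyperparameters are illustrative defaults.
    """
    n_features, n_samples = len(X), len(y)
    # X is features x samples; build one row per observation.
    rows = [[X[j][i] for j in range(n_features)] for i in range(n_samples)]

    weights = [0.0] * n_features
    intercept = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(rows, y):
            z = intercept + sum(w * v for w, v in zip(weights, row))
            error = _sigmoid(z) - target
            grad_b += error
            for j, v in enumerate(row):
                grad_w[j] += error * v
        intercept -= lr * grad_b / n_samples
        weights = [w - lr * (g / n_samples + l2 * w)
                   for w, g in zip(weights, grad_w)]

    # Sort by |coefficient| descending, then by name ascending for ties.
    ranked = sorted(zip(feature_names, weights),
                    key=lambda pair: (-abs(pair[1]), pair[0]))
    return [name for name, _ in ranked[:3]]


if __name__ == "__main__":
    X = [
        [-1, -1, -1, -1, 1, 1, 1, 1],   # tracks y exactly
        [1, -1, 1, -1, 1, -1, 1, -1],   # unrelated to y
        [1, 1, -1, -1, 1, 1, -1, -1],   # unrelated to y
        [1, -1, -1, 1, 1, -1, -1, 1],   # unrelated to y
    ]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(top3_features(X, y, ["strong", "c", "a", "b"]))
```

Because raw coefficient magnitudes depend on each feature's scale, this ranking is only meaningful as "importance" when features are on comparable scales; the prompt asks for raw coefficients, so no standardization is applied here.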
