This multi-part question evaluates a data scientist's competencies in statistical inference (sample size and z-test calculations), causal inference and parallel-trends validation (difference-in-differences), Bayesian probability updating, and interpretable supervised learning feature importance, all framed as coding tasks.
You are completing a CodeSignal-style assessment (Python or R). Implement solutions for the following four independent questions.
You are given:

- `x`: numeric array of historical observations for the metric (use it to estimate the metric standard deviation `sigma`)
- `alpha`: significance level (e.g., 0.05)
- `power`: desired power (e.g., 0.8)
- `effect_size`: the minimum detectable absolute difference in means

Assumptions:

Task: compute the sample size `n` required to detect `effect_size` at level `alpha` with `power`.
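A minimal sketch of one approach, assuming a two-sided, two-sample z-test with equal variances and `sigma` estimated from `x`; the function name and the per-group reading of `n` are my own choices, not given by the prompt:

```python
import math
import numpy as np
from scipy.stats import norm

def required_sample_size(x, alpha, power, effect_size):
    """Per-group n for a two-sided, two-sample z-test (equal variances assumed)."""
    sigma = np.std(x, ddof=1)          # estimate sigma from historical data
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = norm.ppf(power)           # quantile matching the desired power
    n = 2 * (sigma * (z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)                # round up to a whole number of samples
```

Rounding up (rather than to the nearest integer) guarantees the achieved power is at least the requested value.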
You are given three equal-length arrays:

- `period[i]`: time indicator (contains at least a "pre" and a "post" period; may contain multiple pre periods)
- `group[i]`: 0 = control, 1 = treatment
- `outcome[i]`: numeric outcome

and a numeric `threshold` for trend validation.

Definitions:

Parallel-trend / trend validation requirement:

Task: compute the difference-in-differences estimate, validating the parallel-trends requirement against `threshold`.
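A sketch of one way to structure this, under my own assumptions that every period label other than `"post"` is a pre period, that pre-period labels sort chronologically, and that the trend check compares treatment and control changes between consecutive pre periods:

```python
import numpy as np

def did_estimate(period, group, outcome, threshold):
    """Difference-in-differences with a simple pre-trend validation step."""
    period = np.asarray(period)
    group = np.asarray(group)
    outcome = np.asarray(outcome, dtype=float)

    def cell_mean(p, g):
        return outcome[(period == p) & (group == g)].mean()

    # Everything that is not "post" is treated as a pre period (assumption).
    pre_periods = [p for p in sorted(set(period)) if p != "post"]

    # Pre-trend check: treatment and control changes between consecutive
    # pre periods must not differ by more than `threshold`.
    for a, b in zip(pre_periods, pre_periods[1:]):
        gap = abs((cell_mean(b, 1) - cell_mean(a, 1))
                  - (cell_mean(b, 0) - cell_mean(a, 0)))
        if gap > threshold:
            raise ValueError("parallel-trends check failed")

    last_pre = pre_periods[-1]
    return ((cell_mean("post", 1) - cell_mean(last_pre, 1))
            - (cell_mean("post", 0) - cell_mean(last_pre, 0)))
```

Raising an exception on a failed check is one design choice; returning a sentinel value or a `(estimate, valid)` pair would also fit the prompt.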
You are given probabilities (as floats) describing an event A and evidence B, such as:

- `p_A` = P(A)
- `p_B_given_A` = P(B | A)
- `p_B_given_not_A` = P(B | not A)

Task: compute the posterior probability P(A | B).
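This is a direct application of Bayes' theorem, using the law of total probability to obtain P(B) in the denominator; the function name is mine:

```python
def posterior(p_A, p_B_given_A, p_B_given_not_A):
    """P(A | B) via Bayes' theorem."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)
    return p_B_given_A * p_A / p_B
```

With a rare event (`p_A = 0.01`), a sensitive test (`p_B_given_A = 0.99`), and a 5% false-positive rate, the posterior is only 1/6 — the classic base-rate result.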
You are given:

- `X`: a 2D array where each row corresponds to one feature and each column corresponds to one observation (shape: `num_features × num_samples`)
- `y`: binary outcome array of length `num_samples` (values in {0,1})
- `feature_names`: array of length `num_features`

Task: fit a model to predict `y` from `X` (include an intercept) and report feature importances using `feature_names`.
Notes:
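A sketch of one interpretable approach, assuming scikit-learn is available: logistic regression with standardized features, ranking by absolute coefficient. The transpose is needed because the prompt's `X` is `num_features × num_samples` while scikit-learn expects samples in rows; standardization (which assumes no constant features) makes coefficient magnitudes comparable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_importance(X, y, feature_names):
    """Fit logistic regression (with intercept) and rank features by |coef|."""
    Xt = np.asarray(X, dtype=float).T             # samples x features
    Xt = (Xt - Xt.mean(axis=0)) / Xt.std(axis=0)  # z-score each feature
    model = LogisticRegression(fit_intercept=True).fit(Xt, y)
    coefs = model.coef_.ravel()
    order = np.argsort(-np.abs(coefs))            # most important first
    return [(feature_names[i], coefs[i]) for i in order]
```

The signed coefficients preserve direction of effect, which tree-based importances would lose.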