You are completing a CodeSignal-style assessment (Python or R). Implement solutions for the following four independent questions.
1) Two-sample z-test: required sample size
You are given:
- `x`: numeric array of historical observations for the metric (use it to estimate the metric standard deviation $\sigma$)
- `alpha`: significance level (e.g., 0.05)
- `power`: desired power (e.g., 0.8)
- `effect_size`: the minimum detectable absolute difference in means, $\Delta$
Assumptions:
- Two-sided two-sample z-test for a difference in means.
- Treatment and control have equal sample size $n$.
- Use $\hat{\sigma} = \mathrm{std}(x)$ as the population standard deviation estimate.
Task:
- Return the minimum integer per-group sample size $n$ required to detect `effect_size` at level `alpha` with `power`.
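One way to approach this is the standard normal-approximation formula $n = 2\hat{\sigma}^2 (z_{1-\alpha/2} + z_{\text{power}})^2 / \Delta^2$, rounded up. The sketch below is a minimal illustration, not a reference solution. The function name `required_sample_size` is hypothetical, and it assumes `std(x)` means the sample standard deviation (ddof = 1), which the prompt does not specify.

```python
import math
from statistics import NormalDist, stdev


def required_sample_size(x, alpha, power, effect_size):
    """Per-group n for a two-sided, equal-allocation two-sample z-test."""
    # Assumption: sample standard deviation (ddof=1); the prompt only says std(x).
    sigma = stdev(x)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    # Var(difference in means) = 2 * sigma^2 / n, solved for n.
    n = 2 * (sigma * (z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)
```

If the grader expects the population standard deviation instead, `statistics.pstdev` would be the drop-in replacement, and the result can differ by a few units for short `x`.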
2) Difference-in-Differences (DiD) + parallel-trend validation
You are given three equal-length arrays:
- `period[i]`: time indicator (contains at least a “pre” and a “post” period; may contain multiple pre periods)
- `group[i]`: 0 = control, 1 = treatment
- `outcome[i]`: numeric outcome
You are also given a numeric `threshold` for trend validation.
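Definitions:
- Let $\bar{Y}_{g,t}$ be the mean outcome for group $g \in \{0, 1\}$ in period $t$.
- The DiD estimate is:
$$\mathrm{DiD} = (\bar{Y}_{1,\mathrm{post}} - \bar{Y}_{1,\mathrm{pre}}) - (\bar{Y}_{0,\mathrm{post}} - \bar{Y}_{0,\mathrm{pre}}).$$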
Parallel-trend / trend validation requirement:
- If there are multiple pre periods, compute the group difference $d_t = \bar{Y}_{1,t} - \bar{Y}_{0,t}$ for each pre period $t$, sort pre periods by time, and validate:
$$\max_t |d_t - d_{t-1}| \le \text{threshold}.$$
- If there is only a single pre period, treat trend validation as passing.
Task:
- Return (a) the DiD estimate and (b) whether the pre-trend validation passes under the `threshold`.
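The definitions above can be sketched as follows. The prompt leaves the period encoding open, so this sketch makes two labeled assumptions: period values are sortable and the largest value is the single post period, and the "pre" mean in the DiD formula pools all pre-period observations of a group. The function name `did_with_pretrend` is hypothetical.

```python
from collections import defaultdict


def did_with_pretrend(period, group, outcome, threshold):
    """Return (DiD estimate, whether the pre-trend check passes)."""
    # Assumption: the largest period value is the post period; all others are pre.
    post = max(period)

    by_cell = defaultdict(list)   # (group, period) -> outcomes
    by_phase = defaultdict(list)  # (group, is_post) -> outcomes
    for t, g, y in zip(period, group, outcome):
        by_cell[(g, t)].append(y)
        by_phase[(g, t == post)].append(y)

    def mean(values):
        return sum(values) / len(values)

    # Assumption: the pre mean pools every pre-period observation of the group.
    did = (mean(by_phase[(1, True)]) - mean(by_phase[(1, False)])) - (
        mean(by_phase[(0, True)]) - mean(by_phase[(0, False)])
    )

    # Treatment-minus-control gap d_t for each pre period, in time order.
    pre_periods = sorted({t for t in period if t != post})
    gaps = [mean(by_cell[(1, t)]) - mean(by_cell[(0, t)]) for t in pre_periods]

    if len(gaps) < 2:
        return did, True  # a single pre period passes by definition
    max_change = max(abs(b - a) for a, b in zip(gaps, gaps[1:]))
    return did, max_change <= threshold
```

If the actual input labels periods as strings such as "pre" and "post", or expects the last pre period rather than the pooled pre mean, the first few lines would need to change accordingly.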
3) Bayes’ rule posterior probability
You are given probabilities (as floats) describing an event A and evidence B, such as:
- `p_A` = $P(A)$
- `p_B_given_A` = $P(B \mid A)$
- `p_B_given_not_A` = $P(B \mid \neg A)$
Task:
- Compute and return the posterior probability $P(A \mid B)$.
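With these three inputs, Bayes' rule expands the evidence term by the law of total probability: $P(A \mid B) = P(B \mid A) P(A) / [P(B \mid A) P(A) + P(B \mid \neg A)(1 - P(A))]$. A minimal sketch follows; the function name `posterior` is hypothetical, and raising an error when $P(B) = 0$ is a choice made for this example, not something the prompt specifies.

```python
def posterior(p_A, p_B_given_A, p_B_given_not_A):
    """Return P(A | B) via Bayes' rule."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A).
    p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)
    if p_B == 0:
        # Assumption: an impossible B leaves the posterior undefined.
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return p_B_given_A * p_A / p_B
```

For example, a 1% prior with a 90% true-positive rate and a 5% false-positive rate gives a posterior of roughly 0.154.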
4) Logistic regression: top-3 features
You are given:
- `X`: a 2D array where each row corresponds to one feature and each column corresponds to one observation (shape: `num_features × num_samples`)
- `y`: binary outcome array of length `num_samples` (values in {0,1})
- `feature_names`: array of length `num_features`
Task:
- Fit a logistic regression model to predict `y` from `X` (include an intercept).
- Rank features by absolute value of their fitted coefficient (exclude the intercept).
- Return the names of the top 3 features in descending order of importance.
Notes:
- Handle ties deterministically (e.g., break ties by feature name ascending).
- Assume inputs are well-formed and numeric.
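The following is a dependency-free sketch of the whole pipeline, assuming an unregularized model fitted by plain batch gradient descent. The prompt does not name a solver, so the function name `top3_features` and the `lr` and `n_iter` parameters are hypothetical choices for illustration. Note that `X` is indexed as `X[feature][sample]`, matching the stated `num_features × num_samples` layout.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def top3_features(X, y, feature_names, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent; return top-3 feature names."""
    n_features, n_samples = len(X), len(y)
    w = [0.0] * n_features  # feature coefficients
    b = 0.0                 # intercept, fitted but excluded from the ranking

    for _ in range(n_iter):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for j in range(n_samples):
            z = b + sum(w[i] * X[i][j] for i in range(n_features))
            err = _sigmoid(z) - y[j]  # gradient of the log-loss w.r.t. z
            grad_b += err
            for i in range(n_features):
                grad_w[i] += err * X[i][j]
        b -= lr * grad_b / n_samples
        w = [wi - lr * gi / n_samples for wi, gi in zip(w, grad_w)]

    # Sort by |coefficient| descending, then by name ascending to break ties.
    ranked = sorted(zip(feature_names, w), key=lambda nw: (-abs(nw[1]), nw[0]))
    return [name for name, _ in ranked[:3]]
```

In practice a library fit is the more likely choice. With scikit-learn, for instance, `X` would need to be transposed to `num_samples × num_features`, and `LogisticRegression` applies L2 regularization by default, which can change the coefficient ranking. Coefficient magnitudes also depend on feature scale, so the ranking is only comparable across features measured on similar scales.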