This question evaluates implementation skills in supervised machine learning and numerical optimization: computing Gini impurity and selecting an optimal binary split for a decision tree, and deriving and applying a single gradient descent update for linear regression.
You are given partially implemented code and must complete key functions. Implement the missing parts with correct logic and reasonable efficiency.
You are building a binary decision tree classifier for tabular numeric features.
Implement:
- `gini(y)`, which computes the Gini impurity of the labels `y` (binary labels 0/1).
- `best_split(X, y)`, which finds the best `(feature_index, threshold)` to split on.
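For reference, with `p1` the fraction of 1-labels and `p0 = 1 - p1`, the binary Gini impurity is

$$G(y) = 1 - p_0^2 - p_1^2 = 2\,p_1(1 - p_1).$$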
Definitions/requirements:
- `X` is an `n x d` matrix of floats.
- `y` has length `n`, with values in `{0,1}`.
- A split `(j, t)` sends rows with `X[i][j] <= t` to one child and rows with `X[i][j] > t` to the other.
- Return `None` if no valid split exists.
Constraints:
- `1 <= n <= 2e4`
- `1 <= d <= 50`
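A minimal sketch in Python, assuming plain-list inputs (the sorted sweep over candidate thresholds is one reasonable way to meet the constraints, not necessarily the intended reference solution):

```python
def gini(y):
    """Gini impurity of binary labels: 1 - p0^2 - p1^2."""
    n = len(y)
    if n == 0:
        return 0.0
    p1 = sum(y) / n
    return 1.0 - p1 * p1 - (1.0 - p1) * (1.0 - p1)


def _gini_from_counts(count, ones):
    # Impurity of a child node given its size and its number of 1-labels.
    p1 = ones / count
    return 1.0 - p1 * p1 - (1.0 - p1) * (1.0 - p1)


def best_split(X, y):
    """Return the (feature_index, threshold) minimizing the weighted child
    Gini impurity, or None when no split separates the rows."""
    n, d = len(y), len(X[0])
    total_ones = sum(y)
    best, best_score = None, float("inf")
    for j in range(d):
        # Sort row indices by feature j so all thresholds can be swept once.
        order = sorted(range(n), key=lambda i: X[i][j])
        left_n = left_ones = 0
        for k in range(n - 1):
            left_n += 1
            left_ones += y[order[k]]
            # A threshold is only valid between two distinct feature values.
            if X[order[k]][j] == X[order[k + 1]][j]:
                continue
            right_n, right_ones = n - left_n, total_ones - left_ones
            score = (left_n * _gini_from_counts(left_n, left_ones)
                     + right_n * _gini_from_counts(right_n, right_ones)) / n
            if score < best_score:
                # Place the threshold halfway between the two neighboring values.
                threshold = (X[order[k]][j] + X[order[k + 1]][j]) / 2.0
                best, best_score = (j, threshold), score
    return best
```

Sorting each feature once makes the sweep O(d * n log n) overall, which is comfortable for `n <= 2e4` and `d <= 50`.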
You are fitting linear regression with mean squared error:

$$L(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( w^\top x_i + b - y_i \right)^2$$
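Differentiating this loss gives the averaged gradients (the factor of 2 comes from the squared residual; formulations that define the loss with a `1/(2n)` factor drop it):

$$\frac{\partial L}{\partial w} = \frac{2}{n} \sum_{i=1}^{n} \big(w^\top x_i + b - y_i\big)\, x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} \big(w^\top x_i + b - y_i\big)$$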
Implement:
- `gd_step(X, y, w, b, lr)`, which performs exactly one gradient descent update and returns `(new_w, new_b)`.
Requirements:
- `X` is `n x d`; `y` has length `n` (real-valued).
- `w` has length `d`; `b` is a scalar.
- Gradients are averaged over the `n` samples (divide by `n`).
- The function must also work in the degenerate case `n=1` and `d=1`.
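A minimal sketch of the update, assuming the averaged-gradient convention above (plain Python; `lr` is the learning rate):

```python
def gd_step(X, y, w, b, lr):
    """One gradient descent step for L = (1/n) * sum_i (w . x_i + b - y_i)^2.

    Averaged gradients:
        dL/dw_j = (2/n) * sum_i (pred_i - y_i) * X[i][j]
        dL/db   = (2/n) * sum_i (pred_i - y_i)
    If the loss is defined with a 1/(2n) factor instead, drop the 2s.
    """
    n, d = len(y), len(w)
    # Residual (prediction minus target) for every row.
    resid = [sum(w[j] * X[i][j] for j in range(d)) + b - y[i] for i in range(n)]
    grad_w = [(2.0 / n) * sum(resid[i] * X[i][j] for i in range(n)) for j in range(d)]
    grad_b = (2.0 / n) * sum(resid)
    new_w = [w[j] - lr * grad_w[j] for j in range(d)]
    new_b = b - lr * grad_b
    return new_w, new_b


# Quick sanity check with the n=1, d=1 edge case: one step from w=0, b=0.
print(gd_step([[1.0]], [2.0], [0.0], 0.0, 0.1))  # -> ([0.4], 0.4)
```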