You are given partially implemented code and must complete key functions. Implement the missing parts with correct logic and reasonable efficiency.
You are building a binary decision tree classifier for tabular numeric features.
Implement:
- gini(y): computes the Gini impurity of the labels y (binary labels 0/1).
- best_split(X, y): finds the best (feature_index, threshold) pair to split on.
Definitions/requirements:
- X is an n x d matrix of floats.
- y has length n, with values in {0,1}.
- A split (j, t) means: samples with X[i][j] <= t go to the left child and samples with X[i][j] > t go to the right child.
- Return None if no valid split exists.
Constraints:
- 1 <= n <= 2e4
- 1 <= d <= 50
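
Only the function names and the split semantics above are fixed, so the following is a minimal sketch of one way gini and best_split could be written, assuming X and y are plain Python lists (no NumPy) and using midpoint thresholds between consecutive distinct feature values; sorting each feature keeps the work at O(d * n log n), which is reasonable for n <= 2e4 and d <= 50.

```python
def gini(y):
    """Gini impurity of binary labels (0/1); 0.0 for an empty list."""
    n = len(y)
    if n == 0:
        return 0.0
    p1 = sum(y) / n  # fraction of positive labels
    return 1.0 - p1 * p1 - (1.0 - p1) * (1.0 - p1)


def best_split(X, y):
    """Return the (feature_index, threshold) minimizing weighted child Gini,
    or None if no split separates the samples into two non-empty groups."""
    n = len(X)
    d = len(X[0]) if n else 0
    best = None
    best_impurity = float("inf")

    for j in range(d):
        # Sort samples by feature j so candidate thresholds can be scanned once.
        order = sorted(range(n), key=lambda i: X[i][j])
        values = [X[i][j] for i in order]
        labels = [y[i] for i in order]

        # Running counts of positives and sizes on each side of the split point.
        left_pos, left_n = 0, 0
        right_pos, right_n = sum(labels), n

        for k in range(n - 1):
            left_pos += labels[k]
            left_n += 1
            right_pos -= labels[k]
            right_n -= 1
            # Only split between distinct feature values.
            if values[k] == values[k + 1]:
                continue
            # Weighted Gini of the two children, computed from the counts.
            pl = left_pos / left_n
            pr = right_pos / right_n
            g_left = 1.0 - pl * pl - (1.0 - pl) * (1.0 - pl)
            g_right = 1.0 - pr * pr - (1.0 - pr) * (1.0 - pr)
            impurity = (left_n * g_left + right_n * g_right) / n
            if impurity < best_impurity:
                best_impurity = impurity
                # Midpoint threshold; samples with X[i][j] <= t go left.
                best = (j, (values[k] + values[k + 1]) / 2.0)

    return best
```

The midpoint threshold is just one convention; any t with values[k] <= t < values[k+1] produces the same left/right partition under the X[i][j] <= t rule. When every feature is constant across the samples (or n = 1), the inner loop never finds a valid cut and the function returns None.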
You are fitting linear regression with mean squared error loss.
Implement:
- gd_step(X, y, w, b, lr): performs one gradient descent update and returns (new_w, new_b).
Requirements:
- X is n x d; y has length n (real-valued).
- w has length d; b is a scalar.
- Gradients are averaged over the n samples (i.e., divided by n).
- The update must also handle the smallest case, n=1 and d=1.
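
A minimal sketch of gd_step under the same plain-list assumption, taking the loss to be MSE averaged over the n samples and keeping the conventional factor of 2 from differentiating the squared error (some formulations fold it into lr):

```python
def gd_step(X, y, w, b, lr):
    """One gradient descent update on w (length d) and scalar b for MSE loss
    averaged over n samples; returns (new_w, new_b) without mutating inputs."""
    n = len(X)
    d = len(w)

    # Residuals r_i = (w . x_i + b) - y_i for each sample.
    residuals = []
    for i in range(n):
        pred = b + sum(w[k] * X[i][k] for k in range(d))
        residuals.append(pred - y[i])

    # Gradients of L = (1/n) * sum_i r_i^2:
    #   dL/dw_k = (2/n) * sum_i r_i * X[i][k]
    #   dL/db   = (2/n) * sum_i r_i
    grad_w = [(2.0 / n) * sum(residuals[i] * X[i][k] for i in range(n))
              for k in range(d)]
    grad_b = (2.0 / n) * sum(residuals)

    new_w = [w[k] - lr * grad_w[k] for k in range(d)]
    new_b = b - lr * grad_b
    return new_w, new_b
```

Returning fresh (new_w, new_b) rather than mutating the inputs makes repeated calls in a training loop straightforward, and the loops degenerate correctly to a single term when n=1 and d=1.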