You are asked to implement a few core ML building blocks from scratch (no ML libraries such as scikit-learn). You may use basic numeric operations and standard data structures.
Part A — AUC-ROC
Given:
- y_true: a length-n list/array of binary labels in {0,1}
- y_score: a length-n list/array of real-valued prediction scores (higher means more likely positive)
Task:
- Compute the ROC curve points (TPR vs. FPR) as the decision threshold varies.
- Compute the AUC (area under the ROC curve).
Clarifications:
- Handle ties in y_score correctly.
- Define what you return when all labels are the same (all 0s or all 1s); a reference sketch follows this list.
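For reference, a minimal sketch of one possible approach (the function name roc_auc and the choice to return None for the degenerate all-one-class case are illustrative, not required): a single descending-score sweep that emits one ROC point per group of tied scores, then integrates with the trapezoid rule.

```python
def roc_auc(y_true, y_score):
    """Sketch: ROC points and AUC via one sorted sweep over the scores.

    Ties are handled by advancing the threshold across a whole group of
    equal scores before emitting a point. Returns (fpr, tpr, auc); if all
    labels are identical, AUC is undefined and this sketch returns None
    for it (one of several reasonable conventions).
    """
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    P = sum(y_true)            # number of positives
    N = len(y_true) - P        # number of negatives
    if P == 0 or N == 0:
        return [0.0, 1.0], [0.0, 1.0], None   # degenerate: AUC undefined

    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # consume an entire group of tied scores before emitting a point
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        fpr.append(fp / N)
        tpr.append(tp / P)
        i = j

    # trapezoidal area under the piecewise-linear ROC curve
    auc = sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
              for k in range(1, len(fpr)))
    return fpr, tpr, auc
```

Emitting one point per tie group (rather than per sample) is what makes the trapezoidal area correct under ties: tied positives and negatives produce a diagonal segment rather than a staircase whose shape depends on sort order.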
Part B — Softmax
Given a vector of logits $z = [z_1, z_2, \dots, z_k]$, implement softmax:

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}$$
Task:
- Return a probability vector of length k.
- Make your implementation numerically stable.
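A minimal sketch of the usual stabilization, using only the standard library: subtracting max(z) cancels between numerator and denominator, so the output is mathematically unchanged, but every exponent stays at most 0 and exp cannot overflow.

```python
import math

def softmax(z):
    """Numerically stable softmax: shift by max(z) before exponentiating."""
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]   # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]
```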
Part C — Logistic Regression
Given:
- Feature matrix X of shape (n, d)
- Binary labels y of shape (n,) in {0,1}
Task:
- Implement logistic regression training using gradient descent (batch or mini-batch).
- Specify the loss you optimize (cross-entropy / negative log-likelihood).
- Optionally include L2 regularization and explain how it changes the gradient.
- Return the learned parameters and a predict_proba / predict function (a reference sketch follows this list).
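For reference, a minimal sketch assuming NumPy counts as "basic numeric operations" here (the names train_logreg, lr, and l2 are illustrative): full-batch gradient descent on the mean negative log-likelihood, where the optional L2 penalty simply adds l2 * w to the weight gradient and leaves the bias unpenalized.

```python
import numpy as np

def sigmoid(t):
    t = np.clip(t, -500, 500)          # avoid overflow in exp for large |t|
    return 1.0 / (1.0 + np.exp(-t))

def train_logreg(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Full-batch gradient descent on the mean negative log-likelihood.

    Loss:     -(1/n) * sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
              + (l2 / 2) * ||w||^2      (bias excluded from the penalty)
    Gradient: (1/n) * X^T (p - y) + l2 * w
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                     # probabilities, shape (n,)
        grad_w = X.T @ (p - y) / n + l2 * w        # L2 adds l2 * w here
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b

    def predict_proba(Xq):
        return sigmoid(np.asarray(Xq, dtype=float) @ w + b)

    def predict(Xq, threshold=0.5):
        return (predict_proba(Xq) >= threshold).astype(int)

    return w, b, predict_proba, predict
```

Each epoch is dominated by the two matrix-vector products, so it costs O(n * d) time, which is one concrete answer to the complexity question below.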
Constraints/expectations:
- Discuss the time complexity per training epoch.
- Mention common pitfalls (learning rate choice, feature scaling, overflow, class imbalance).