Tiny Neural Network (From First Principles): Binary Classification
Context
You will implement and analyze a minimal neural network for binary classification with one hidden layer. Assume a dataset with features X ∈ R^{N×D} and labels y ∈ {0,1}^N. The network has:
- Hidden layer: H units with an activation (ReLU or tanh).
- Output layer: 1 unit with a sigmoid for P(y=1|x).
Use vectorized NumPy (or similar) without autograd.
Tasks
- Forward pass (see the sketch just below)
  - Define shapes: W1 ∈ R^{D×H}, b1 ∈ R^{H}, W2 ∈ R^{H×1}, b2 ∈ R^{1}.
  - Compute z1 = X W1 + b1, a1 = f(z1), z2 = a1 W2 + b2, p = σ(z2).
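A minimal forward-pass sketch in vectorized NumPy, assuming a ReLU hidden layer; the function and variable names are illustrative choices for this write-up, not requirements of the assignment. The sigmoid is split into two branches so exp never receives a large positive argument.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid without overflow: only exponentiate non-positive arguments."""
    out = np.empty_like(z, dtype=float)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

def forward(X, W1, b1, W2, b2):
    """Forward pass: X (N, D) -> probabilities p (N, 1), plus a cache for backprop."""
    z1 = X @ W1 + b1            # (N, H) pre-activations
    a1 = np.maximum(0.0, z1)    # ReLU; substitute np.tanh(z1) for a tanh hidden layer
    z2 = a1 @ W2 + b2           # (N, 1) logits
    p = stable_sigmoid(z2)      # (N, 1) estimates of P(y=1|x)
    return p, (X, z1, a1, z2)
```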
- Loss (numerically stable; see the sketch below)
  - Implement binary cross-entropy, computed from the logits z2 rather than from p. Use a stable formulation (e.g., softplus log(1+exp(x)) via log1p, or log-sum-exp) to avoid overflow/underflow.
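One possible stable formulation, written directly in terms of the logits using the identity BCE(z, y) = softplus(z) − y·z, with softplus itself computed via log1p so large |z| cannot overflow. The names follow the forward-pass sketch above and are assumptions of this write-up.

```python
import numpy as np

def softplus(z):
    """log(1 + exp(z)) computed as max(z, 0) + log1p(exp(-|z|)) to avoid overflow."""
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))

def bce_from_logits(z2, y):
    """Mean binary cross-entropy from logits z2 (N, 1) and labels y (N, 1) in {0, 1}.
    Equivalent to -mean(y*log(p) + (1-y)*log(1-p)) with p = sigmoid(z2)."""
    return float(np.mean(softplus(z2) - y * z2))
```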
- Backward pass (analytic gradients; no autograd; see the sketch below)
  - Derive and implement gradients of the loss with respect to W1, b1, W2, b2.
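A sketch of the analytic gradients under the same assumptions (ReLU hidden layer, mean BCE over the batch). The useful simplification is that sigmoid followed by cross-entropy gives ∂L/∂z2 = (p − y)/N, after which each remaining gradient is one matrix product.

```python
import numpy as np

def backward(p, y, cache, W2):
    """Gradients of the mean BCE loss w.r.t. W1, b1, W2, b2.
    p, y: (N, 1); cache = (X, z1, a1, z2) from forward(); ReLU hidden layer assumed."""
    X, z1, a1, z2 = cache
    N = X.shape[0]

    dz2 = (p - y) / N              # (N, 1): combined sigmoid + BCE gradient
    dW2 = a1.T @ dz2               # (H, 1)
    db2 = dz2.sum(axis=0)          # (1,)

    da1 = dz2 @ W2.T               # (N, H)
    dz1 = da1 * (z1 > 0)           # ReLU derivative; use (1 - a1**2) for tanh
    dW1 = X.T @ dz1                # (D, H)
    db1 = dz1.sum(axis=0)          # (H,)

    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}
```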
- Optimization (see the sketch below)
  - Implement gradient descent updates for all parameters.
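A plain full-batch gradient-descent loop, reusing the helpers sketched above (forward, bce_from_logits, backward, all of which are naming assumptions of this write-up); the learning rate and step count are placeholders.

```python
def train(X, y, params, lr=0.1, steps=1000):
    """Full-batch gradient descent on a params dict with keys W1, b1, W2, b2."""
    for step in range(steps):
        p, cache = forward(X, params["W1"], params["b1"], params["W2"], params["b2"])
        loss = bce_from_logits(cache[3], y)        # cache[3] holds the logits z2
        grads = backward(p, y, cache, params["W2"])
        for name in params:                        # theta <- theta - lr * dL/dtheta
            params[name] -= lr * grads[name]
        if step % 100 == 0:
            print(f"step {step:4d}  loss {loss:.4f}")
    return params
```

Mini-batch SGD only changes which rows of X and y enter each step, which is where the batch-size and learning-rate discussion below comes in.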
- Gradient checking (see the sketch below)
  - Verify gradients by central finite differences, perturbing one parameter entry at a time: g_num ≈ (L(θ+ε) − L(θ−ε)) / (2ε). Report relative errors against the analytic gradients.
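A central-difference checker sketch. It perturbs each parameter entry in place, recomputes the loss through a zero-argument closure, and reports the usual relative error ||g_num − g_ana|| / (||g_num|| + ||g_ana||); with float64 and ε ≈ 1e-5, errors around 1e-7 or smaller are commonly taken as a pass, though that threshold is a rule of thumb rather than part of the assignment.

```python
import numpy as np

def grad_check(loss_fn, params, grads, eps=1e-5):
    """Compare analytic grads (dict) to central finite differences, parameter by parameter.
    loss_fn() must recompute the scalar loss from the current (mutated) params."""
    for name, theta in params.items():
        num = np.zeros_like(theta)
        it = np.nditer(theta, flags=["multi_index"])
        for _ in it:
            idx = it.multi_index
            old = theta[idx]
            theta[idx] = old + eps
            loss_plus = loss_fn()
            theta[idx] = old - eps
            loss_minus = loss_fn()
            theta[idx] = old                       # restore the entry
            num[idx] = (loss_plus - loss_minus) / (2 * eps)
        rel = np.linalg.norm(num - grads[name]) / (
            np.linalg.norm(num) + np.linalg.norm(grads[name]) + 1e-12)
        print(f"{name}: relative error {rel:.3e}")

# Example closure (hypothetical wiring; keep N, D, H small so the loop stays cheap):
# loss_fn = lambda: bce_from_logits(
#     forward(X, params["W1"], params["b1"], params["W2"], params["b2"])[1][3], y)
```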
- Discussion
  - Numerical stability (sigmoid/logistic loss, softplus/log-sum-exp, log1p, expm1, clipping).
  - Initialization (He vs. Xavier; biases; see the sketch after this list).
  - Activation choices (ReLU, tanh, sigmoid; pros/cons).
  - Batch size and gradient variance; learning-rate scaling.
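For the initialization point in the discussion, a sketch of the two schemes named above; the scale factors (variance 2/fan_in for He, 1/fan_in in the simpler Xavier/Glorot form) are the standard ones, and zero biases are a common default rather than a requirement.

```python
import numpy as np

def init_params(D, H, scheme="he", seed=0):
    """He initialization pairs naturally with ReLU, Xavier/Glorot with tanh."""
    rng = np.random.default_rng(seed)
    if scheme == "he":
        std1 = np.sqrt(2.0 / D)        # Var[W1] = 2 / fan_in
    else:                              # "xavier"
        std1 = np.sqrt(1.0 / D)        # Var[W1] = 1 / fan_in (Glorot's simpler form)
    return {
        "W1": rng.normal(0.0, std1, size=(D, H)),
        "b1": np.zeros(H),
        "W2": rng.normal(0.0, np.sqrt(1.0 / H), size=(H, 1)),
        "b2": np.zeros(1),
    }
```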
Deliverables
- Clean, vectorized code for the forward pass, loss, backward pass, training loop, and gradient check.
- Short written derivations and notes on the discussion topics above.