Two-Layer Neural Network: Backpropagation and Gradient Check (NumPy)
Context
You are implementing a fully connected two-layer neural network for multi-class classification with the architecture:
Affine (XW1 + b1) → ReLU → Affine (·W2 + b2) → softmax → cross-entropy loss.
Assume a mini-batch of N examples, input dimension D, hidden size H, and C classes, with the shapes listed below; a shape-annotated forward-pass sketch follows the list.
- X ∈ R^{N×D}, y ∈ {0, 1, ..., C−1}^N (one integer label per example)
- W1 ∈ R^{D×H}, b1 ∈ R^{H}
- W2 ∈ R^{H×C}, b2 ∈ R^{C}
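A minimal shape-annotated sketch of this forward pass under the conventions above (the function and variable names `forward`, `a1`, `h`, `scores` are illustrative, not prescribed):

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass for Affine -> ReLU -> Affine; shapes follow the Context section."""
    a1 = X @ W1 + b1         # (N, D) @ (D, H) -> (N, H); b1 (H,) broadcasts across rows
    h = np.maximum(0.0, a1)  # ReLU, (N, H)
    scores = h @ W2 + b2     # (N, H) @ (H, C) -> (N, C); b2 (C,) broadcasts across rows
    return a1, h, scores     # scores are the unnormalized class logits
```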
Task
- Derive the backpropagation equations for all parameters (W1, b1, W2, b2), clearly applying the chain rule and stating the tensor shapes at each step. Explain any linear algebra identities used, e.g., (AB)^T = B^T A^T, NumPy broadcasting rules, and the affine-layer rule ∂L/∂W = X^T · (upstream gradient) for a layer computing XW.
- Implement NumPy (Python 3.7+) code to compute the forward pass, softmax cross-entropy loss, and analytical gradients. Use vectorized linear algebra; do not use automatic differentiation or explicit Python loops over the batch. (A minimal sketch follows this list.)
- Implement gradient checking via central finite differences on a small synthetic dataset. Report the relative error between analytical and numerical gradients. (A checker sketch appears after the Requirements list.)
- Demonstrate how you would debug and fix a mismatch between analytical and numerical gradients (e.g., a missing 1/N factor or an incorrect ReLU mask). Explain the symptoms and the fix. (A worked example appears after the Deliverables list.)
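For reference, the standard chain-rule results for this architecture are ∂L/∂scores = (P − Y)/N (P the softmax probabilities, Y the one-hot labels), ∂L/∂W2 = h^T (∂L/∂scores), ∂L/∂b2 = the column sums of ∂L/∂scores, ∂L/∂h = (∂L/∂scores) W2^T, followed by the ReLU mask (a1 > 0) and the same affine pattern for W1 and b1. A minimal vectorized sketch along these lines, assuming the `forward` conventions above and a loss averaged over the batch (again, the name `loss_and_grads` is illustrative):

```python
import numpy as np

def loss_and_grads(X, y, W1, b1, W2, b2):
    N = X.shape[0]

    # Forward pass (shapes as in the Context section).
    a1 = X @ W1 + b1                      # (N, H)
    h = np.maximum(0.0, a1)               # (N, H)
    scores = h @ W2 + b2                  # (N, C)

    # Numerically stable softmax + cross-entropy, averaged over the batch.
    shifted = scores - scores.max(axis=1, keepdims=True)              # (N, C)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)             # (N, C)
    loss = -log_probs[np.arange(N), y].mean()

    # Backward pass.
    dscores = probs.copy()                # dL/dscores = (P - Y)/N, shape (N, C)
    dscores[np.arange(N), y] -= 1.0
    dscores /= N                          # the 1/N from averaging the loss

    dW2 = h.T @ dscores                   # (H, N) @ (N, C) -> (H, C)
    db2 = dscores.sum(axis=0)             # (C,); sums out the forward broadcast of b2
    dh = dscores @ W2.T                   # (N, C) @ (C, H) -> (N, H)
    da1 = dh * (a1 > 0)                   # ReLU mask, (N, H)
    dW1 = X.T @ da1                       # (D, N) @ (N, H) -> (D, H)
    db1 = da1.sum(axis=0)                 # (H,)

    return loss, {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}
```

Note how each bias gradient sums over the batch axis, which is exactly the transpose of broadcasting the bias across rows in the forward pass, and how dividing dscores by N once propagates the 1/N from the averaged loss into every downstream gradient.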
Requirements
- Use a numerically stable softmax (subtract the row-wise max from the logits before exponentiating), as in the sketch above.
- Show shapes at each layer and for each gradient.
- Keep code self-contained and runnable with only NumPy. Loops are allowed in gradient checking (see the checker sketch after this list).
- Include comments in code indicating shapes and broadcasting.
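A plausible central-difference checker is sketched below. It assumes `f()` recomputes the scalar loss from the current, in-place perturbed parameter array; the step size, the number of sampled entries, and the relative-error formula are common choices, not requirements:

```python
import numpy as np

def grad_check(f, param, analytic_grad, h=1e-5, num_checks=10, seed=0):
    """Compare analytic_grad against central finite differences of f at random entries.

    f() must recompute the scalar loss using the current contents of `param`
    (which is modified in place and restored). Returns the max relative error seen.
    """
    rng = np.random.default_rng(seed)
    max_rel_err = 0.0
    for _ in range(num_checks):
        idx = tuple(rng.integers(0, s) for s in param.shape)  # random entry of param
        old = param[idx]
        param[idx] = old + h
        loss_plus = f()
        param[idx] = old - h
        loss_minus = f()
        param[idx] = old                                      # restore original value
        num_grad = (loss_plus - loss_minus) / (2.0 * h)       # central difference
        ana_grad = analytic_grad[idx]
        rel_err = abs(num_grad - ana_grad) / max(1e-12, abs(num_grad) + abs(ana_grad))
        max_rel_err = max(max_rel_err, rel_err)
    return max_rel_err
```

Sampling a handful of entries per parameter keeps the cost at a few loss evaluations instead of one per element; with float64 and h ≈ 1e-5, correct gradients typically give relative errors below roughly 1e-7.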
Deliverables
- Derivation with shapes and chain rule steps.
- NumPy code for forward pass, loss, and backprop gradients.
- Gradient checking routine and example run on a toy dataset.
- Debugging narrative for a deliberately introduced mismatch and its fix.
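To tie the last two deliverables together, a toy run might look like the sketch below; it assumes the hypothetical `loss_and_grads` and `grad_check` helpers from the earlier sketches are in scope, and the sizes and seed are arbitrary. On the debugging side: if the 1/N division is omitted from the backward pass while the loss is still averaged, every analytical gradient entry comes out N times too large, so the reported relative error is large but nearly identical across entries and parameters; an incorrect ReLU mask, by contrast, typically leaves dW2 and db2 matching while dW1 and db1 fail, pointing at the hidden-layer backward step.

```python
import numpy as np

# Toy problem: arbitrary small sizes so the check runs instantly.
rng = np.random.default_rng(0)
N, D, H, C = 5, 4, 10, 3
X = rng.standard_normal((N, D))             # (N, D)
y = rng.integers(0, C, size=N)              # (N,) integer labels in {0, ..., C-1}
params = {
    "W1": 0.1 * rng.standard_normal((D, H)), "b1": np.zeros(H),
    "W2": 0.1 * rng.standard_normal((H, C)), "b2": np.zeros(C),
}

loss, grads = loss_and_grads(X, y, **params)
print(f"loss on toy batch: {loss:.4f}")
for name, p in params.items():
    # The lambda recomputes the loss with whatever values grad_check writes into p in place.
    rel_err = grad_check(lambda: loss_and_grads(X, y, **params)[0], p, grads[name])
    print(f"{name}: max relative error {rel_err:.2e}")  # correct gradients: typically < 1e-7
```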