Derive backpropagation for a two-layer neural network (Affine → ReLU → Affine → softmax). Write NumPy (Python 3. 7) code to compute the forward pass, loss, and analytical gradients without automatic differentiation using vectorized linear algebra. Implement gradient checking via finite differences and demonstrate how you would debug and fix a mismatch between analytical and numerical gradients. Clearly state tensor shapes at each layer, show the chain rule application, and explain any linear algebra identities used (e.g., broadcasting rules, transpose of products).

This question evaluates understanding of backpropagation and the chain rule, proficiency in vectorized NumPy implementation, numerical stability of softmax/cross-entropy, and competency in gradient checking and debugging of model gradients.

How do I approach Machine Learning interview questions?

Machine Learning questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master machine learning interviews.

What difficulty level is this interview question?

This is a medium difficulty Machine Learning question, commonly asked during Technical Screen rounds at OpenAI.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at OpenAI during technical interviews.

Implement and Debug Backprop in NumPy | OpenAI Interview Question

Two-Layer Neural Network: Backpropagation and Gradient Check (NumPy)

Context

You are implementing a fully connected two-layer neural network for multi-class classification with the architecture:

Affine (XW1 + b1) → ReLU → Affine (·W2 + b2) → softmax → cross-entropy loss.

Assume a mini-batch of N examples, input dimension D, hidden size H, and C classes.

X ∈ R^{N×D}, y ∈ {0, 1, ..., C−1}
W1 ∈ R^{D×H}, b1 ∈ R^{H}
W2 ∈ R^{H×C}, b2 ∈ R^{C}

Task

Derive the backpropagation equations for all parameters (W1, b1, W2, b2), clearly applying the chain rule and stating the tensor shapes at each step. Explain any linear algebra identities used (e.g., (AB)^T = B^T A^T, broadcasting rules, d/dW (XW) = X^T upstream).
Implement NumPy (Python 3.7+) code to compute the forward pass, softmax cross-entropy loss, and analytical gradients. Use vectorized linear algebra; do not use automatic differentiation or explicit Python loops over the batch.
Implement gradient checking via central finite differences on a small synthetic dataset. Report the relative error between analytical and numerical gradients.
Demonstrate how you would debug and fix a mismatch between analytical and numerical gradients (e.g., a missing 1/N factor or an incorrect ReLU mask). Explain the symptoms and the fix.

Requirements

Use a numerically stable softmax (subtract row-wise max from logits before exponentiating).
Show shapes at each layer and for each gradient.
Keep code self-contained and runnable with only NumPy. Loops are allowed in gradient checking.
Include comments in code indicating shapes and broadcasting.

Deliverables

Derivation with shapes and chain rule steps.
NumPy code for forward pass, loss, and backprop gradients.
Gradient checking routine and example run on a toy dataset.
Debugging narrative for a deliberately introduced mismatch and its fix.

Context

You are implementing a fully connected two-layer neural network for multi-class classification with the architecture:

Affine (XW1 + b1) → ReLU → Affine (·W2 + b2) → softmax → cross-entropy loss.

Assume a mini-batch of N examples, input dimension D, hidden size H, and C classes.

X ∈ R^{N×D}, y ∈ {0, 1, ..., C−1}

W1 ∈ R^{D×H}, b1 ∈ R^{H}

W2 ∈ R^{H×C}, b2 ∈ R^{C}

Task

Derive the backpropagation equations for all parameters (W1, b1, W2, b2), clearly applying the chain rule and stating the tensor shapes at each step. Explain any linear algebra identities used (e.g., (AB)^T = B^T A^T, broadcasting rules, d/dW (XW) = X^T upstream).

Implement NumPy (Python 3.7+) code to compute the forward pass, softmax cross-entropy loss, and analytical gradients. Use vectorized linear algebra; do not use automatic differentiation or explicit Python loops over the batch.

Implement gradient checking via central finite differences on a small synthetic dataset. Report the relative error between analytical and numerical gradients.

Demonstrate how you would debug and fix a mismatch between analytical and numerical gradients (e.g., a missing 1/N factor or an incorrect ReLU mask). Explain the symptoms and the fix.

Requirements

Use a numerically stable softmax (subtract row-wise max from logits before exponentiating).

Show shapes at each layer and for each gradient.

Keep code self-contained and runnable with only NumPy. Loops are allowed in gradient checking.

Include comments in code indicating shapes and broadcasting.

Implement and Debug Backprop in NumPy

Quick Overview

Two-Layer Neural Network: Backpropagation and Gradient Check (NumPy)

Context

Task

Requirements

Deliverables

Solution

Comments (0)

Implement and Debug Backprop in NumPy

Quick Overview

Two-Layer Neural Network: Backpropagation and Gradient Check (NumPy)

Context

Task

Requirements

Deliverables

Solution

Comments (0)