Implement and explain the forward and backward pass of a small neural network using both NumPy and PyTorch tensors. Start with a batched input X of shape [B, D], a first linear layer with weights W1 and bias b1, a nonlinearity such as ReLU, and a second linear layer with weights W2 and bias b2 that produces class logits. Compute a cross-entropy loss over the logits, derive gradients for the inputs, weights, and biases by hand, and verify that every tensor shape is consistent with batched computation.
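A minimal NumPy sketch of what a solution to the first part might look like. The dimensions (B=4, D=3, a hidden size H=5, C=2 classes) and the softmax cross-entropy loss are illustrative choices, not requirements of the prompt; the shape comments track the batched matrix algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, H, C = 4, 3, 5, 2          # batch, input dim, hidden dim, classes (assumed)

X  = rng.normal(size=(B, D))
W1 = rng.normal(size=(D, H)) * 0.1
b1 = np.zeros(H)
W2 = rng.normal(size=(H, C)) * 0.1
b2 = np.zeros(C)
y  = rng.integers(0, C, size=B)  # integer class labels

# --- forward pass ---
Z1 = X @ W1 + b1                 # [B, H]  bias broadcasts over the batch
A1 = np.maximum(Z1, 0.0)         # [B, H]  ReLU
logits = A1 @ W2 + b2            # [B, C]

# softmax cross-entropy, numerically stabilized by subtracting the row max
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(B), y]).mean()

# --- backward pass ---
dlogits = probs.copy()           # d(loss)/d(logits) = (softmax - one_hot) / B
dlogits[np.arange(B), y] -= 1.0
dlogits /= B                     # [B, C]

dW2 = A1.T @ dlogits             # [H, C]
db2 = dlogits.sum(axis=0)        # [C]    summing undoes the bias broadcast
dA1 = dlogits @ W2.T             # [B, H]
dZ1 = dA1 * (Z1 > 0)             # [B, H] ReLU gate
dW1 = X.T @ dZ1                  # [D, H]
db1 = dZ1.sum(axis=0)            # [H]
dX  = dZ1 @ W1.T                 # [B, D]
```

Note that each gradient has the same shape as the tensor it corresponds to; a finite-difference check on any single parameter entry is a quick way to validate the derivation.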
Then show how the same computation would be expressed with PyTorch autograd. Be prepared to explain how the computation graph is built during the forward pass, how gradients flow backward through it, and what can go wrong with broadcasting, tensor reshaping, or in-place operations. A good solution demonstrates a solid understanding of backpropagation, tensor manipulation, common neural network layers, and how a batched linear layer is implemented from first principles.
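For the autograd half, a sketch along these lines (same assumed dimensions as before) would be reasonable: each tensor operation in the forward pass records a node in the computation graph, and `loss.backward()` traverses that graph in reverse to populate `.grad` on every leaf tensor with `requires_grad=True`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, D, H, C = 4, 3, 5, 2                      # assumed sizes, as in the NumPy version

X  = torch.randn(B, D)
W1 = (0.1 * torch.randn(D, H)).requires_grad_()
b1 = torch.zeros(H, requires_grad=True)
W2 = (0.1 * torch.randn(H, C)).requires_grad_()
b2 = torch.zeros(C, requires_grad=True)
y  = torch.randint(0, C, (B,))

# forward: identical math to the hand-rolled version; the graph is built implicitly
logits = torch.relu(X @ W1 + b1) @ W2 + b2   # [B, C]
loss = F.cross_entropy(logits, y)            # fused log-softmax + NLL, averaged over B

loss.backward()                              # reverse traversal fills W1.grad, b1.grad, ...
```

Two classic pitfalls worth mentioning in discussion: an in-place mutation of a tensor that the backward pass still needs raises a runtime error during `backward()`, and a stray broadcast (e.g. a bias shaped `[B, 1]` instead of `[C]`) can silently produce wrong-shaped intermediates that only surface when gradient shapes are checked against parameter shapes.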