Examples
Input: ({'weights': [0.0], 'bias': 0.0}, [([[1.0], [2.0]], [2.0, 4.0])], {'lr': 0.1}, 'mse', 'cpu', 1)
Expected Output: {'weights': [1.0], 'bias': 0.6, 'losses': [10.0], 'device': 'cpu'}
Explanation: Starting from zero, the batch predictions are [0, 0], so the batch MSE is ((0 - 2)^2 + (0 - 4)^2) / 2 = 10.0. One gradient-descent step then updates the weight to 1.0 and the bias to 0.6.
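The arithmetic in this first example can be reproduced with a short sketch. The gradient convention assumed below is not stated in the examples themselves but is the one consistent with all four expected outputs: dL/dw_j = (2/n) * sum((pred - y) * x_j) and dL/db = (2/n) * sum(pred - y). The helper name `mse_step` is hypothetical.

```python
# Minimal sketch of one MSE gradient-descent step on a linear model.
# Assumed gradients: dL/dw_j = (2/n) * sum((pred - y) * x_j),
#                    dL/db   = (2/n) * sum(pred - y).
def mse_step(weights, bias, X, y, lr):
    n = len(y)
    # Forward pass: linear prediction per sample.
    preds = [sum(w * xj for w, xj in zip(weights, x)) + bias for x in X]
    # Batch MSE, computed before the update.
    loss = sum((p - t) ** 2 for p, t in zip(preds, y)) / n
    # Per-weight and bias gradients.
    grad_w = [(2 / n) * sum((p - t) * x[j] for p, t, x in zip(preds, y, X))
              for j in range(len(weights))]
    grad_b = (2 / n) * sum(p - t for p, t in zip(preds, y))
    # One gradient-descent step.
    new_w = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_w, bias - lr * grad_b, loss

w, b, loss = mse_step([0.0], 0.0, [[1.0], [2.0]], [2.0, 4.0], 0.1)
# loss is 10.0; w and b round to [1.0] and 0.6, matching the expected output
```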
Input: ({'weights': [0.0], 'bias': 0.0}, [([[1.0]], [1.0]), ([[2.0]], [2.0])], {'lr': 0.1}, 'mse', 'cuda', 2)
Expected Output: {'weights': [0.7696], 'bias': 0.4608, 'losses': [1.48, 0.039168], 'device': 'cuda'}
Explanation: This case has two epochs and two batches, so the loop must correctly repeat zero-grad, forward, backward, and step for every batch. The returned losses are the average batch losses for each epoch.
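The second example exercises the full loop, which can be sketched as below. Assumptions match the expected outputs: each batch loss is recorded before its update, each epoch's entry in `losses` is the mean of that epoch's batch losses, and (per the third example) an epoch with no batches contributes 0.0. The function name `train` is hypothetical.

```python
# Sketch of the epoch/batch training loop for the linear model.
# Assumed gradients: dL/dw_j = (2/n) * sum((pred - y) * x_j),
#                    dL/db   = (2/n) * sum(pred - y).
def train(weights, bias, batches, lr, epochs):
    w, b = list(weights), bias
    losses = []
    for _ in range(epochs):
        batch_losses = []
        for X, y in batches:
            n = len(y)
            # Forward pass, loss recorded before the update.
            preds = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
            batch_losses.append(sum((p - t) ** 2 for p, t in zip(preds, y)) / n)
            # Backward pass and step.
            grad_w = [(2 / n) * sum((p - t) * x[j] for p, t, x in zip(preds, y, X))
                      for j in range(len(w))]
            grad_b = (2 / n) * sum(p - t for p, t in zip(preds, y))
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
        # Empty epoch: average loss defined as 0.0 (see the third example).
        losses.append(sum(batch_losses) / len(batch_losses) if batch_losses else 0.0)
    return w, b, losses

w, b, losses = train([0.0], 0.0, [([[1.0]], [1.0]), ([[2.0]], [2.0])], 0.1, 2)
# after rounding: w ≈ [0.7696], b ≈ 0.4608, losses ≈ [1.48, 0.039168]
```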
Input: ({'weights': [1.0, -1.0], 'bias': 0.5}, [], {'lr': 0.01}, 'mse', 'cpu', 2)
Expected Output: {'weights': [1.0, -1.0], 'bias': 0.5, 'losses': [0.0, 0.0], 'device': 'cpu'}
Explanation: With no batches, no parameter updates occur. By definition in this problem, each epoch's average loss is 0.0.
Input: ({'weights': [0.0, 0.0], 'bias': 0.0}, [([[1.0, 2.0], [3.0, 4.0]], [5.0, 11.0])], {'lr': 0.01}, 'mse', 'cpu', 1)
Expected Output: {'weights': [0.38, 0.54], 'bias': 0.16, 'losses': [73.0], 'device': 'cpu'}
Explanation: This case verifies that gradients are computed separately for each weight in a multi-feature linear model. The initial predictions are [0, 0], so the loss is (5^2 + 11^2) / 2 = 73.0.
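The per-feature gradients in this last example can be worked by hand under the same assumed convention (a 2/n factor on each gradient; here n = 2, so the factor is 1):

```python
# Multi-feature gradient check for the fourth example.
X, y, lr = [[1.0, 2.0], [3.0, 4.0]], [5.0, 11.0], 0.01
preds = [0.0, 0.0]                         # weights and bias start at zero
errs = [p - t for p, t in zip(preds, y)]   # [-5.0, -11.0]
# (2/n) == 1 here, so the factor is omitted.
grad_w = [sum(e * x[j] for e, x in zip(errs, X)) for j in range(2)]
grad_b = sum(errs)
new_w = [0.0 - lr * g for g in grad_w]
new_b = 0.0 - lr * grad_b
# grad_w == [-38.0, -54.0], grad_b == -16.0,
# so new_w ≈ [0.38, 0.54] and new_b ≈ 0.16, matching the expected output
```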