Prompt
You are given a dataset of $n$ 1D samples $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ and $y_i$ are real numbers.
We want to fit a linear model
$$\hat{y} = ax + b$$
by minimizing the mean squared error (MSE).
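For concreteness, the standard MSE over the dataset (one common convention; some authors add a factor of $\frac{1}{2}$ so the 2 cancels in the gradients) is

$$L(a, b) = \frac{1}{n} \sum_{i=1}^{n} \bigl(\hat{y}_i - y_i\bigr)^2 = \frac{1}{n} \sum_{i=1}^{n} \bigl(a x_i + b - y_i\bigr)^2.$$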
Tasks
- Define the loss function for this problem (e.g., MSE over the dataset).
- Using the chain rule / backprop-style reasoning, derive the gradients $\partial L / \partial a$ and $\partial L / \partial b$ (a reference derivation follows this list).
- Describe (and optionally write pseudocode for) how to train $a$ and $b$ using SGD (or mini-batch SGD), covering (see the sketch after this list):
  - parameter initialization
  - per-step gradient computation
  - update rule
  - learning rate choice / scheduling
  - stopping criteria
- Discuss common pitfalls and edge cases (e.g., scaling, divergence, choosing the batch size).
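For reference, under the $\frac{1}{n}$ MSE convention above, writing $r_i = a x_i + b - y_i$ for the $i$-th residual, the chain rule gives

$$\frac{\partial L}{\partial a} = \frac{2}{n} \sum_{i=1}^{n} r_i \, x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} r_i.$$

(If the loss carries the $\frac{1}{2}$ factor, the 2s cancel.)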
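Below is a minimal mini-batch SGD sketch in Python/NumPy implementing the gradients above. It is one possible concrete answer, not the prescribed one; the helper name `fit_sgd` and the default learning rate, batch size, epoch budget, and stopping tolerance are illustrative assumptions.

```python
import numpy as np

def fit_sgd(x, y, lr=0.01, batch_size=32, max_epochs=1000, tol=1e-8, seed=0):
    """Fit y ~ a*x + b by mini-batch SGD on the MSE loss.

    x, y: 1D NumPy arrays of equal length. Hyperparameter defaults
    are illustrative choices, not prescribed by the prompt.
    """
    rng = np.random.default_rng(seed)
    a, b = 0.0, 0.0                          # simple initialization; zeros suffice in 1D
    n = len(x)
    prev_loss = np.inf
    for epoch in range(max_epochs):
        perm = rng.permutation(n)            # reshuffle samples each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            r = a * xb + b - yb              # residuals on the mini-batch
            grad_a = 2.0 * np.mean(r * xb)   # mini-batch estimate of dL/da
            grad_b = 2.0 * np.mean(r)        # mini-batch estimate of dL/db
            a -= lr * grad_a                 # plain SGD update rule
            b -= lr * grad_b
        loss = np.mean((a * x + b - y) ** 2) # full-data loss for the stopping check
        if abs(prev_loss - loss) < tol:      # stop when the loss plateaus
            break
        prev_loss = loss
    return a, b
```

A fixed learning rate is used here for brevity; decaying it (e.g., dividing by the epoch count) is a common refinement when the loss oscillates near the minimum.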
Output / Expected Result
After training, return the learned parameters $a$ and $b$ that approximately minimize the chosen loss on the provided data.
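A quick usage sketch, reusing the `fit_sgd` helper above on synthetic data (the true values $a = 2$, $b = -1$ and the noise scale are made up for illustration):

```python
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x - 1.0 + 0.1 * rng.normal(size=200)  # noisy samples of y = 2x - 1

a, b = fit_sgd(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")  # expect values near 2 and -1
```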