This question evaluates understanding and implementation of optimization algorithms for machine learning, specifically mini-batch gradient descent. It measures competency in numerical optimization, algorithmic implementation, and training dynamics within the Coding & Algorithms domain for Data Scientist roles.
Implement a generic mini-batch gradient descent routine. Inputs: a differentiable loss L(θ; x), an initial parameter vector θ₀, a batch size b, a step budget T, and a learning-rate schedule η_t. (A sketch of one possible implementation follows below.)
(a) Provide stopping criteria (e.g., a gradient-norm threshold and validation-loss patience).
(b) Compare full-batch gradient descent, single-example SGD, and mini-batch SGD in terms of gradient noise, convergence behavior, and wall-clock cost.
(c) Explain the effects of batch size on generalization and how to use learning-rate warmup or cosine decay.
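A minimal sketch of one acceptable answer, assuming NumPy arrays and a hypothetical `loss_and_grad(theta, Xb, yb)` callable that returns the batch loss and gradient; the optional `val_loss_fn` callback supplies the validation-loss patience criterion from part (a):

```python
import numpy as np

def minibatch_gd(loss_and_grad, theta0, X, y, batch_size=32, max_steps=1000,
                 lr_schedule=lambda t: 0.01, grad_tol=1e-6,
                 val_loss_fn=None, patience=10, rng=None):
    """Generic mini-batch gradient descent (illustrative sketch).

    loss_and_grad(theta, Xb, yb) -> (loss, grad) on a batch (assumed interface).
    lr_schedule(t) -> learning rate eta_t at step t.
    Stops early on a small gradient norm or on validation-loss patience.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0.copy()
    n = X.shape[0]
    best_val, bad_steps = np.inf, 0

    for t in range(max_steps):
        # Sample a mini-batch of size b without replacement.
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        loss, grad = loss_and_grad(theta, X[idx], y[idx])
        theta -= lr_schedule(t) * grad  # gradient step with eta_t

        # Stopping criterion 1: (noisy) gradient norm below tolerance.
        if np.linalg.norm(grad) < grad_tol:
            break
        # Stopping criterion 2: validation loss has not improved for `patience` steps.
        if val_loss_fn is not None:
            val = val_loss_fn(theta)
            if val < best_val:
                best_val, bad_steps = val, 0
            else:
                bad_steps += 1
                if bad_steps >= patience:
                    break
    return theta
```

In practice the gradient-norm test is applied to a mini-batch gradient and is therefore noisy; a candidate might instead average the norm over recent steps or evaluate it on the full dataset periodically.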
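For part (c), one common schedule combines linear warmup with cosine decay; a sketch, with hypothetical parameter names, that could be passed as `lr_schedule`:

```python
import math

def warmup_cosine(t, total_steps, warmup_steps=100, lr_max=0.1, lr_min=1e-4):
    """Linear warmup to lr_max, then cosine decay to lr_min."""
    if t < warmup_steps:
        return lr_max * (t + 1) / warmup_steps  # linear warmup phase
    # Fraction of the post-warmup budget consumed, clamped to [0, 1].
    progress = min(1.0, (t - warmup_steps) / max(1, total_steps - warmup_steps))
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```

Warmup avoids large, noisy updates while statistics of early mini-batches are unrepresentative, and cosine decay shrinks the step size smoothly so iterates settle near a minimum; both are standard heuristics a strong answer should connect to the batch-size/generalization discussion.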