PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches
|Home/Coding & Algorithms/Other

Write mini-batch gradient descent

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding and implementation of optimization algorithms for machine learning, specifically mini-batch gradient descent, and measures competency in numerical optimization, algorithmic implementation, and training dynamics within the Coding & Algorithms domain for Data Scientist roles.

  • Medium
  • Other
  • Coding & Algorithms
  • Data Scientist

Write mini-batch gradient descent

Company: Other

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Implement a generic mini-batch gradient descent routine: inputs are differentiable loss L(θ; x), initial θ0, batch size b, steps T, and learning-rate schedule ηt. (a) Provide stopping criteria (gradient norm, validation loss patience). (b) Compare full-batch, SGD, and mini-batch in terms of convergence noise and wall-clock performance. (c) Explain effects of batch size on generalization and how to use learning-rate warmup or cosine decay.

Quick Answer: This question evaluates understanding and implementation of optimization algorithms for machine learning, specifically mini-batch gradient descent, and measures competency in numerical optimization, algorithmic implementation, and training dynamics within the Coding & Algorithms domain for Data Scientist roles.

Related Interview Questions

  • Implement a multi-button click detector - Other (hard)
  • Compute total after discounting most expensive item - Other (medium)
  • Return the k-th row of Pascal-like triangle - Other (medium)
  • Implement multiplication without using the multiplication operator - Other (Medium)
  • Prove reservoir sampling correctness - Other (Medium)
Other logo
Other
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Coding & Algorithms
2
0

Implement a generic mini-batch gradient descent routine: inputs are differentiable loss L(θ; x), initial θ0, batch size b, steps T, and learning-rate schedule ηt. (a) Provide stopping criteria (gradient norm, validation loss patience). (b) Compare full-batch, SGD, and mini-batch in terms of convergence noise and wall-clock performance. (c) Explain effects of batch size on generalization and how to use learning-rate warmup or cosine decay.

Comments (0)

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More Other•More Data Scientist•Other Data Scientist•Other Coding & Algorithms•Data Scientist Coding & Algorithms
PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.