PracHub

Write and explain gradient descent pseudocode

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's understanding of batch gradient descent for linear regression with an intercept, covering concepts such as mean squared error optimization, vectorized matrix operations, analytical gradient computation, learning rate schedules, stopping criteria, feature scaling, and L2 regularization.

Company: Amazon

Role: Data Scientist

Category: Machine Learning

Difficulty: medium

Interview Round: Onsite

Oct 13, 2025, 9:49 PM

Task: Batch Gradient Descent for Linear Regression (with Intercept)

You are interviewing for a Data Scientist role and are asked to implement batch gradient descent to fit a linear regression model with an intercept using mean squared error (MSE).

Given

  • Design matrix X ∈ R^{m×n} that already includes an intercept column of ones (i.e., X[:, 0] = 1).
  • Target vector y ∈ R^{m}.
  • Learning rate α > 0.
  • max_iters (maximum number of iterations).
  • tolerance (for convergence checks).

Requirements

  1. Derive and state the gradient for the objective:
    • Objective: J(θ) = (1/(2m)) ||Xθ − y||².
    • Show that ∇J(θ) = (1/m) Xᵀ (Xθ − y), and implement the update θ ← θ − α ∇J(θ).
  2. Provide vectorized, Python-like pseudocode for batch gradient descent that:
    • Uses only matrix/vector operations (no loops over samples).
    • Exposes inputs X, y, α, max_iters, tolerance.
    • Outputs θ ∈ R^{n} and cost_history (list of J values per iteration).
  3. Stopping criteria: implement at least two, and describe when each is preferable. For example:
    • max_iters, and either:
      • relative cost decrease below tolerance, or
      • ||∇J(θ)||₂ below tolerance.
  4. Learning rate: discuss
    • Constant vs time-decayed schedules.
    • How to pick α to avoid divergence.
    • How feature scaling/standardization affects convergence and the condition number of XᵀX.
  5. Regularization (L2): add an L2 penalty with strength λ that excludes the intercept:
    • J(θ) = (1/(2m))||Xθ − y||² + (λ/(2m))||θ_{1:}||², where θ_{1:} denotes every coefficient except the intercept θ_0.
    • Provide the new gradient and update rule, and discuss λ selection.
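
The gradient in requirement 1 follows from expanding the squared norm and differentiating term by term. The derivation below is one way to write it out. The last three lines give the regularized gradient for requirement 5, where D is notation assumed here (it does not appear in the prompt) for the n×n identity matrix with its first diagonal entry set to zero, so that the intercept is not penalized.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\,\lVert X\theta - y\rVert^2
           = \frac{1}{2m}\,(X\theta - y)^\top (X\theta - y) \\
          &= \frac{1}{2m}\left(\theta^\top X^\top X\,\theta - 2\,y^\top X\theta + y^\top y\right) \\
\nabla J(\theta) &= \frac{1}{2m}\left(2\,X^\top X\,\theta - 2\,X^\top y\right)
           = \frac{1}{m}\,X^\top (X\theta - y) \\
\theta &\leftarrow \theta - \alpha\,\nabla J(\theta) \\[6pt]
J_\lambda(\theta) &= \frac{1}{2m}\,\lVert X\theta - y\rVert^2
           + \frac{\lambda}{2m}\,\lVert \theta_{1:}\rVert^2 \\
\nabla J_\lambda(\theta) &= \frac{1}{m}\,X^\top (X\theta - y) + \frac{\lambda}{m}\,D\,\theta,
           \qquad D = \operatorname{diag}(0, 1, \dots, 1) \\
\theta &\leftarrow \theta - \alpha\,\nabla J_\lambda(\theta)
\end{aligned}
```

Setting λ = 0 recovers the unregularized gradient, which is a quick sanity check on the second form.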

Solution
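
The page's own solution is not shown here, so what follows is only a minimal sketch of one possible answer to requirements 2, 3, and 5, written in NumPy rather than pseudocode. The function name batch_gradient_descent and the l2_lambda parameter are assumptions introduced for illustration. The sketch reuses the single tolerance input for both the gradient-norm check and the relative-cost check, which is a simplification.

```python
import numpy as np


def batch_gradient_descent(X, y, learning_rate=0.1, max_iters=10_000,
                           tolerance=1e-8, l2_lambda=0.0):
    """Fit linear regression by batch gradient descent.

    X: (m, n) design matrix whose first column is all ones (intercept).
    y: (m,) target vector.
    l2_lambda: assumed name for the L2 strength; 0.0 means no penalty.
    Returns (theta, cost_history).
    """
    m, n = X.shape
    theta = np.zeros(n)
    cost_history = []

    for _ in range(max_iters):                    # criterion 1: iteration cap
        residual = X @ theta - y                  # shape (m,)

        # Cost at the current theta; the intercept theta[0] is not penalized.
        penalty = l2_lambda * np.sum(theta[1:] ** 2)
        cost = (residual @ residual + penalty) / (2 * m)
        cost_history.append(cost)

        # Gradient: (1/m) X^T (X theta - y), plus (lambda/m) theta on theta[1:].
        grad = X.T @ residual / m
        grad[1:] += (l2_lambda / m) * theta[1:]

        # Criterion 2: gradient norm below tolerance (near-stationary point).
        if np.linalg.norm(grad) < tolerance:
            break

        # Criterion 3: relative cost decrease below tolerance.
        if len(cost_history) > 1:
            prev = cost_history[-2]
            if abs(prev - cost) <= tolerance * max(abs(prev), 1e-12):
                break

        theta = theta - learning_rate * grad      # theta <- theta - alpha * grad

    return theta, cost_history
```

On when each criterion is preferable: the iteration cap is a safety net that bounds runtime even when α is poorly chosen. The gradient-norm test checks stationarity directly, but its threshold depends on the scale of the features and targets. The relative-cost test is scale-free and cheap, but it can fire early on a flat stretch of the objective. For λ, a common approach is to standardize the features first and then pick λ by cross-validation over a log-spaced grid.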

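
For requirement 4, the objective is quadratic with Hessian H = (1/m) XᵀX, so a constant step size converges only if 0 < α < 2/λ_max(H), and the speed of convergence is governed by the condition number λ_max/λ_min. The sketch below illustrates these points under stated assumptions: the helper names max_stable_learning_rate, standardize, and time_decayed_rate are hypothetical, the decay formula is one common schedule among several, and the data are made up to show two features on very different scales.

```python
import numpy as np


def max_stable_learning_rate(X):
    """Return 2 / lambda_max of H = (1/m) X^T X, the upper bound on a
    constant step size for the unregularized objective."""
    m = X.shape[0]
    eigvals = np.linalg.eigvalsh(X.T @ X / m)   # ascending order, symmetric input
    return 2.0 / eigvals[-1]


def standardize(X):
    """Z-score every column except the intercept column X[:, 0]."""
    Xs = X.astype(float).copy()
    mu = Xs[:, 1:].mean(axis=0)
    sigma = Xs[:, 1:].std(axis=0)
    Xs[:, 1:] = (Xs[:, 1:] - mu) / sigma
    return Xs, mu, sigma


def time_decayed_rate(alpha0, t, decay=0.01):
    """One common schedule (an assumption here): alpha_t = alpha0 / (1 + decay * t)."""
    return alpha0 / (1.0 + decay * t)


# Made-up data: an intercept column plus two features on very different scales.
rng = np.random.default_rng(0)
raw = np.column_stack([np.ones(200),
                       rng.uniform(0, 1, 200),
                       rng.uniform(0, 1000, 200)])
scaled, _, _ = standardize(raw)

# Condition number of X^T X before and after standardization.
cond_raw = np.linalg.cond(raw.T @ raw)
cond_scaled = np.linalg.cond(scaled.T @ scaled)
```

On data like this, standardization should shrink the condition number of XᵀX and raise the largest stable α, which is why scaling usually speeds up convergence. For batch gradient descent on this convex quadratic, a constant α safely below the bound (for example 1/λ_max) is typically sufficient. Time-decayed schedules matter more when gradients are noisy, as in stochastic or mini-batch variants.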