Task: Batch Gradient Descent for Linear Regression (with Intercept)
You are interviewing for a Data Scientist role and are asked to implement batch gradient descent to fit a linear regression model with an intercept using mean squared error (MSE).
Given
- Design matrix X ∈ R^{m×n} that already includes an intercept column of ones (i.e., X[:, 0] = 1).
- Target vector y ∈ R^{m}.
- Learning rate α > 0.
- max_iters (maximum number of iterations).
- tolerance (for convergence checks).
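For concreteness, a minimal sketch of how these inputs might be assembled (the variable names, sizes, and random data below are illustrative only, not part of the task):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_features = 100, 3
features = rng.normal(size=(m, n_features))      # raw feature columns
X = np.column_stack([np.ones(m), features])      # prepend the intercept column of ones
y = rng.normal(size=m)                           # target vector
alpha, max_iters, tolerance = 0.1, 1000, 1e-6    # example hyperparameters
```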
Requirements
- Derive and state the gradient for the objective:
  - Objective: J(θ) = (1/(2m)) ||Xθ − y||².
  - Show that ∇J(θ) = (1/m) Xᵀ (Xθ − y), and implement the update θ ← θ − α ∇J(θ). (A short derivation sketch follows this item.)
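A minimal derivation sketch, expanding the squared norm and differentiating with respect to θ using standard matrix-calculus identities (nothing here goes beyond the objective stated above):

```latex
\begin{aligned}
J(\theta) &= \tfrac{1}{2m}(X\theta - y)^\top (X\theta - y)
           = \tfrac{1}{2m}\left(\theta^\top X^\top X\theta - 2y^\top X\theta + y^\top y\right),\\[4pt]
\nabla J(\theta) &= \tfrac{1}{2m}\left(2X^\top X\theta - 2X^\top y\right)
                  = \tfrac{1}{m}\, X^\top (X\theta - y),\\[4pt]
\theta &\leftarrow \theta - \alpha\,\nabla J(\theta)
        = \theta - \tfrac{\alpha}{m}\, X^\top (X\theta - y).
\end{aligned}
```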
- Provide vectorized, Python-like pseudocode for batch gradient descent that:
  - Uses only matrix/vector operations (no loops over samples).
  - Exposes inputs X, y, α, max_iters, tolerance.
  - Outputs θ ∈ R^{n} and cost_history (list of J values per iteration). (A reference sketch follows this item.)
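A minimal NumPy sketch meeting these requirements; the function name `batch_gradient_descent` and the zero initialization of θ are assumptions, not specified by the task:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha, max_iters, tolerance):
    """Batch gradient descent for MSE linear regression.

    X is assumed to already contain the intercept column of ones.
    Returns the fitted theta and the list of J values per iteration.
    """
    m, n = X.shape
    theta = np.zeros(n)                              # assumed starting point
    cost_history = []

    for it in range(max_iters):
        residual = X @ theta - y                     # X theta - y, shape (m,)
        cost = (residual @ residual) / (2 * m)       # J(theta) at the current theta
        cost_history.append(cost)

        grad = (X.T @ residual) / m                  # (1/m) X^T (X theta - y)

        # Stopping criterion 1: gradient norm below tolerance.
        if np.linalg.norm(grad) < tolerance:
            break

        # Stopping criterion 2: relative cost decrease below tolerance.
        if it > 0 and abs(cost_history[-2] - cost) <= tolerance * max(abs(cost_history[-2]), 1e-12):
            break

        theta = theta - alpha * grad                 # vectorized update step

    return theta, cost_history
```

Both checks run alongside the max_iters cap, so at least two stopping criteria are always active.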
- Stopping criteria: implement at least two, e.g., max_iters, and either:
  - relative cost decrease below tolerance, or
  - ||∇J(θ)||₂ below tolerance.
  Describe when each is preferable (a brief comparison sketch follows this item).
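One way to frame that comparison in code (helper names are illustrative): the gradient-norm test depends on the scale of X and y, while the relative-cost test is unit-free but can trigger on slow plateaus.

```python
import numpy as np

def converged_by_gradient(grad, tol):
    # Scale-sensitive: ||grad||_2 depends on feature units and on m, so tol
    # must be tuned per problem; preferable when you care about stationarity.
    return np.linalg.norm(grad) < tol

def converged_by_relative_cost(prev_cost, cost, tol):
    # Unit-free: one tol transfers across problems; preferable under a compute
    # budget, but it can fire on slow plateaus before the optimum is reached.
    return abs(prev_cost - cost) <= tol * max(abs(prev_cost), 1e-12)
```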
- Learning rate: discuss
  - Constant vs. time-decayed schedules.
  - How to pick α to avoid divergence.
  - How feature scaling/standardization affects convergence and the condition number of XᵀX.
  (An illustrative sketch follows this item.)
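An illustrative sketch of a time-decayed schedule and of column standardization; the decay form, the constants, and the helper names are example choices, and the stability comment assumes the quadratic MSE objective above:

```python
import numpy as np

def decayed_alpha(alpha0, t, decay=0.01):
    # Simple 1/(1 + decay * t) schedule; alpha0 and decay are example values.
    return alpha0 / (1.0 + decay * t)

def standardize_features(X):
    # Standardize every column except the intercept (column 0).
    X_std = X.astype(float).copy()
    mu = X_std[:, 1:].mean(axis=0)
    sigma = X_std[:, 1:].std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)     # guard against constant columns
    X_std[:, 1:] = (X_std[:, 1:] - mu) / sigma
    return X_std, mu, sigma

# For this objective the Hessian is (1/m) X^T X, so a constant step is stable
# when alpha < 2m / lambda_max(X^T X). Standardization shrinks the condition
# number kappa = np.linalg.cond(X.T @ X), which permits a larger stable alpha
# and typically fewer iterations to converge.
```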
- Regularization (L2): modify
  - J(θ) = (1/(2m))||Xθ − y||² + (λ/(2m))||θ_{1:}||² (exclude the intercept from the penalty).
  - Provide the new gradient and update rule, and discuss λ selection. (A sketch follows this item.)
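A possible sketch of the regularized gradient and update, consistent with the objective above (the helper name `ridge_gradient` is illustrative; θ₀ is the intercept and is left unpenalized):

```python
import numpy as np

def ridge_gradient(X, y, theta, lam):
    # Gradient of J(theta) = (1/(2m))||X theta - y||^2 + (lam/(2m))||theta[1:]||^2.
    m = X.shape[0]
    grad = (X.T @ (X @ theta - y)) / m    # unregularized MSE gradient
    penalty = (lam / m) * theta           # L2 penalty gradient
    penalty[0] = 0.0                      # exclude the intercept from shrinkage
    return grad + penalty

# Update rule: for j >= 1,
#   theta_j <- theta_j * (1 - alpha * lam / m) - (alpha / m) * (X.T @ (X @ theta - y))[j];
# the intercept theta_0 keeps the unregularized update. lam is typically chosen
# by cross-validation over a logarithmic grid.
```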