PracHub

Quick Overview

This question evaluates practical implementation skills in PyTorch, focusing on model and device management, batch-wise training mechanics, gradient handling, and optimizer interaction.

Implement PyTorch training loop

Company: Amazon

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

### Implement a basic PyTorch training loop

You are given a PyTorch neural network model, a DataLoader that yields `(inputs, targets)` batches, an optimizer, and a loss function. Write a function `train(model, train_loader, optimizer, loss_fn, device, num_epochs)` that:

- Moves the model and each batch of inputs and targets to the specified device (CPU or GPU).
- Runs for `num_epochs` epochs.
- For each batch, performs a forward pass, computes the loss, runs backpropagation, and updates the model parameters.
- Zeros the gradients at the right time (once per batch, before that batch's backward pass).
- Optionally prints or returns the average training loss per epoch.

Clearly show the order of operations inside the training loop: zeroing gradients, forward pass, loss computation, backward pass, optimizer step.
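A minimal sketch of the requested loop follows. It assumes each batch is an `(inputs, targets)` pair of tensors and that `loss_fn(outputs, targets)` returns a scalar tensor (as `torch.nn.MSELoss()` does with its default `reduction='mean'`). It only calls standard methods on the objects passed in (`Module.to`, `Module.train`, `Tensor.to`, `Optimizer.zero_grad`, `Tensor.backward`, `Optimizer.step`, `Tensor.item`), so it needs no import of its own. Printing and returning the unweighted mean of batch losses per epoch is one illustrative choice, not the only valid one.

```python
def train(model, train_loader, optimizer, loss_fn, device, num_epochs):
    """Train `model` for `num_epochs` epochs and return the average batch loss per epoch.

    Sketch: assumes each batch is an (inputs, targets) pair of tensors and that
    `loss_fn(outputs, targets)` returns a scalar tensor (e.g. torch.nn.MSELoss()).
    """
    model = model.to(device)  # nn.Module.to moves parameters/buffers and returns the module
    epoch_losses = []

    for epoch in range(num_epochs):
        model.train()  # training mode: dropout active, batch-norm running stats updated
        total_loss = 0.0
        num_batches = 0

        for inputs, targets in train_loader:
            # Tensor.to returns a tensor on `device`; it does not move the tensor in place.
            inputs = inputs.to(device)
            targets = targets.to(device)

            optimizer.zero_grad()             # 1) clear gradients from the previous batch
            outputs = model(inputs)           # 2) forward pass
            loss = loss_fn(outputs, targets)  # 3) loss computation
            loss.backward()                   # 4) backward pass: accumulate into each param's .grad
            optimizer.step()                  # 5) update parameters using the current .grad

            total_loss += loss.item()         # Python float, so no autograd graph is retained
            num_batches += 1

        avg_loss = total_loss / num_batches if num_batches else 0.0
        epoch_losses.append(avg_loss)
        print(f"epoch {epoch + 1}/{num_epochs}: average loss {avg_loss:.6f}")

    return epoch_losses
```

With PyTorch installed, it could be called as `train(model, loader, torch.optim.SGD(model.parameters(), lr=0.1), torch.nn.MSELoss(), torch.device("cuda" if torch.cuda.is_available() else "cpu"), 5)`. Zeroing at the top of each iteration matters because `backward()` adds to each parameter's `.grad` rather than overwriting it, so skipping `zero_grad()` would mix gradients from earlier batches into the current step.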


Implement a simplified, judge-friendly version of a PyTorch training loop. Because the judge cannot serialize real PyTorch objects, use these simplified inputs:

- `model`: a dictionary `{'weights': [...], 'bias': ...}` representing a single linear layer.
- `train_loader`: a list of batches. Each batch is `(inputs, targets)`, where `inputs` is a list of samples (each sample is a list of feature values) and `targets` is a list of scalar labels.
- `optimizer`: a dictionary `{'lr': ...}`.
- `loss_fn`: always the string `'mse'`.
- `device`: either `'cpu'` or `'cuda'`. Device movement is simulated; return the device string in the result.

For each epoch, iterate through the batches in order and perform the standard training-loop steps:

1. Zero gradients.
2. Forward pass.
3. Compute the mean squared error loss.
4. Backward pass to compute gradients.
5. Optimizer step using plain gradient descent (`param -= lr * grad`).

The model prediction for one sample is:

`prediction = dot(weights, sample) + bias`

The loss for one batch is the mean squared error over its samples:

`MSE = mean((prediction - target)^2)`

Return a dictionary with keys `'weights'`, `'bias'`, `'losses'`, and `'device'` holding the final weights, the final bias, the average training loss for each epoch, and the device string. The epoch loss is the unweighted average of that epoch's batch losses. If `train_loader` is empty, the average loss for that epoch is `0.0`. Round all returned floating-point values to 6 decimal places.
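Differentiating the batch MSE gives the gradients used in steps 4 and 5. In this derivation, n is the batch size, x_ij is feature j of sample i, y_i is its target, and eta is `optimizer['lr']`; these symbols are introduced here for convenience:

```latex
\begin{aligned}
\mathcal{L} &= \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad \hat{y}_i = \sum_{j} w_j x_{ij} + b \\
\frac{\partial \mathcal{L}}{\partial w_j} &= \frac{2}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_{ij} \\
\frac{\partial \mathcal{L}}{\partial b} &= \frac{2}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i) \\
w_j &\leftarrow w_j - \eta\,\frac{\partial \mathcal{L}}{\partial w_j}, \qquad b \leftarrow b - \eta\,\frac{\partial \mathcal{L}}{\partial b}
\end{aligned}
```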

Constraints

  • 0 <= num_epochs <= 100
  • 1 <= len(model['weights']) <= 20
  • Each sample contains exactly len(model['weights']) features
  • 0 <= total number of samples across all batches <= 10^4
  • Each batch is non-empty, except that `train_loader` itself may be empty
  • loss_fn is always 'mse'

Examples

Input: ({'weights': [0.0], 'bias': 0.0}, [([[1.0], [2.0]], [2.0, 4.0])], {'lr': 0.1}, 'mse', 'cpu', 1)

Expected Output: {'weights': [1.0], 'bias': 0.6, 'losses': [10.0], 'device': 'cpu'}

Explanation: Starting from zero, the batch predictions are [0, 0], so the batch MSE is 10.0. One gradient descent step updates the weight to 1.0 and the bias to 0.6.
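Spelled out, with batch size 2, learning rate 0.1, and L denoting the batch MSE, the single update in this example is:

```latex
\begin{aligned}
\mathcal{L} &= \tfrac{1}{2}\left[(0 - 2)^2 + (0 - 4)^2\right] = \tfrac{1}{2}(4 + 16) = 10 \\
\frac{\partial \mathcal{L}}{\partial w} &= \tfrac{2}{2}\left[(0 - 2)\cdot 1 + (0 - 4)\cdot 2\right] = -10 \\
\frac{\partial \mathcal{L}}{\partial b} &= \tfrac{2}{2}\left[(0 - 2) + (0 - 4)\right] = -6 \\
w &= 0 - 0.1 \cdot (-10) = 1.0, \qquad b = 0 - 0.1 \cdot (-6) = 0.6
\end{aligned}
```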

Input: ({'weights': [0.0], 'bias': 0.0}, [([[1.0]], [1.0]), ([[2.0]], [2.0])], {'lr': 0.1}, 'mse', 'cuda', 2)

Expected Output: {'weights': [0.7696], 'bias': 0.4608, 'losses': [1.48, 0.039168], 'device': 'cuda'}

Explanation: This case has two epochs of two batches each, so the loop must repeat zero-grad, forward, loss, backward, and step for every batch (four updates in total). The returned losses are the per-epoch averages of the batch losses.
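Tracing the four updates makes the repetition concrete. The sketch below uses a hypothetical helper, `sgd_trace` (not part of the problem), specialized to one feature and one sample per batch, which is the shape of this example, so the MSE's 1/n factor is 1.

```python
def sgd_trace(w, b, batches, lr, epochs):
    """Run single-feature gradient descent where each batch is one (x, y) pair.

    Hypothetical helper for illustration only; returns the rounded final
    parameters, per-epoch average losses, and a per-batch history.
    """
    history = []
    epoch_losses = []
    for epoch in range(epochs):
        batch_losses = []
        for x, y in batches:
            pred = w * x + b            # forward pass
            diff = pred - y
            loss = diff * diff          # MSE with n = 1
            grad_w = 2.0 * diff * x     # dL/dw
            grad_b = 2.0 * diff         # dL/db
            w -= lr * grad_w            # gradient descent step
            b -= lr * grad_b
            batch_losses.append(loss)
            history.append((epoch + 1, round(w, 6), round(b, 6), round(loss, 6)))
        avg = sum(batch_losses) / len(batch_losses) if batch_losses else 0.0
        epoch_losses.append(round(avg, 6))
    return round(w, 6), round(b, 6), epoch_losses, history


w, b, losses, history = sgd_trace(0.0, 0.0, [(1.0, 1.0), (2.0, 2.0)], 0.1, 2)
for row in history:
    print(row)  # (epoch, w after step, b after step, batch loss)
```

The intermediate parameters after each batch are (w, b) = (0.2, 0.2), (0.76, 0.48), (0.712, 0.432), and (0.7696, 0.4608), with batch losses 1.0, 1.96, 0.0576, and 0.020736; averaging per epoch gives the expected `[1.48, 0.039168]`.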

Input: ({'weights': [1.0, -1.0], 'bias': 0.5}, [], {'lr': 0.01}, 'mse', 'cpu', 2)

Expected Output: {'weights': [1.0, -1.0], 'bias': 0.5, 'losses': [0.0, 0.0], 'device': 'cpu'}

Explanation: With no batches, no parameter updates occur. By definition in this problem, each epoch's average loss is 0.0.

Input: ({'weights': [0.0, 0.0], 'bias': 0.0}, [([[1.0, 2.0], [3.0, 4.0]], [5.0, 11.0])], {'lr': 0.01}, 'mse', 'cpu', 1)

Expected Output: {'weights': [0.38, 0.54], 'bias': 0.16, 'losses': [73.0], 'device': 'cpu'}

Explanation: This verifies that gradients are computed separately for each weight in a multi-feature linear model.
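The per-weight gradients can be checked directly. In this sketch, `batch_gradients` is a hypothetical helper: each weight's gradient sums `(pred - target) * feature_j` over the batch and scales by `2 / n`, while the bias uses the residuals alone.

```python
def batch_gradients(weights, bias, inputs, targets):
    """Return (grad_w, grad_b) of the batch MSE for a linear model (hypothetical helper)."""
    n = len(inputs)
    preds = [sum(w * x for w, x in zip(weights, sample)) + bias for sample in inputs]
    residuals = [p - t for p, t in zip(preds, targets)]
    # One gradient per weight: feature j of every sample, weighted by its residual.
    grad_w = [
        2.0 / n * sum(r * sample[j] for r, sample in zip(residuals, inputs))
        for j in range(len(weights))
    ]
    grad_b = 2.0 / n * sum(residuals)  # d(pred)/d(bias) = 1 for every sample
    return grad_w, grad_b


grad_w, grad_b = batch_gradients([0.0, 0.0], 0.0, [[1.0, 2.0], [3.0, 4.0]], [5.0, 11.0])
lr = 0.01
new_weights = [round(0.0 - lr * g, 6) for g in grad_w]
new_bias = round(0.0 - lr * grad_b, 6)
print(grad_w, grad_b, new_weights, new_bias)
```

With residuals of -5 and -11, the gradients are -38 for the first weight (-5 * 1 + -11 * 3), -54 for the second (-5 * 2 + -11 * 4), and -16 for the bias, which yields the expected `[0.38, 0.54]` and `0.16`.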

Solution

```python
def solution(model, train_loader, optimizer, loss_fn, device, num_epochs):
    if loss_fn != 'mse':
        raise ValueError("Only 'mse' is supported")

    weights = [float(w) for w in model.get('weights', [])]
    bias = float(model.get('bias', 0.0))
    lr = float(optimizer.get('lr', 0.0))
    epoch_losses = []

    def clean(x):
        # Round to 6 decimals and normalize -0.0 to 0.0 (0.0 == -0.0 is True in Python).
        x = round(float(x), 6)
        if x == -0.0:
            x = 0.0
        return x

    # Simulate moving the model to the requested device.
    moved_device = device

    for _ in range(num_epochs):
        total_loss = 0.0
        batch_count = 0

        for inputs, targets in train_loader:
            batch_size = len(inputs)
            if batch_size == 0:
                continue

            # 1) Zero gradients
            grad_w = [0.0] * len(weights)
            grad_b = 0.0

            # 2) Forward pass
            preds = []
            for sample in inputs:
                pred = bias
                for j, value in enumerate(sample):
                    pred += weights[j] * value
                preds.append(pred)

            # 3) Loss computation (mean squared error)
            loss = 0.0
            for pred, target in zip(preds, targets):
                diff = pred - target
                loss += diff * diff
            loss /= batch_size

            # 4) Backward pass (analytic gradients via the chain rule)
            for sample, pred, target in zip(inputs, preds, targets):
                coeff = 2.0 * (pred - target) / batch_size  # dMSE/dpred
                for j, value in enumerate(sample):
                    grad_w[j] += coeff * value               # dpred/dw_j = x_j
                grad_b += coeff                              # dpred/db = 1

            # 5) Optimizer step (plain gradient descent)
            for j in range(len(weights)):
                weights[j] -= lr * grad_w[j]
            bias -= lr * grad_b

            total_loss += loss
            batch_count += 1

        # Unweighted mean of batch losses; 0.0 when the loader is empty.
        avg_loss = total_loss / batch_count if batch_count else 0.0
        epoch_losses.append(clean(avg_loss))

    return {
        'weights': [clean(w) for w in weights],
        'bias': clean(bias),
        'losses': epoch_losses,
        'device': moved_device
    }
```

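Time complexity: O(num_epochs * total_samples * num_features). Space complexity: O(num_features + num_epochs + max_batch_size) auxiliary space, where the last term covers the list of predictions kept for the current batch.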

Hints

  1. Keep the training loop order strict: zero gradients, forward pass, loss computation, backward pass, then optimizer step.
  2. For MSE on a batch of size n, the derivative with respect to each prediction is `2 * (pred - target) / n`.
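The second hint can be checked numerically. This sketch uses hypothetical helpers (`mse`, `analytic_grads`, `numeric_grads`) to compare the chain-rule gradients built from `2 * (pred - target) / n` against central finite differences. Because the MSE of a linear model is quadratic in its parameters, the central difference has no truncation error, so the two should agree up to floating-point rounding.

```python
def mse(weights, bias, inputs, targets):
    """Batch mean squared error of a linear model: prediction = dot(weights, x) + bias."""
    total = 0.0
    for sample, target in zip(inputs, targets):
        pred = sum(w * x for w, x in zip(weights, sample)) + bias
        total += (pred - target) ** 2
    return total / len(inputs)


def analytic_grads(weights, bias, inputs, targets):
    """Gradients from the hint: dMSE/dpred_i = 2 * (pred_i - target_i) / n, then the chain rule."""
    n = len(inputs)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for sample, target in zip(inputs, targets):
        pred = sum(w * x for w, x in zip(weights, sample)) + bias
        coeff = 2.0 * (pred - target) / n
        for j, x in enumerate(sample):
            grad_w[j] += coeff * x  # dpred/dw_j = x_j
        grad_b += coeff             # dpred/db = 1
    return grad_w, grad_b


def numeric_grads(weights, bias, inputs, targets, eps=1e-6):
    """Central finite differences: (f(theta + eps) - f(theta - eps)) / (2 * eps)."""
    grad_w = []
    for j in range(len(weights)):
        up = list(weights)
        up[j] += eps
        down = list(weights)
        down[j] -= eps
        diff = mse(up, bias, inputs, targets) - mse(down, bias, inputs, targets)
        grad_w.append(diff / (2 * eps))
    diff_b = mse(weights, bias + eps, inputs, targets) - mse(weights, bias - eps, inputs, targets)
    return grad_w, diff_b / (2 * eps)


w, b = [0.3, -0.7], 0.1
xs = [[1.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]
ys = [5.0, 11.0, -2.0]
gw_a, gb_a = analytic_grads(w, b, xs, ys)
gw_n, gb_n = numeric_grads(w, b, xs, ys)
print(gw_a, gb_a)
print(gw_n, gb_n)
```

A mismatch between the two would typically indicate a dropped `/ n` or a missing factor of 2 in the backward pass.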
Last updated: Apr 27, 2026
