Fix three bugs in a minimal GPT to meet a training-loss target
You are given a Colab notebook with a minimal GPT-style language model implemented in PyTorch (token embedding → transformer blocks → LM head), along with training and inference code. On a small toy dataset, training currently fails to reach the target loss.
Your task:
- Identify and fix the following three issues so that training loss drops below a specified threshold on a small dataset:
  - Incorrect attention masking (causal mask and/or padding mask mishandled).
  - A bug in the training loop (e.g., missing optimizer.zero_grad(), not calling model.train(), a misaligned input/target token shift, wrong device placement, or incorrect loss reduction).
  - Missing positional encoding integration.
- Provide concrete code changes (edits or snippets) that implement each fix.
- Provide unit tests that would have caught each bug.
- Include a brief rationale for each fix.
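For orientation, the three fixes typically take the following shape. This is an illustrative sketch only: `TinyGPT`, `train_step`, and all hyperparameters here are hypothetical stand-ins, not the notebook's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    """Minimal GPT-style model showing the mask and position fixes."""
    def __init__(self, vocab_size, d_model=64, n_head=4, n_layer=2, max_len=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # Fix 3: positional encodings (learned here), added to token embeddings.
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer,
                                            enable_nested_tensor=False)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, idx, pad_id=0):
        T = idx.size(1)
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)          # Fix 3 applied
        # Fix 1: causal mask (True = blocked) plus a padding mask.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=idx.device), diagonal=1)
        x = self.blocks(x, mask=causal,
                        src_key_padding_mask=(idx == pad_id))
        return self.lm_head(x)

def train_step(model, batch, optimizer, pad_id=0):
    model.train()                                  # Fix 2: training mode on
    inputs, targets = batch[:, :-1], batch[:, 1:]  # Fix 2: next-token shift
    logits = model(inputs, pad_id=pad_id)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1),
                           ignore_index=pad_id)    # pad positions ignored
    optimizer.zero_grad()                          # Fix 2: clear stale grads
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note the mask conventions: in PyTorch's `nn.TransformerEncoder`, a boolean `mask` entry of `True` means "do not attend", so the causal mask is the strict upper triangle, and `src_key_padding_mask` is `True` exactly at pad positions.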
Assume:
- PyTorch 2.x, CUDA if available.
- A tiny character-level dataset (or synthetic tokens) with a small vocab and fixed max sequence length (e.g., 64), with optional padding. Use ignore_index for pad.
- Success criterion: training loss on the toy training set drops below 1.0 within a few epochs on CPU, or below 0.5 on GPU.
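The unit tests requested above can be written as model-agnostic property checks. The sketch below is one way to do it; the `check_*` helpers and the `model(idx) -> logits` calling convention are assumptions about the notebook's interface, not its actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def check_causality(model, vocab_size=11, T=8):
    """Catches the mask bug: changing a future token must not
    change the logits at earlier positions."""
    model.eval()
    a = torch.randint(1, vocab_size, (1, T))
    b = a.clone()
    b[0, -1] = (b[0, -1] % (vocab_size - 1)) + 1  # perturb only the last token
    with torch.no_grad():
        la, lb = model(a), model(b)
    return torch.allclose(la[0, :-1], lb[0, :-1], atol=1e-5)

def check_position_sensitivity(model, token=5, T=8):
    """Catches the missing positional encoding: on a constant sequence,
    causal self-attention without positions yields identical logits at
    every position by symmetry."""
    model.eval()
    idx = torch.full((1, T), token, dtype=torch.long)
    with torch.no_grad():
        logits = model(idx)
    return not torch.allclose(logits[0, 0], logits[0, -1], atol=1e-5)

def check_loss_decreases(model, vocab_size=11, steps=30):
    """Catches training-loop bugs: a working loop should start
    overfitting a single tiny batch within a few dozen steps."""
    torch.manual_seed(0)
    batch = torch.randint(1, vocab_size, (4, 16))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    losses = []
    for _ in range(steps):
        model.train()
        inputs, targets = batch[:, :-1], batch[:, 1:]
        logits = model(inputs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               targets.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses[-1] < losses[0]
```

Each check fails on the corresponding bug (e.g., a missing causal mask breaks `check_causality`, and a loop that never calls `zero_grad()` tends to diverge and fail `check_loss_decreases`), so running all three pinpoints which fix is still missing.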