Debug a GPT training pipeline
Company: Applied Intuition
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Technical Screen
Given a Colab notebook containing a minimal GPT-style language model with training and inference code, identify and fix three issues so that the training loss drops below a specified threshold on a small dataset:

(1) incorrect attention masking (e.g., causal mask or padding mask mishandled);
(2) a bug in the training loop (e.g., missing optimizer.zero_grad(), not calling model.train(), misaligned input/target token shift, wrong device placement, or incorrect loss reduction); and
(3) missing positional encoding integration.

Provide concrete code changes, unit tests that would have caught these bugs, and a brief rationale for each fix.
Quick Answer: This question evaluates debugging and implementation skills for transformer-based training pipelines, specifically attention masking, training-loop correctness, positional encoding integration, and the ability to design unit tests that catch such bugs. Illustrative sketches of each fix, and of tests that would have caught the bugs, follow below.
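For the masking bug, a minimal sketch assuming a hand-rolled scaled dot-product attention in PyTorch (the notebook's own module and tensor names will differ). The fix masks the upper triangle of the score matrix before the softmax so that no position can attend to future tokens:

    import torch
    import torch.nn.functional as F

    def causal_self_attention(q, k, v):
        # q, k, v: (batch, n_heads, seq_len, head_dim)
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (B, H, T, T)

        # Fix: forbid attention to future positions. Without this mask each
        # position can see the token it is asked to predict, so training
        # loss collapses while generation quality stays poor.
        T = q.size(-2)
        causal = torch.tril(torch.ones(T, T, dtype=torch.bool, device=q.device))
        scores = scores.masked_fill(~causal, float("-inf"))

        attn = F.softmax(scores, dim=-1)
        return attn @ v

Masking with -inf before the softmax, rather than zeroing attention weights afterwards, keeps each row a valid probability distribution over the allowed positions.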
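For the training-loop bug, one plausible corrected loop, assuming PyTorch and a standard next-token objective; which of the variants listed in the prompt the notebook actually contains determines which line matters:

    import torch
    import torch.nn.functional as F

    def train_epoch(model, loader, optimizer, device):
        model.train()  # fix: enable training-mode behavior (e.g., dropout)
        for batch in loader:
            tokens = batch.to(device)  # fix: keep data on the model's device

            # Fix: next-token prediction needs inputs and targets offset by one.
            inputs, targets = tokens[:, :-1], tokens[:, 1:]

            logits = model(inputs)  # (B, T, vocab_size)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),  # (B*T, V)
                targets.reshape(-1),                  # (B*T,)
            )  # default reduction="mean" averages over all token positions

            optimizer.zero_grad()  # fix: otherwise gradients accumulate across steps
            loss.backward()
            optimizer.step()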
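For the missing positional encoding, a sketch assuming learned positional embeddings added at the embedding layer (a sinusoidal table would slot in at the same point); EmbeddingWithPosition is a hypothetical name, not necessarily the notebook's:

    import torch
    import torch.nn as nn

    class EmbeddingWithPosition(nn.Module):
        def __init__(self, vocab_size, d_model, max_len):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)

        def forward(self, idx):
            # idx: (batch, seq_len) token ids
            T = idx.size(1)
            positions = torch.arange(T, device=idx.device)
            # Fix: add position information; token embeddings alone give
            # attention almost no way to represent token order.
            return self.tok(idx) + self.pos(positions)[None, :, :]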
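Finally, unit tests along these lines would have caught each bug before a full training run; model, embed, and get_batch are placeholders for whatever the notebook actually defines, and the loss threshold is illustrative:

    import torch
    import torch.nn.functional as F

    def test_causal_mask_blocks_future(model, vocab_size, seq_len=8):
        # Logits at position t must not depend on tokens after t.
        model.eval()
        x = torch.randint(0, vocab_size, (1, seq_len))
        y = x.clone()
        y[0, -1] = (y[0, -1] + 1) % vocab_size  # perturb only the final token
        with torch.no_grad():
            assert torch.allclose(model(x)[0, :-1], model(y)[0, :-1], atol=1e-5)

    def test_positional_information_present(embed):
        # The same token id at two positions must embed differently once
        # positional encoding is added; pure token embeddings are identical.
        idx = torch.zeros(1, 4, dtype=torch.long)
        out = embed(idx)
        assert not torch.allclose(out[0, 0], out[0, 1], atol=1e-6)

    def test_targets_shifted_by_one(get_batch):
        # Next-token prediction: targets are inputs shifted left by one.
        x, y = get_batch()
        assert torch.equal(x[:, 1:], y[:, :-1])

    def test_loop_overfits_one_batch(model, optimizer, tokens, steps=300):
        # A correct loop (zero_grad, backward, step, mean loss) should drive
        # the loss near zero on a single repeated batch.
        model.train()
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        for _ in range(steps):
            logits = model(inputs)
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                   targets.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        assert loss.item() < 0.5  # illustrative threshold for a tiny model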