You are given a small Transformer model implementation (e.g., in PyTorch) plus a tiny training script. The code executes, but the model does not match a reference implementation: unit tests that check (1) the forward-pass output for a fixed input/seed and (2) the training loss for one step either fail or give results inconsistent with the reference.
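For concreteness, here is a minimal sketch of the kind of test harness described above; `model`, `compute_loss`, `reference_output`, and `expected_loss` are hypothetical fixtures standing in for the actual test assets:

```python
import torch

def test_forward_matches_reference(model, reference_output, atol=1e-5):
    # Deterministic setup: fixed seed, eval mode (disables dropout).
    torch.manual_seed(0)
    model.eval()
    x = torch.randint(0, 100, (2, 8))  # fixed (batch, seq_len) token ids
    with torch.no_grad():
        out = model(x)
    assert torch.allclose(out, reference_output, atol=atol)

def test_one_step_loss(model, compute_loss, expected_loss, atol=1e-4):
    # The loss on a single fixed batch should match the value recorded
    # from the reference implementation.
    torch.manual_seed(0)
    model.train()
    x = torch.randint(0, 100, (2, 8))
    loss = compute_loss(model, x)
    assert abs(loss.item() - expected_loss) < atol
```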
Task: Debug the model so that it runs end-to-end and matches the expected outputs/loss. The buggy code contains multiple independent issues, including:
- An error in the attention mask (shape/broadcasting or causal/padding masking is applied incorrectly); see the mask sketch after this list.
- Incorrect parameter initialization (some weights are initialized with the wrong distribution/scale or not initialized at all); see the initialization sketch after this list.
- A bug in the loss computation due to misaligned positions (e.g., logits/labels are shifted incorrectly for next-token prediction); see the loss sketch after this list.
- One additional hidden bug of similar difficulty (e.g., wrong softmax dimension, missing attention scaling by sqrt(d_k), wrong dtype/device handling, dropout/eval-mode misuse, or an off-by-one in sequence lengths); the attention sketch after this list covers the first two of these.
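For the mask bug, a correct construction of a combined causal/padding mask might look like this sketch, assuming a boolean convention where `True` means "may attend" (the buggy code may instead use an additive float mask):

```python
import torch

def build_attention_mask(seq_len, pad_mask=None, device=None):
    # Causal part: position i may attend to positions j <= i.
    # Shape (1, 1, S, S) so it broadcasts over (batch, heads, query, key).
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=device))
    mask = causal[None, None, :, :]
    if pad_mask is not None:
        # pad_mask: (batch, S), True where the token is real.
        # Broadcast as (batch, 1, 1, S) so padded *keys* are masked out.
        mask = mask & pad_mask[:, None, None, :]
    return mask

# Applied to the raw attention scores before the softmax:
#   scores = scores.masked_fill(~mask, float("-inf"))
```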
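For the initialization bug, one plausible reference scheme is the GPT-2-style recipe below; this is an assumption, and the exact distribution and scale must be read off the reference implementation:

```python
import torch.nn as nn

def init_weights(module):
    # GPT-2-style defaults (an assumption; match the reference's actual scheme).
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    elif isinstance(module, nn.LayerNorm):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

# Usage: model.apply(init_weights)
```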
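For the loss-alignment bug, the standard next-token shift drops the last logit and the first label so that position t predicts token t+1; a minimal sketch:

```python
import torch.nn.functional as F

def next_token_loss(logits, tokens, ignore_index=-100):
    # logits: (batch, S, vocab); tokens: (batch, S)
    # Position t predicts token t+1: drop the last logit and the first token.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = tokens[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,  # skip padding positions, if labeled as such
    )
```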
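And for the first two hidden-bug candidates, correct scaled dot-product attention scales the scores by sqrt(d_k) and normalizes over the key dimension; a minimal sketch reusing the boolean-mask convention from above:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, S, d_k)
    d_k = q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)  # scale by sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    # Normalize over the *key* dimension (last dim), not the query dimension.
    attn = torch.softmax(scores, dim=-1)
    return attn @ v
```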
Explain how you would systematically find and fix these issues, and what the correct implementations should look like.