You are given an unfamiliar GitHub repository that implements a Transformer model. The interviewer claims there is a bug causing one of the following symptoms:
-
training loss diverges or becomes NaN,
-
model quality is far worse than expected,
-
outputs violate causality for an autoregressive model,
-
shapes/broadcasting silently behave incorrectly.
You cannot rewrite the whole project; your task is to find and fix the bug efficiently.
Explain:
-
The step-by-step debugging strategy you would use.
-
What minimal experiments/tests you would run to localize the issue.
-
The most common Transformer implementation bugs you would check first.
-
How you would validate the fix and prevent regressions.