This question evaluates proficiency in debugging Transformer implementations: navigating an unfamiliar codebase, reasoning about tensor shapes and broadcasting, diagnosing numerical instability, checking masking and causality, and validating model behavior.
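Numerical-stability bugs in attention often trace back to the softmax. The following is a minimal pure-Python sketch, not taken from any particular repository; `naive_softmax` and `stable_softmax` are illustrative names. It shows the standard fix: subtract the row maximum before exponentiating. This shift leaves the probabilities mathematically unchanged but prevents overflow on large logits.

```python
import math

def naive_softmax(logits):
    # Exponentiates raw logits directly; math.exp raises OverflowError
    # for inputs above roughly 709.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    # Subtracting the max shifts every logit by the same constant, so the
    # probabilities are identical, but the largest exponent becomes exp(0) = 1.
    # Masked entries set to float("-inf") still map to exp(-inf) = 0.0.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    big = [1000.0, 1000.0, 999.0]
    try:
        naive_softmax(big)
    except OverflowError:
        print("naive softmax overflowed")
    print(stable_softmax(big))
```

When reading an unfamiliar implementation, a hand-rolled softmax without this max subtraction (or an attention score computed in low precision without it) is a quick thing to rule in or out.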
You are given an unfamiliar GitHub repository that implements a Transformer model. The interviewer claims there is a bug causing one of the following symptoms:
You cannot rewrite the whole project; your task is to find and fix the bug efficiently.
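A perturbation test is one efficient way to localize a masking bug without reading the whole codebase. Change only a future token, then check that outputs at earlier positions do not move. The sketch below assumes a toy single-head attention written in pure Python. `causal_attention` and `leaks_future` are hypothetical names, not functions from the repository under test. The check passes for a correctly masked implementation and fails when the mask is dropped.

```python
import math
import random

def stable_softmax(row):
    # Max-subtracted softmax; -inf entries become exactly 0.0.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def causal_attention(q, k, v, causal=True):
    """Toy single-head scaled dot-product attention over lists of vectors.

    q, k, v: lists of length T, each entry a list of floats.
    If causal is True, position i may attend only to positions j <= i.
    """
    T, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(T):
        scores = []
        for j in range(T):
            if causal and j > i:
                scores.append(float("-inf"))  # mask future positions
            else:
                scores.append(scale * sum(a * b for a, b in zip(q[i], k[j])))
        weights = stable_softmax(scores)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out

def leaks_future(attn_fn, T=5, d=4, seed=0):
    """Return True if changing only the last token alters any earlier output."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]
    base = attn_fn(x, x, x)
    x2 = [row[:] for row in x]
    x2[-1] = [rng.gauss(0, 1) for _ in range(d)]  # perturb the final position only
    pert = attn_fn(x2, x2, x2)
    return any(abs(a - b) > 1e-9
               for i in range(T - 1)
               for a, b in zip(base[i], pert[i]))

if __name__ == "__main__":
    print("masked leaks:", leaks_future(causal_attention))
    print("unmasked leaks:",
          leaks_future(lambda q, k, v: causal_attention(q, k, v, causal=False)))
```

The same idea applies to a full model. Perturb a later token ID, run the forward pass, and assert that the logits at earlier positions are unchanged. This test treats the model as a black box, so it can confirm or rule out a causality bug before you locate the exact line responsible.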
Explain: