This question evaluates knowledge of positional encoding mechanisms for Transformer language models, covering embedding mathematics, tensor shapes and broadcasting, PyTorch implementation details, expected training and inference symptoms when positional information is omitted, and methods for empirical verification and ablation.
You are building a Transformer-based language model. Transformers are permutation-equivariant without positional information, so you must inject token order. Implement positional encodings and integrate them into a minimal PyTorch Transformer LM.
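As a starting point, here is a minimal sketch of the classic sinusoidal scheme from "Attention Is All You Need", where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), added to the token embeddings before the Transformer stack. The class and parameter names (PositionalEncoding, TransformerLM, d_model, max_len) and the batch-first (batch, seq_len) tensor layout are illustrative assumptions, not part of the task statement.

```python
import math
import torch
import torch.nn as nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding added to token embeddings (illustrative sketch)."""

    def __init__(self, d_model: int, max_len: int = 4096, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        position = torch.arange(max_len).unsqueeze(1)                      # (max_len, 1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)  # (d_model/2,)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sin(pos / 10000^(2i/d_model))
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims:  cos(pos / 10000^(2i/d_model))
        self.register_buffer("pe", pe.unsqueeze(0))   # (1, max_len, d_model), not a trainable parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the leading dim of pe broadcasts over the batch
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)


class TransformerLM(nn.Module):
    """Minimal LM: embeddings + positional encoding + causally masked self-attention + LM head."""

    def __init__(self, vocab_size: int, d_model: int = 256, nhead: int = 4, num_layers: int = 2):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos_enc = PositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids -> logits: (batch, seq_len, vocab_size)
        x = self.embed(tokens) * math.sqrt(self.d_model)  # embedding scaling as in the original paper
        x = self.pos_enc(x)
        seq_len = tokens.size(1)
        # Additive causal mask: -inf above the diagonal blocks attention to future positions
        causal_mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device), diagonal=1
        )
        h = self.encoder(x, mask=causal_mask)
        return self.lm_head(h)


if __name__ == "__main__":
    # Smoke test: shapes only, no training
    model = TransformerLM(vocab_size=1000)
    tokens = torch.randint(0, 1000, (2, 16))   # (batch=2, seq_len=16)
    logits = model(tokens)
    print(logits.shape)                        # torch.Size([2, 16, 1000])
```

For the ablation side of the question, one simple option is to swap `self.pos_enc` for `nn.Identity()` and compare training loss or perplexity; with unmasked self-attention, a no-PE model's outputs permute along with any permutation of the input tokens, which gives a quick empirical check that positional information is what breaks the symmetry.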
Requirements: