Implement Positional Encodings for a Transformer Language Model
You are building a Transformer-based language model. Transformers are permutation-equivariant without positional information, so you must explicitly inject information about token order. Implement positional encodings and integrate them into a minimal PyTorch Transformer LM.
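For reference, the sinusoidal variant follows the formulation of "Attention Is All You Need" (Vaswani et al., 2017), where `pos` is the token position, `i` indexes dimension pairs, and the resulting table has shape (number of positions, `d_model`):

$$
\mathrm{PE}(pos,\,2i) = \sin\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right),
\qquad
\mathrm{PE}(pos,\,2i+1) = \cos\!\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right)
$$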
Requirements:
- Choose either sinusoidal or learned positional encodings (you may show both).
- Provide PyTorch code that (minimal sketches of both variants follow this list):
  - Computes positional encodings.
  - Adds them to token embeddings with correct tensor shapes and broadcasting.
  - Integrates them into a simple Transformer-based language model.
- Explain the equations and tensor shapes involved.
- Discuss the expected training/inference symptoms if positional information is omitted.
- Describe how you would verify the fix empirically (ablations, metrics, sanity checks); a quick sanity-check snippet follows the sketches below.
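A minimal sketch of the sinusoidal route is shown below. It assumes a decoder-only setup built from `nn.TransformerEncoder` with an additive causal mask; the names (`SinusoidalPositionalEncoding`, `TinyTransformerLM`) and the hyperparameter defaults are illustrative choices, not part of the task statement.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Fixed sin/cos encodings precomputed for max_len positions (d_model assumed even)."""

    def __init__(self, d_model: int, max_len: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        position = torch.arange(max_len).unsqueeze(1)                  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2)
                             * (-math.log(10000.0) / d_model))         # (d_model/2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)                   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)                   # odd dimensions
        # Buffer: moves with .to(device) but is not a trainable parameter.
        self.register_buffer("pe", pe.unsqueeze(0))                    # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); slicing + broadcasting adds the same
        # (1, seq_len, d_model) table to every sequence in the batch.
        x = x + self.pe[:, : x.size(1)]
        return self.dropout(x)


class TinyTransformerLM(nn.Module):
    """Decoder-only LM: token embedding + positional encoding + masked self-attention stack."""

    def __init__(self, vocab_size: int, d_model: int = 256, nhead: int = 4,
                 num_layers: int = 4, dim_feedforward: int = 1024, max_len: int = 2048):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos_enc = SinusoidalPositionalEncoding(d_model, max_len)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer ids -> logits: (batch, seq_len, vocab_size)
        seq_len = tokens.size(1)
        x = self.embed(tokens) * math.sqrt(self.d_model)   # scale embeddings as in Vaswani et al.
        x = self.pos_enc(x)                                # inject token-order information
        causal_mask = torch.triu(                          # forbid attention to future positions
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device), diagonal=1)
        h = self.blocks(x, mask=causal_mask)               # (batch, seq_len, d_model)
        return self.lm_head(h)
```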
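If the learned option is chosen instead, a drop-in replacement could look like the following; `LearnedPositionalEncoding` is again an illustrative name. It looks up one trainable vector per position and broadcasts the `(seq_len, d_model)` table over the batch dimension.

```python
import torch
import torch.nn as nn


class LearnedPositionalEncoding(nn.Module):
    """Trainable position embeddings; swap in for SinusoidalPositionalEncoding above."""

    def __init__(self, d_model: int, max_len: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.pos_embed = nn.Embedding(max_len, d_model)    # one learned vector per position
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); (seq_len, d_model) table broadcasts over the batch.
        positions = torch.arange(x.size(1), device=x.device)
        return self.dropout(x + self.pos_embed(positions))
```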
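One quick sanity check for the last requirement: feed a sequence of identical token ids and compare per-position logits. Without positional information, causal self-attention over identical embeddings produces the same output at every position, so the logits match; with positional encodings they should differ. The snippet below reuses `TinyTransformerLM` from the sketch above and assumes a toy vocabulary size.

```python
import torch
import torch.nn as nn

vocab_size = 100                               # toy vocabulary size (assumption)
model = TinyTransformerLM(vocab_size)          # sketch defined above
model.eval()                                   # disable dropout for a deterministic check

tokens = torch.full((1, 16), 7)                # one sequence: sixteen copies of token id 7
with torch.no_grad():
    logits = model(tokens)                     # (1, 16, vocab_size)

# With positional encodings, positions 0 and 5 receive different inputs,
# so their logits should differ.
print(torch.allclose(logits[0, 0], logits[0, 5], atol=1e-5))          # expected: False

# Ablation: bypass the positional encoding and repeat. Identical embeddings under
# causal self-attention yield (numerically) identical per-position outputs, so this
# prints True, reproducing the "missing positional information" symptom.
model.pos_enc = nn.Identity()
with torch.no_grad():
    logits_no_pe = model(tokens)
print(torch.allclose(logits_no_pe[0, 0], logits_no_pe[0, 5], atol=1e-5))  # expected: True
```

Beyond this check, a fuller verification would ablate the positional encoding during training and compare validation loss/perplexity, which is what the final requirement asks you to design.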