This question evaluates understanding of Transformer self-attention mechanics (specifically the roles of the query, key, and value matrices, multi-head attention, and positional encoding) in the context of deep learning for sequence modeling.
Context: You are given a sequence of token embeddings X (length n, model dimension d_model). Focus on the scaled dot-product self-attention inside a Transformer block.
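For concreteness, below is a minimal single-head sketch of the scaled dot-product self-attention described above. NumPy is assumed, and the projection matrices W_q, W_k, W_v as well as the toy dimensions are illustrative choices, not part of the given prompt.

import numpy as np

def scaled_dot_product_self_attention(X, W_q, W_k, W_v):
    # X: (n, d_model) token embeddings; W_q, W_k, W_v: learned projection matrices
    Q = X @ W_q                                   # queries, shape (n, d_k)
    K = X @ W_k                                   # keys,    shape (n, d_k)
    V = X @ W_v                                   # values,  shape (n, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n, n) similarity scores, scaled by sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention weights
    return weights @ V                            # (n, d_v) attended outputs

# Example usage with toy dimensions (n = 4 tokens, d_model = d_k = d_v = 8):
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)

This corresponds to the standard formulation softmax(QK^T / sqrt(d_k))V applied to a single head; multi-head attention would run several such projections in parallel and concatenate the results.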
Answer the following:
1) What are the roles of the query (Q), key (K), and value (V) matrices, and how are they computed from X?
2) Write the scaled dot-product attention formula and explain why the dot products are scaled by the square root of d_k.
3) What does multi-head attention provide beyond a single attention head?
4) Why does self-attention require positional encoding, and how is it typically incorporated into X?