This question evaluates understanding of Transformer self-attention in the Machine Learning domain, testing the ability to analyze time and space complexity, memory–computation trade-offs, and the role of approximation strategies for efficiency.
In the context of Transformer-style models, analyze the computational complexity of self-attention.
Assume a sequence length of n and a hidden dimension of d.
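
To make the quantities in the question concrete, here is a minimal NumPy sketch of single-head self-attention, assuming the sequence length n and hidden dimension d from the prompt (the specific values below are hypothetical, chosen only for illustration). It makes explicit where the n x n score matrix arises, which is the source of the quadratic time and memory terms the question asks about.

```python
import numpy as np

# Hypothetical sizes for illustration: sequence length n, hidden dimension d.
n, d = 128, 64

Q = np.random.randn(n, d)  # queries
K = np.random.randn(n, d)  # keys
V = np.random.randn(n, d)  # values

# Score matrix: (n, d) @ (d, n) -> (n, n).
# This is the O(n^2 * d) compute step and the O(n^2) memory term.
scores = Q @ K.T / np.sqrt(d)

# Row-wise softmax over the n x n score matrix.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Weighted sum of values: (n, n) @ (n, d) -> (n, d), again O(n^2 * d).
out = weights @ V
```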