Transformer Self-Attention: Q, K, V, Multi-Head, and Positional Encoding
Context: You are given a sequence of token embeddings X (length n, model dimension d_model). Focus on the scaled dot-product self-attention inside a Transformer block.
Answer the following:
- Define the query (Q), key (K), and value (V) matrices:
  - How are Q, K, V produced from the input embeddings?
  - What information does each carry?
  - What specifically does the V matrix represent, and how is it used after the attention weights are computed?
  - At a high level, how do similarity scores become attention weights, and how do those weights become outputs?
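As a concrete reference point for the questions above, here is a minimal NumPy sketch of scaled dot-product self-attention. The weight matrices are random stand-ins for the learned projections (the names `W_q`, `W_k`, `W_v` are illustrative, not from any particular library):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8        # sequence length, model dim, key/query dim

X = rng.standard_normal((n, d_model))    # token embeddings (one row per token)

# Learned linear projections (random here, for illustration only).
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v      # queries, keys, values

scores = Q @ K.T / np.sqrt(d_k)          # scaled dot-product similarities
weights = softmax(scores, axis=-1)       # attention weights; each row sums to 1
output = weights @ V                     # each output row is a weighted sum of value vectors

print(output.shape)                      # (4, 8)
```

Note how V enters only after the softmax: the scores and weights are computed entirely from Q and K, and the weights then mix the rows of V to produce each output position.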
- Compare Transformers to RNNs/LSTMs:
  - How do Transformers address the sequential-dependency and long-range-context limitations of recurrent models?
- Briefly outline multi-head attention and positional encoding:
  - What are they, and why are they needed?
  - When do they matter at inference time (e.g., generation/caching, positional schemes)?
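The last two topics can also be sketched in a few lines of NumPy: multiple heads each run scaled dot-product attention with their own projections and are concatenated, while sinusoidal positional encodings are added to the embeddings so attention can distinguish token order. This is an illustrative sketch with random weights, not a faithful implementation of any particular model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_positions(n, d_model):
    # Fixed sinusoidal positional encoding: sin on even dims, cos on odd dims,
    # with geometrically spaced frequencies across dimension pairs.
    pos = np.arange(n)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((n, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def multi_head_attention(X, num_heads, rng):
    n, d_model = X.shape
    d_k = d_model // num_heads           # each head works in a smaller subspace
    head_outputs = []
    for _ in range(num_heads):
        # Per-head projections (random stand-ins for learned weights).
        W_q, W_k, W_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
        head_outputs.append(weights @ V)
    concat = np.concatenate(head_outputs, axis=-1)   # back to (n, d_model)
    W_o = rng.standard_normal((d_model, d_model))    # final output projection
    return concat @ W_o

rng = np.random.default_rng(0)
n, d_model = 6, 16
# Positional encodings are added to the embeddings before attention.
X = rng.standard_normal((n, d_model)) + sinusoidal_positions(n, d_model)
Y = multi_head_attention(X, num_heads=4, rng=rng)
print(Y.shape)                                       # (6, 16)
```

Without the positional term, swapping two input rows would merely swap the corresponding output rows, since attention by itself is permutation-equivariant; the added encodings are what make position visible to the model.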