Technical Screen: Explain the Transformer Architecture
Scope
Provide a structured deep-dive into Transformers. Your explanation should cover theory, shapes/equations, engineering considerations, and practical adaptations to molecular data.
Required Topics
- Encoder/decoder stack
  - Encoder blocks and decoder blocks (where self-attention, cross-attention, and position-wise feed-forward networks fit)
  - Residual connections and normalization placement
- Attention mechanisms
  - Self-attention vs cross-attention
  - Scaled dot-product attention: equation and tensor shapes for queries (Q), keys (K), and values (V) (see the first sketch after this list)
  - Multi-head attention: how heads are formed and concatenated
- Positional information
  - Absolute positional encodings: sinusoidal vs learned (sinusoidal sketch below)
  - Relative position methods (e.g., relative biases, rotary encodings) and their impact on order/generalization
- Model families and masking
  - Encoder-only vs decoder-only vs encoder–decoder models
  - Masking strategies for autoregressive decoding (causal mask, padding mask; mask sketch below)
- Complexity and scaling
  - Time/memory cost O(n²) of attention and practical inference details (KV cache; sketch below)
  - Methods to handle long sequences: sparse and linear-attention variants; trade-offs
- Stability and initialization
  - LayerNorm placement (pre-LN vs post-LN), residual connections, stability considerations (sketch below)
  - Initialization and other training practices (dropout, LR warmup, etc.)
- Adapting to molecular data
  - SMILES: tokenization, stereochemistry handling, data augmentation (tokenizer sketch below)
  - Molecular graphs: inputs/features, positional/edge encodings
  - Training objectives: masked LM, autoregressive LM, contrastive pretraining
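Illustrative Sketches
The sketches below are optional reference material for this screen, not required answers. They all use NumPy, and every function and variable name is illustrative rather than taken from a particular library. First, scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, together with a minimal multi-head wrapper showing how heads are split out of the model width and concatenated back.

```python
# Minimal sketch of scaled dot-product attention and multi-head attention in NumPy.
# Shape convention: batch B, sequence length n, model width d_model, h heads of width d_k = d_model // h.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    # Q: (..., n_q, d_k), K: (..., n_k, d_k), V: (..., n_k, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)    # (..., n_q, n_k)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)         # masked positions get ~-inf before softmax
    weights = softmax(scores, axis=-1)                # each row sums to 1 over the keys
    return weights @ V                                # (..., n_q, d_v)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    # X: (B, n, d_model); each W_*: (d_model, d_model); output: (B, n, d_model)
    B, n, d_model = X.shape
    d_k = d_model // h
    def split(T):  # (B, n, d_model) -> (B, h, n, d_k)
        return T.reshape(B, n, h, d_k).transpose(0, 2, 1, 3)
    Q, K, V = split(X @ W_q), split(X @ W_k), split(X @ W_v)
    heads = scaled_dot_product_attention(Q, K, V)     # (B, h, n, d_k)
    concat = heads.transpose(0, 2, 1, 3).reshape(B, n, d_model)  # concatenate heads along the feature axis
    return concat @ W_o                               # final output projection

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5, 8))
W = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]
print(multi_head_attention(X, *W, h=2).shape)         # (2, 5, 8)
```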
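A sketch of absolute sinusoidal positional encodings as defined in the original Transformer paper; the resulting matrix is added to the token embeddings before the first block.

```python
# PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
import numpy as np

def sinusoidal_positional_encoding(n_positions, d_model):
    positions = np.arange(n_positions)[:, None]                # (n, 1)
    dims = np.arange(0, d_model, 2)[None, :]                   # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)     # (n, d_model/2)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)   # even feature indices
    pe[:, 1::2] = np.cos(angles)   # odd feature indices
    return pe                      # added to token embeddings before the first encoder/decoder block

print(sinusoidal_positional_encoding(4, 8).shape)  # (4, 8)
```

Learned absolute encodings replace this fixed table with a trainable embedding per position; relative and rotary methods instead inject position into the attention scores themselves.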
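A sketch of the causal and padding masks used during autoregressive decoding, expressed as boolean "allowed to attend" matrices compatible with the attention sketch above. The pad_id value of 0 is an assumption for illustration.

```python
import numpy as np

def causal_mask(n):
    # position i may attend only to positions <= i
    return np.tril(np.ones((n, n), dtype=bool))        # (n, n) lower-triangular

def padding_mask(token_ids, pad_id=0):
    # token_ids: (B, n); True where the key position holds a real token, not padding
    keep = token_ids != pad_id                          # (B, n)
    return keep[:, None, :]                             # (B, 1, n), broadcasts over query positions

ids = np.array([[5, 7, 2, 0, 0]])                       # one sequence with two trailing PAD tokens
combined = causal_mask(5)[None] & padding_mask(ids)     # (1, 5, 5): causal AND not-padding
print(combined.astype(int))
```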
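A sketch of KV caching at inference time: keys and values of already-generated tokens are stored so each new token attends over the cache in O(n) work per step instead of recomputing the full O(n²) attention. The step_qkv function is a hypothetical stand-in for a real model's learned per-token projections.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 8

def step_qkv(token_embedding):
    # Hypothetical per-token Q, K, V projection; a real model would apply learned weight matrices.
    return token_embedding, token_embedding * 0.5, token_embedding * 2.0

k_cache, v_cache = [], []
for step in range(6):                                   # decode 6 tokens one at a time
    q, k, v = step_qkv(rng.standard_normal(d_k))
    k_cache.append(k)                                   # append this step's key/value; old entries are never recomputed
    v_cache.append(v)
    K = np.stack(k_cache)                               # (t, d_k) where t = step + 1
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d_k)                       # (t,) attention of the new query over all cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ V                               # (d_k,) context vector for the new token
print(K.shape, context.shape)                           # (6, 8) (8,)
```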
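A sketch contrasting post-LN (original Transformer) and pre-LN residual blocks. The learnable LayerNorm gain and bias are omitted, and the sublayer argument stands in for either self-attention or the position-wise feed-forward network.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)                # learnable gain/bias omitted for brevity

def post_ln_block(x, sublayer):
    return layer_norm(x + sublayer(x))                  # normalize AFTER the residual add

def pre_ln_block(x, sublayer):
    return x + sublayer(layer_norm(x))                  # normalize INSIDE the branch; residual path stays identity

x = np.random.default_rng(0).standard_normal((2, 5, 8))
toy_sublayer = lambda h: np.maximum(h, 0.0)             # toy stand-in for attention or the FFN
print(post_ln_block(x, toy_sublayer).shape, pre_ln_block(x, toy_sublayer).shape)
```

Keeping the residual path as a pure identity (pre-LN) is the usual argument for its greater training stability in deep stacks.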
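Finally, a sketch of regex-based SMILES tokenization of the kind commonly used for molecular Transformers, keeping bracket atoms, two-letter elements, stereo markers (@, @@), and ring-closure digits as single tokens. The regex below is illustrative and not exhaustive, and is not taken from any specific library.

```python
import re

SMILES_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|Se|se|@@|@|%\d{2}|[BCNOPSFIbcnops]|[=#\-\+\\/\(\)\.:~\*]|\d)"
)

def tokenize_smiles(smiles):
    tokens = SMILES_PATTERN.findall(smiles)
    assert "".join(tokens) == smiles, "tokenizer dropped characters"  # sanity check: lossless split
    return tokens

print(tokenize_smiles("C[C@H](N)C(=O)O"))      # alanine with a stereocenter kept as one bracket token
print(tokenize_smiles("c1ccc2ccccc2c1"))       # naphthalene: aromatic atoms and ring-closure digits
```

SMILES enumeration (emitting several valid SMILES strings for the same molecule) is the usual data augmentation fed through a tokenizer like this.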