Transformer Architecture: Components, Shapes, Complexity, and Design Choices
Task
Explain the Transformer architecture in detail for a standard encoder–decoder model used in sequence-to-sequence modeling (e.g., machine translation).
Layer composition
- What sublayers are in each encoder and decoder layer?
- Why include a position-wise feed-forward network (FFN)?
- How do residual connections and layer normalization interact (pre-LN vs post-LN)?
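To anchor the pre-LN vs post-LN question, here is a minimal sketch (PyTorch assumed; the class name and default sizes are illustrative, not taken from any particular codebase) of one encoder layer's two sublayers, self-attention and the position-wise FFN, each wrapped in a residual connection, with layer normalization placed either before the sublayer (pre-LN) or after the residual add (post-LN). A decoder layer follows the same pattern with three sublayers: masked self-attention, encoder–decoder cross-attention, and the FFN.

```python
# Minimal sketch of sublayer ordering in one encoder layer (illustrative only).
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):  # hypothetical name, for illustration
    def __init__(self, d_model=512, h=8, d_ff=2048, pre_ln=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, h, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.pre_ln = pre_ln

    def forward(self, x):  # x: (batch, n, d_model)
        if self.pre_ln:
            # Pre-LN: normalize the sublayer input; the residual path is untouched.
            a = self.ln1(x)
            x = x + self.attn(a, a, a, need_weights=False)[0]
            x = x + self.ffn(self.ln2(x))
        else:
            # Post-LN (original formulation): normalize after the residual add.
            x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.ln2(x + self.ffn(x))
        return x
```

Post-LN matches the original formulation but tends to need learning-rate warmup and careful initialization to train deep stacks; pre-LN keeps the residual path identity-like and generally trains more stably, which is why most recent models use it.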
Multi-head attention shapes
- For d_model = 512, h = 8 heads, and sequence length n, give the tensor shapes for Q, K, V, the attention scores, the attention weights, the per-head outputs, the concatenated output, and the final projection output. State shapes both per-head and stacked across heads; you may omit the batch dimension for clarity.
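The shapes are easiest to verify by running them. A small sketch under the question's assumptions (PyTorch assumed; weights are random because only shapes matter here), with d_model = 512 and h = 8 so that d_k = d_v = 64, batch dimension omitted:

```python
# Shape walk-through for multi-head attention with d_model=512, h=8 (d_k=d_v=64).
import torch

n, d_model, h = 10, 512, 8          # n is an arbitrary sequence length
d_k = d_v = d_model // h            # 64

x = torch.randn(n, d_model)         # input: (n, 512)
W_q, W_k, W_v, W_o = (torch.randn(d_model, d_model) for _ in range(4))

# Stacked across heads the projections are (n, 512); split per head: (h, n, 64).
Q = (x @ W_q).view(n, h, d_k).transpose(0, 1)           # (8, n, 64)
K = (x @ W_k).view(n, h, d_k).transpose(0, 1)           # (8, n, 64)
V = (x @ W_v).view(n, h, d_v).transpose(0, 1)           # (8, n, 64)

scores = Q @ K.transpose(-2, -1) / d_k ** 0.5           # (8, n, n) attention scores
weights = scores.softmax(dim=-1)                        # (8, n, n) attention weights
head_out = weights @ V                                  # (8, n, 64) per-head outputs
concat = head_out.transpose(0, 1).reshape(n, h * d_v)   # (n, 512) concatenated heads
out = concat @ W_o                                      # (n, 512) final projection

for name, t in [("Q", Q), ("scores", scores), ("weights", weights),
                ("head_out", head_out), ("concat", concat), ("out", out)]:
    print(name, tuple(t.shape))
```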
Computational complexity
- Derive the time and memory complexity of self-attention with respect to n and d (assume d = d_model and typical d_k = d_v = d/h). Also note cross-attention complexity.
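As a crib for the derivation, a compact version under the stated assumptions (d = d_model, d_k = d_v = d/h; n is the self-attention length, m the source length for cross-attention):

```latex
% Projections: Q, K, V and the output projection are (n x d)(d x d) matmuls.
T_{\text{proj}} = O(n d^2)
% Scores and weighted sum, per head: (n x d_k)(d_k x n) and (n x n)(n x d_k), over h heads.
T_{\text{attn}} = O(h \cdot n^2 d_k) = O(n^2 d)
% Total time; the dominant extra memory is the h score/weight matrices plus activations.
T_{\text{self}} = O(n^2 d + n d^2), \qquad M_{\text{self}} = O(h n^2 + n d)
% Cross-attention (n decoder queries attending to m encoder states):
T_{\text{cross}} = O(n m d + (n + m) d^2), \qquad M_{\text{cross}} = O(h n m + (n + m) d)
```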
Implementation choices and trade-offs
- Q/K/V projections (separate vs fused, multi-query/grouped-query attention); see the sketch after this list.
- Number of heads and head dimension.
- Positional encodings (sinusoidal, learned, relative/rotary/ALiBi).
- Include practical trade-offs (speed, memory, quality, length generalization).
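For the Q/K/V projection item, the practical difference between standard multi-head attention (MHA), grouped-query attention (GQA), and multi-query attention (MQA) is how many K/V heads are kept; the sketch below (PyTorch assumed; the function name and sizes are illustrative) just compares projection sizes and per-token KV-cache cost.

```python
# K/V projection sizes for MHA vs. grouped-query (GQA) vs. multi-query (MQA)
# attention. Illustrative only: compares parameter counts and KV-cache size.
import torch.nn as nn

def qkv_projections(d_model=512, n_heads=8, n_kv_heads=8):
    """n_kv_heads == n_heads -> MHA, 1 -> MQA, anything in between -> GQA."""
    d_head = d_model // n_heads
    w_q = nn.Linear(d_model, n_heads * d_head, bias=False)     # queries keep all heads
    w_k = nn.Linear(d_model, n_kv_heads * d_head, bias=False)  # fewer K/V heads shrink
    w_v = nn.Linear(d_model, n_kv_heads * d_head, bias=False)  # weights and the KV cache
    return w_q, w_k, w_v

d_model, n_heads = 512, 8
for n_kv in (8, 2, 1):  # MHA, GQA, MQA
    w_q, w_k, w_v = qkv_projections(d_model, n_heads, n_kv)
    kv_params = sum(p.numel() for p in (*w_k.parameters(), *w_v.parameters()))
    kv_cache_per_token = 2 * n_kv * (d_model // n_heads)   # K and V floats cached
    print(f"n_kv_heads={n_kv}: K/V projection params={kv_params}, "
          f"KV-cache floats per token per layer={kv_cache_per_token}")
```

The trade-off: MQA/GQA cut KV-cache memory and memory bandwidth during autoregressive decoding, where attention is usually memory-bound, typically at a small quality cost relative to full MHA. Fusing the separate Q/K/V projections into a single `nn.Linear(d_model, 3 * d_model)` computes the same result in one larger matmul, which is usually a modest speed win on GPUs.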