Explain the Transformer Architecture
Goal
Provide a clear, step-by-step explanation of a Transformer layer, covering the components listed below (their composition is summarized right after this list):
- Multi-head self-attention (MHA)
- Residual connections
- Layer normalization
- Position-wise feed-forward network (FFN)
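For orientation, these four pieces compose into one layer as two residual sublayers. A compact way to write this (using the post-LN ordering of the original Transformer; pre-LN variants instead normalize before each sublayer) is:

Z = LayerNorm(X + MHA(X))
Y = LayerNorm(Z + FFN(Z))

Every sublayer maps inputs of shape (batch, sequence length, model dimension) to outputs of the same shape, which is what makes the residual additions well defined.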
Also explain why an FFN is needed after attention and what benefits it provides. Walk through the vector computations with matrix dimensions and shapes (see the shape-annotated sketch after this list), including:
- Q/K/V projections
- Attention score scaling and softmax
- Weighted sum to form context vectors
- Concatenation of heads and output projection
- Residual pathways and layer norms
- The FFN’s two linear layers with activation
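Below is a minimal NumPy sketch of these steps for a single encoder layer. The weight names (W_q, W_k, W_v, W_o, W_1, W_2), the random initialization, and the choice d_ff = 4 * d_model are illustrative assumptions; padding/causal masks, dropout, biases, and LayerNorm's learned scale and shift are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension; learned gamma/beta omitted in this sketch.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_layer(X, h, rng):
    B, L, d_model = X.shape
    d_k = d_model // h                          # per-head dimension (d_k = d_v here)
    d_ff = 4 * d_model                          # common FFN width; an assumed choice
    init = lambda *shape: rng.standard_normal(shape) * 0.02   # toy initialization
    W_q, W_k, W_v, W_o = (init(d_model, d_model) for _ in range(4))
    W_1, W_2 = init(d_model, d_ff), init(d_ff, d_model)

    # 1) Q/K/V projections: (B, L, d_model) -> (B, L, d_model) each
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Split into h heads: (B, L, d_model) -> (B, h, L, d_k)
    split = lambda t: t.reshape(B, L, h, d_k).transpose(0, 2, 1, 3)
    Q, K, V = split(Q), split(K), split(V)

    # 2) Scaled scores and softmax: (B, h, L, d_k) x (B, h, d_k, L) -> (B, h, L, L)
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_k)
    A = softmax(scores, axis=-1)                # each row sums to 1 over the keys

    # 3) Weighted sum of values -> per-head context vectors: (B, h, L, d_k)
    context = A @ V

    # 4) Concatenate heads and apply the output projection: back to (B, L, d_model)
    context = context.transpose(0, 2, 1, 3).reshape(B, L, d_model)
    attn_out = context @ W_o

    # 5) Residual connection + layer norm around attention
    Z = layer_norm(X + attn_out)

    # 6) Position-wise FFN: two linear layers with a ReLU in between
    ffn_out = np.maximum(0.0, Z @ W_1) @ W_2

    # 7) Residual connection + layer norm around the FFN
    return layer_norm(Z + ffn_out)

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 7, 16))             # B=2, L=7, d_model=16
Y = encoder_layer(X, h=4, rng=rng)
print(Y.shape)                                  # (2, 7, 16): the layer preserves shape
```

A decoder layer runs the same computation but applies a causal mask to the scores before the softmax (position i may only attend to positions ≤ i) and inserts an additional cross-attention sublayer whose K and V come from the encoder output.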
Optionally compare encoder vs. decoder layers and discuss how representations evolve across stacked layers.
Assumptions and Notation
- Batch size: B
- Sequence length: L
- Model dimension: d_model
- Number of heads: h
- Per-head dimensions: d_k = d_v = d_model / h (typical)
- Input token embeddings plus positional encodings: X ∈ R^{B × L × d_model}
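A concrete instance of this notation, taking the common base configuration d_model = 512 and h = 8 as an assumed example:
- d_k = d_v = 512 / 8 = 64
- Per head: Q, K, V ∈ R^{B × L × 64}, and the score matrix QKᵀ / √d_k ∈ R^{B × L × L}
- Concatenating the h heads gives R^{B × L × 512}, which the output projection (W_o in the sketch above, shape 512 × 512) maps back to R^{B × L × d_model}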