Explain Transformer Layers and FFN Rationale
Company: UiPath
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Technical Screen
Explain the Transformer architecture. Describe the structure of each layer (multi-head self-attention, residual connections, layer normalization, and position-wise feed-forward network). Why is an FFN needed after attention, and what benefits does it provide? Walk through the vector computations step by step: the Q/K/V projections (matrix dimensions and shapes), attention score scaling and softmax, weighted sum to form context vectors, concatenation of heads, output projection, residual pathways, layer norms, and the FFN’s two linear layers with activation. Optionally compare encoder vs. decoder layers and discuss how representations evolve across stacked layers.
Quick Answer: This question evaluates a candidate's understanding of the core Transformer layer components (multi-head self-attention, residual connections, layer normalization, and the position-wise feed-forward network) and their fluency with the underlying tensor shapes and matrix projections.
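The step-by-step computations the question asks for can be sketched as a single post-norm encoder layer in NumPy. This is a minimal illustrative sketch, not a reference implementation: the toy dimensions (sequence length 4, d_model 8, 2 heads, d_ff 16), the ReLU activation, and the parameter names (Wq, Wk, Wv, Wo, W1, b1, W2, b2) are assumptions chosen for clarity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean, unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(-1, keepdims=True)

def encoder_layer(x, params, n_heads):
    """One post-norm encoder layer (attention block, then FFN block)."""
    T, d_model = x.shape
    d_k = d_model // n_heads
    Wq, Wk, Wv, Wo, W1, b1, W2, b2 = params

    # Q/K/V projections: (T, d_model) @ (d_model, d_model) -> (T, d_model),
    # then split into n_heads heads of width d_k each.
    def split(z):  # (T, d_model) -> (n_heads, T, d_k)
        return z.reshape(T, n_heads, d_k).transpose(1, 0, 2)
    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)

    # Scaled dot-product attention per head: scores are (n_heads, T, T),
    # scaled by sqrt(d_k) to keep logits in a softmax-friendly range.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)
    A = softmax(scores)          # each row is a distribution over positions
    ctx = A @ V                  # weighted sum of values: (n_heads, T, d_k)

    # Concatenate heads back to (T, d_model) and apply the output projection.
    ctx = ctx.transpose(1, 0, 2).reshape(T, d_model)
    attn_out = ctx @ Wo

    # Residual connection + layer norm around the attention sub-layer.
    x = layer_norm(x + attn_out)

    # Position-wise FFN: two linear layers with a nonlinearity in between,
    # applied identically and independently at every position.
    ffn_out = np.maximum(0, x @ W1 + b1) @ W2 + b2

    # Second residual connection + layer norm around the FFN sub-layer.
    return layer_norm(x + ffn_out)

# Toy dimensions: sequence length 4, d_model 8, 2 heads, d_ff 16.
rng = np.random.default_rng(0)
T, d_model, n_heads, d_ff = 4, 8, 2, 16
params = (
    rng.normal(size=(d_model, d_model)),  # Wq
    rng.normal(size=(d_model, d_model)),  # Wk
    rng.normal(size=(d_model, d_model)),  # Wv
    rng.normal(size=(d_model, d_model)),  # Wo
    rng.normal(size=(d_model, d_ff)),     # W1
    np.zeros(d_ff),                       # b1
    rng.normal(size=(d_ff, d_model)),     # W2
    np.zeros(d_model),                    # b2
)
x = rng.normal(size=(T, d_model))
y = encoder_layer(x, params, n_heads)
print(y.shape)  # (4, 8): the layer preserves the input shape
```

Note the shape invariant this demonstrates: every sub-layer maps (T, d_model) to (T, d_model), which is what makes residual addition and deep stacking of identical layers possible. A decoder layer differs mainly in adding a causal mask to the scores and a cross-attention sub-layer over encoder outputs.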