This question evaluates understanding of multi-head self-attention and the ability to implement a transformer attention module using learned Q/K/V projections, head-wise tensor reshaping, attention masking, and PyTorch tensor operations.
Implement a multi-head self-attention module in PyTorch without using torch.nn.MultiheadAttention.
Requirements:
- Input shape: (batch_size, seq_len, d_model)
- Split the projections into num_heads heads, where d_model % num_heads == 0 and head_dim = d_model // num_heads
- Compute scaled dot-product attention per head: Attention(Q, K, V) = softmax(Q K^T / sqrt(head_dim)) V
- Support an optional attention mask applied to the scores before the softmax
- Output shape: (batch_size, seq_len, d_model)
You may use standard PyTorch building blocks such as nn.Linear, torch.matmul, softmax, view, and transpose.
Explain any important tensor shape transformations and common implementation pitfalls.