Explain attention variants and their tradeoffs
Company: Startups.Com
Role: Machine Learning Engineer
Category: Machine Learning
Difficulty: medium
Interview Round: Onsite
You are asked to explain and reason about modern Transformer attention mechanisms.
1) **Scaled dot-product attention**
- Define the operation mathematically (including the scaling term) and explain why scaling is used.
- Provide the typical tensor shapes for `Q`, `K`, `V` in a batched setting (a minimal sketch follows this item).
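A minimal NumPy sketch, assuming a batch-first layout where `Q` is `(batch, L_q, d)` and `K`, `V` are `(batch, L_k, d)` / `(batch, L_k, d_v)`. The `1/sqrt(d)` term keeps the dot products from growing with the head dimension, which would otherwise push the softmax into saturated, low-gradient regions:

```python
import numpy as np
from scipy.special import softmax

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d)  # (batch, L_q, L_k)
    weights = softmax(scores, axis=-1)                # each query row sums to 1
    return weights @ V                                # (batch, L_q, d_v)
```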
2) **Multi-head attention (MHA)**
- Explain how MHA differs from single-head attention, including the projection matrices, per-head computation, concatenation, and output projection.
- Discuss compute and memory complexity with respect to sequence length `L`, number of heads `H`, and head dimension `d` (see the sketch after this item).
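A sketch of MHA under the common assumption `d_model = H * d`, with illustrative square projection matrices `Wq`, `Wk`, `Wv`, `Wo` of shape `(d_model, d_model)`. The per-head score tensor is `(B, H, L, L)`, so self-attention costs on the order of `H * L^2 * d` compute and `H * L^2` memory for the attention matrices per layer:

```python
import numpy as np
from scipy.special import softmax

def multi_head_attention(x, Wq, Wk, Wv, Wo, H):
    """x: (B, L, d_model); Wq/Wk/Wv/Wo: (d_model, d_model); H heads of size d = d_model // H."""
    B, L, d_model = x.shape
    d = d_model // H

    def split_heads(t):                                # (B, L, d_model) -> (B, H, L, d)
        return t.reshape(B, L, H, d).transpose(0, 2, 1, 3)

    Q, K, V = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d)   # (B, H, L, L)
    heads = softmax(scores, axis=-1) @ V               # (B, H, L, d), computed independently per head
    concat = heads.transpose(0, 2, 1, 3).reshape(B, L, d_model)  # concatenate heads
    return concat @ Wo                                 # output projection
```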
3) **Grouped-query attention (GQA)**
- Define GQA and explain how it differs from:
- MHA (each head has its own K/V)
- Multi-query attention (MQA: all heads share one K/V)
- Explain why GQA is commonly used for LLM inference (see the sketch after this item).
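A sketch showing that the three variants differ only in the number of K/V heads `G` (same layout and illustrative weight names as above): `G == H` recovers MHA, `G == 1` is MQA, and `1 < G < H` is GQA. Each K/V head is shared by `H // G` consecutive query heads, so the tensors that the KV cache must store shrink by a factor of `H / G`:

```python
import numpy as np
from scipy.special import softmax

def grouped_query_attention(x, Wq, Wk, Wv, Wo, H, G):
    """x: (B, L, d_model); Wq/Wo: (d_model, d_model); Wk/Wv: (d_model, G * d); G must divide H."""
    B, L, d_model = x.shape
    d = d_model // H
    Q = (x @ Wq).reshape(B, L, H, d).transpose(0, 2, 1, 3)  # (B, H, L, d)
    K = (x @ Wk).reshape(B, L, G, d).transpose(0, 2, 1, 3)  # (B, G, L, d) -- what the KV cache holds
    V = (x @ Wv).reshape(B, L, G, d).transpose(0, 2, 1, 3)  # (B, G, L, d)
    # Broadcast each K/V head to the H // G query heads in its group.
    K = np.repeat(K, H // G, axis=1)                        # (B, H, L, d)
    V = np.repeat(V, H // G, axis=1)
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d)        # (B, H, L, L)
    heads = softmax(scores, axis=-1) @ V                    # (B, H, L, d)
    return heads.transpose(0, 2, 1, 3).reshape(B, L, d_model) @ Wo
```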
4) **When would you choose MHA vs GQA vs MQA?**
- Discuss quality/expressiveness tradeoffs, KV-cache size, memory bandwidth, and practical deployment constraints (a worked cache-size example follows).
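A back-of-the-envelope comparison of KV-cache size, using assumed, illustrative numbers roughly matching a 7B-class decoder: 32 layers, 32 query heads, head dimension 128, fp16 cache, batch 8, context length 8192. Only the number of K/V heads changes across the variants:

```python
def kv_cache_gib(n_kv_heads, n_layers=32, head_dim=128, seq_len=8192,
                 batch=8, bytes_per_value=2):
    """Bytes per token = 2 (K and V) * n_layers * n_kv_heads * head_dim * bytes_per_value."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return batch * seq_len * per_token / 2**30

for name, kv_heads in [("MHA, 32 KV heads", 32), ("GQA, 8 KV heads", 8), ("MQA, 1 KV head", 1)]:
    print(f"{name}: {kv_cache_gib(kv_heads):.1f} GiB")
# MHA ~32 GiB, GQA ~8 GiB, MQA ~1 GiB for this assumed configuration. Smaller caches mean
# less memory traffic per decoded token, which is typically the binding constraint in
# autoregressive serving -- hence GQA as the common middle ground.
```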
Quick Answer: This question evaluates understanding of Transformer attention mechanisms (scaled dot-product, multi-head, grouped-query, and multi-query attention) and measures competency in model internals: tensor shapes, compute and memory complexity, and inference-time deployment trade-offs.