PracHub

Implement scaled dot-product attention

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to implement single-head scaled dot-product attention: correct matrix operations on Q, K, and V, masked-attention handling, and a numerically stable softmax. It tests linear algebra and knowledge of model internals within the Coding & Algorithms / Machine Learning domain.

  • medium
  • Meta
  • Coding & Algorithms
  • Machine Learning Engineer

Implement scaled dot-product attention

Company: Meta

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite



Related Interview Questions

  • Solve Two Backtracking Array Problems - Meta (hard)
  • Solve Array, Matrix, and Recommendation Problems - Meta (medium)
  • Find a String Containing Another - Meta (medium)
  • Solve Subarray Sum and Local Minimum - Meta (hard)
  • Validate abbreviations and brackets - Meta (medium)
Meta · Sep 2, 2025 · Machine Learning Engineer · Onsite · Coding & Algorithms

Task

In this interview you are asked to hand-write the forward pass of attention from the mathematical formula (no need to run code).

Implement single-head scaled dot-product attention.

Inputs

  • Q: a 2D array of shape (Tq, d) (queries)
  • K: a 2D array of shape (Tk, d) (keys)
  • V: a 2D array of shape (Tk, dv) (values)
  • Optional mask: a 2D array of shape (Tq, Tk) where mask[i][j] = 0/False means position j is not allowed to be attended to by query i.
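As an illustrative aside (not part of the question statement): in self-attention, where Tq == Tk, a common instance of this mask is the causal mask, which allows query i to attend only to positions j <= i. One way to build it with NumPy:

```python
import numpy as np

T = 4
# causal_mask[i][j] is True iff query i may attend to key j,
# i.e. j <= i. np.tril keeps the lower triangle (including the
# diagonal) and zeroes out everything above it.
causal_mask = np.tril(np.ones((T, T), dtype=bool))
```

Row 1 of this mask, for example, permits keys 0 and 1 and blocks keys 2 and 3.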

Output

  • O: a 2D array of shape (Tq, dv) defined by:
  1. Compute scores:

     \[ S = \frac{QK^\top}{\sqrt{d}} \quad \text{(shape } Tq \times Tk\text{)} \]

  2. Apply mask (if provided): for any masked-out entry S[i][j], treat it as \(-\infty\) before softmax.
  3. Apply row-wise softmax to get attention weights:

     \[ A_{i,:} = \mathrm{softmax}(S_{i,:}) \]

  4. Compute the output:

     \[ O = AV \]
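Steps 1–4 can be sketched in NumPy as follows. This is an illustrative implementation, not a prescribed answer; the function name is my own, and it assumes every query has at least one allowed key (the fully-masked-row edge case is discussed below):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Single-head scaled dot-product attention.

    Q: (Tq, d), K: (Tk, d), V: (Tk, dv); mask: optional (Tq, Tk)
    boolean array where False means query i may NOT attend to key j.
    Returns O of shape (Tq, dv).
    """
    d = Q.shape[-1]
    # 1) Scores: S = Q K^T / sqrt(d), shape (Tq, Tk).
    S = Q @ K.T / np.sqrt(d)
    # 2) Masked-out entries become -inf so softmax assigns them
    #    zero weight. Assumes no row is entirely masked.
    if mask is not None:
        S = np.where(mask, S, -np.inf)
    # 3) Row-wise numerically stable softmax: subtract each row's
    #    max so exp never sees a large positive argument.
    S_max = S.max(axis=-1, keepdims=True)
    A = np.exp(S - S_max)
    A /= A.sum(axis=-1, keepdims=True)
    # 4) Output: O = A V, shape (Tq, dv).
    return A @ V
```

For instance, with all-zero queries the scores are uniform and each output row is the mean of the rows of V.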

Requirements / Discussion points

  • Write clear pseudocode or Python-like code for the above.
  • Explain how you would implement numerically stable softmax.
  • State the time complexity in terms of Tq, Tk, d, and dv.
  • Mention at least two edge cases (e.g., fully-masked row, large magnitudes, shape mismatches).
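On the discussion points: a stable softmax subtracts each row's maximum before exponentiating, so exp only ever sees non-positive arguments and cannot overflow; the dominant costs are the two matrix products, O(Tq·Tk·d) for QK^T and O(Tq·Tk·dv) for AV, for O(Tq·Tk·(d + dv)) overall. One sketch of a stable softmax that also handles the fully-masked-row edge case (here resolved by returning zero weights for such rows, which is one convention among several; returning NaN or raising an error are also defensible):

```python
import numpy as np

def stable_softmax(S):
    """Row-wise softmax of S, shape (Tq, Tk), safe for large
    magnitudes and for rows that are entirely -inf (fully masked)."""
    # A fully -inf row would otherwise produce -inf - (-inf) = nan.
    all_masked = np.all(np.isneginf(S), axis=-1, keepdims=True)
    # Replace fully-masked rows with zeros so max/exp stay finite.
    S_safe = np.where(all_masked, 0.0, S)
    # Subtracting the row max makes every argument to exp <= 0.
    shifted = S_safe - S_safe.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    A = e / e.sum(axis=-1, keepdims=True)
    # Convention: a fully-masked query attends to nothing.
    return np.where(all_masked, 0.0, A)
```

Other edge cases worth naming in the interview: mismatched inner dimensions between Q and K (or K and V), and non-2D inputs, both of which should fail fast with a shape check.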


© 2026 PracHub. All rights reserved.