PracHub

Define QKV for recommender cross-attention

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of Transformer-style cross-attention and the concrete design of Query, Key, and Value tensors for deep-learning recommender systems, testing representation semantics, embedding alignment, and interaction modeling between user history, candidate items, and context.

Company: TikTok

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Dec 8, 2025, 7:48 PM

You are designing a deep-learning–based recommendation system that uses a Transformer-style cross-attention block to model the interaction between a user and a candidate item.

The model has these typical inputs:

  • A user behavior sequence: a list of items the user has interacted with in the past, each already embedded as a vector (e.g., size d).
  • A candidate item whose relevance score you want to predict, also embedded as a vector of size d.
  • Optional context features (time, device, location, etc.) that can also be embedded.

You decide to use a cross-attention layer somewhere in the model rather than only self-attention.

  1. Propose a concrete way to define the Query (Q), Key (K), and Value (V) tensors in this cross-attention block using the inputs above. Explain what each of Q, K, and V represents semantically.
  2. Give at least two different reasonable design choices for how to set up Q, K, and V (for example, one where the candidate item is the query and one where the user history is the query). For each design, explain:
    • What is used as Q, K, and V.
    • What interaction the attention mechanism is modeling.
    • Pros and cons or when that design is preferable.
  3. Briefly explain how cross-attention here differs from self-attention within the user behavior sequence, and why cross-attention can be useful in recommendation systems.
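
To make the two example setups named in part 2 concrete, here is a minimal NumPy sketch of single-head scaled dot-product cross-attention. It is one possible illustration and not a reference answer. The shapes (T = 6, d = 8), the random projection matrices, the helper names, and the choice to fold context into the query by simple addition are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_in, kv_in, Wq, Wk, Wv):
    """Single-head cross-attention: queries come from q_in, keys/values from kv_in.

    q_in: (n_q, d), kv_in: (n_kv, d).
    Returns the attended output (n_q, d) and the weights (n_q, n_kv).
    """
    Q = q_in @ Wq                                # what each query is looking for
    K = kv_in @ Wk                               # how each source token is matched
    V = kv_in @ Wv                               # what each source token contributes
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # scaled dot-product similarity
    weights = softmax(scores, axis=-1)           # each query's weights sum to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
T, d = 6, 8                                      # hypothetical history length and embedding size
history = rng.normal(size=(T, d))                # user behavior sequence
candidate = rng.normal(size=(1, d))              # candidate item embedding
context = rng.normal(size=(1, d))                # embedded context features
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

# Design A: the candidate (with context added, an assumption) is the query;
# the history supplies K and V. The output is a candidate-specific summary
# of the user's past behavior.
user_interest, w_a = cross_attention(candidate + context, history, Wq, Wk, Wv)

# Design B: each history item is a query; candidate and context tokens supply
# K and V. With only one key the softmax would be trivially 1, so this sketch
# uses two key/value tokens.
kv_tokens = np.concatenate([candidate, context], axis=0)
history_aware, w_b = cross_attention(history, kv_tokens, Wq, Wk, Wv)

print(user_interest.shape, w_a.shape)            # (1, 8) (1, 6)
print(history_aware.shape, w_b.shape)            # (6, 8) (6, 2)
```

In Design A the weights in `w_a` say which past items matter for this particular candidate, so the user representation changes per candidate. In Design B each history position is re-encoded in light of the candidate and context, which keeps a length-T output for later layers. Self-attention would instead call `cross_attention(history, history, ...)`, where queries, keys, and values all come from the same sequence and the candidate plays no role.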