You are designing a deep-learning–based recommendation system that uses a Transformer-style cross-attention block to model the interaction between a user and a candidate item.
The model has these typical inputs:
- A **user behavior sequence**: a list of items the user has interacted with in the past, each already embedded as a vector (e.g., size *d*).
- A **candidate item** whose relevance score you want to predict, also embedded as a vector of size *d*.
- Optional **context features** (time, device, location, etc.) that can also be embedded.
You decide to use a cross-attention layer somewhere in the model rather than only self-attention.
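For concreteness, here is a minimal PyTorch-style sketch of the setup. The batch size, history length, head count, and variable names are illustrative assumptions, not part of the problem statement:

```python
import torch

# Assumed shapes for illustration: batch size B, history length T,
# embedding size d (none of these are fixed by the problem statement).
B, T, d = 32, 50, 64

user_history = torch.randn(B, T, d)  # T past item embeddings per user
candidate    = torch.randn(B, 1, d)  # one candidate item embedding per user
context      = torch.randn(B, 1, d)  # optional embedded context features

# Cross-attention means Q comes from a different source than K/V.
# torch.nn.MultiheadAttention supports this directly; e.g., with the
# candidate as the query and the user history as keys/values:
attn = torch.nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
out, weights = attn(query=candidate, key=user_history, value=user_history)
print(out.shape, weights.shape)  # (B, 1, d), (B, 1, T)
```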
- Propose a concrete way to define the **Query (Q)**, **Key (K)**, and **Value (V)** tensors in this cross-attention block using the inputs above. Explain what each of Q, K, and V represents semantically.
- Give at least **two different reasonable design choices** for how to set up Q, K, and V (for example, one where the candidate item is the query and one where the user history is the query); a minimal sketch of both setups follows this list. For each design, explain:
  - What is used as Q, K, and V.
  - What interaction the attention mechanism is modeling.
  - Pros and cons, or when that design is preferable.
- Briefly explain how cross-attention here differs from self-attention within the user behavior sequence, and why cross-attention can be useful in recommendation systems.
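As a starting point, a hedged sketch of the two example setups named in the question. Shapes, the shared attention module, and the mean-pooling step are assumptions for illustration, not a prescribed answer:

```python
import torch

B, T, d = 32, 50, 64  # assumed sizes, as above
user_history = torch.randn(B, T, d)
candidate = torch.randn(B, 1, d)

attn = torch.nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

# Design A: candidate as query, history as keys/values.
# The attention weights answer: "which past interactions are relevant
# to *this* candidate?" The output is a candidate-aware user summary.
summary_a, _ = attn(query=candidate, key=user_history, value=user_history)
# summary_a: (B, 1, d)

# Design B: history as query, candidate as key/value.
# Each history item attends to the candidate, yielding a
# candidate-conditioned re-encoding of the sequence; a pooling step
# (mean here, an assumption) reduces it to a single vector.
reencoded, _ = attn(query=user_history, key=candidate, value=candidate)
summary_b = reencoded.mean(dim=1, keepdim=True)  # (B, 1, d)
```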