Problem
You are building a recommender system with a two-stage ranking pipeline:
- **Candidate retrieval (recall):** fetch the top-K candidates for a request (user + context).
- **Heavy ranker:** score those K candidates with a more expensive model and return the final list.
Traditionally K is a fixed constant (e.g., 200–2000). You are asked to design a system/model that chooses K dynamically per request, i.e., K = f(user, context, retrieval signals, …).
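The pipeline above can be sketched in a few lines. This is an illustrative skeleton, not a reference implementation: `retrieve`, `heavy_rank`, and `choose_k` are hypothetical placeholders, and `choose_k` stands in for the per-request model the problem asks you to design.

```python
K_MAX = 5000  # upper bound the retrieval layer is configured to return

def retrieve(user, context, limit):
    # Stub: pretend retrieval returns (candidate_id, retrieval_score) pairs.
    return [(i, 1.0 / (i + 1)) for i in range(min(limit, 1000))]

def choose_k(user, context, candidates):
    # Placeholder for the learned model K = f(user, context, retrieval signals).
    # Here it just returns a fixed fallback value.
    return 500

def heavy_rank(user, context, candidates):
    # Stub: re-score the candidates with the "expensive" model and sort.
    return sorted(candidates, key=lambda c: c[1], reverse=True)

def rank_request(user, context):
    candidates = retrieve(user, context, limit=K_MAX)  # stage 1: recall
    # Dynamic cutoff: never exceed what retrieval actually returned.
    k = min(choose_k(user, context, candidates), len(candidates))
    return heavy_rank(user, context, candidates[:k])   # stage 2: heavy ranker
```

The only change versus the traditional design is that `choose_k` is a model call rather than a constant.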
Requirements / trade-offs
- K should **not** be a static number.
- Increasing K can improve downstream quality (recall / revenue / engagement), but it also increases:
  - latency (p99)
  - compute cost for the heavy ranker
  - potential negative effects (e.g., noisy candidates hurting the ranker)
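One way to make this quality/cost trade-off concrete is to frame K selection as utility maximization: pick the K that maximizes predicted quality minus a latency/cost penalty. The sketch below assumes a hypothetical diminishing-returns quality curve (`log1p`) and a linear cost with penalty weight `lam`; in a real system both would be learned or measured, not hard-coded.

```python
import math

def quality_gain(k):
    # Assumed diminishing-returns curve: downstream quality vs. K.
    return math.log1p(k)

def cost(k, lam=0.01):
    # Assumed linear latency/compute penalty per candidate scored.
    return lam * k

def best_k(k_grid):
    # Choose the K on the grid that maximizes predicted net utility.
    return max(k_grid, key=lambda k: quality_gain(k) - cost(k))
```

With these toy curves, `best_k(range(1, 5001))` lands well below `K_max`, which is the point: under diminishing returns, the utility-optimal K is usually much smaller than the maximum the retrieval layer supports.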
- The design should describe:
  - what the **objective/metrics** are
  - what the model **predicts/outputs**
  - how to **train** it (labels, data)
  - how to **serve** it online (architecture, guardrails)
  - how to **evaluate** it offline and online
You may assume the retrieval layer can return up to a configured K_max (e.g., 5000), and the system must choose an actual K (or an equivalent cutoff) for each request.
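Whatever the model predicts, the serving path needs guardrails around it. A minimal sketch, assuming hypothetical bounds `K_MIN`/`K_MAX` and a static default `K_DEFAULT` to fall back to when the K-model fails or returns something unusable:

```python
K_MIN, K_MAX, K_DEFAULT = 50, 5000, 500

def guarded_k(predict_k, features):
    """Wrap the K-model call with clamping and a fail-open default."""
    try:
        k = int(predict_k(features))
    except Exception:
        return K_DEFAULT               # model error: fall back to static K
    return max(K_MIN, min(k, K_MAX))   # clamp prediction to [K_MIN, K_MAX]
```

Clamping keeps a misbehaving model from blowing the p99 latency budget (too large a K) or starving the heavy ranker (too small a K), and the fallback keeps the pipeline serving even when the K-model is down.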