Problem
You are building a recommender system with a two-stage ranking pipeline:
- **Candidate retrieval (recall):** fetch the top-K candidates for a request (user + context).
- **Heavy ranker:** score those K candidates with a more expensive model and return the final list.
Traditionally K is a fixed constant (e.g., 200–2000). You are asked to design a system/model that chooses K dynamically per request, i.e., K = f(user, context, retrieval signals, …).
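The pipeline above can be sketched in a few lines. This is an illustrative skeleton, not a reference implementation: `retrieve`, `heavy_rank`, and `choose_k` are hypothetical placeholders, and `choose_k` stands in for the per-request model the problem asks you to design.

```python
K_MAX = 5000  # upper bound the retrieval layer is configured to return

def retrieve(user, context, limit):
    # Stub: pretend retrieval returns (candidate_id, retrieval_score) pairs.
    return [(i, 1.0 / (i + 1)) for i in range(min(limit, 1000))]

def choose_k(user, context, candidates):
    # Placeholder for the learned model K = f(user, context, retrieval signals).
    # Here it just returns a fixed fallback value.
    return 500

def heavy_rank(user, context, candidates):
    # Stub: re-score the candidates with the "expensive" model and sort.
    return sorted(candidates, key=lambda c: c[1], reverse=True)

def rank_request(user, context):
    candidates = retrieve(user, context, limit=K_MAX)  # stage 1: recall
    # Dynamic cutoff: never exceed what retrieval actually returned.
    k = min(choose_k(user, context, candidates), len(candidates))
    return heavy_rank(user, context, candidates[:k])   # stage 2: heavy ranker
```

The only change versus the traditional design is that `choose_k` is a model call rather than a constant.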
Requirements / trade-offs
- K should **not** be a static number.
- Increasing K can improve downstream quality (recall / revenue / engagement), but it also increases:
  - latency (p99)
  - compute cost for the heavy ranker
  - potential negative effects (e.g., noisy candidates hurting the ranker)
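One way to make this quality/cost trade-off concrete is to frame K selection as utility maximization: pick the K that maximizes predicted quality minus a latency/cost penalty. The sketch below assumes a hypothetical diminishing-returns quality curve (`log1p`) and a linear cost with penalty weight `lam`; in a real system both would be learned or measured, not hard-coded.

```python
import math

def quality_gain(k):
    # Assumed diminishing-returns curve: downstream quality vs. K.
    return math.log1p(k)

def cost(k, lam=0.01):
    # Assumed linear latency/compute penalty per candidate scored.
    return lam * k

def best_k(k_grid):
    # Choose the K on the grid that maximizes predicted net utility.
    return max(k_grid, key=lambda k: quality_gain(k) - cost(k))
```

With these toy curves, `best_k(range(1, 5001))` lands well below `K_max`, which is the point: under diminishing returns, the utility-optimal K is usually much smaller than the maximum the retrieval layer supports.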
- The design should describe:
  - what the **objective/metrics** are
  - what the model **predicts/outputs**
  - how to **train** it (labels, data)
  - how to **serve** it online (architecture, guardrails)
  - how to **evaluate** it offline and online
You may assume the retrieval layer can return up to a configured K_max (e.g., 5000), and the system must choose an actual K (or an equivalent cutoff) for each request.
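Whatever the model predicts, the serving path needs guardrails around it. A minimal sketch, assuming hypothetical bounds `K_MIN`/`K_MAX` and a static default `K_DEFAULT` to fall back to when the K-model fails or returns something unusable:

```python
K_MIN, K_MAX, K_DEFAULT = 50, 5000, 500

def guarded_k(predict_k, features):
    """Wrap the K-model call with clamping and a fail-open default."""
    try:
        k = int(predict_k(features))
    except Exception:
        return K_DEFAULT               # model error: fall back to static K
    return max(K_MIN, min(k, K_MAX))   # clamp prediction to [K_MIN, K_MAX]
```

Clamping keeps a misbehaving model from blowing the p99 latency budget (too large a K) or starving the heavy ranker (too small a K), and the fallback keeps the pipeline serving even when the K-model is down.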