This question evaluates a candidate's competency in ML system design and decision-focused modeling for recommender pipelines, focusing on retrieval-ranking trade-offs, latency and compute cost constraints, and dynamic per-request selection of K.
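You are building a recommender system with a two-stage ranking pipeline: a retrieval stage first selects K candidate items, and a ranking stage then scores and orders those K candidates.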
Traditionally K is a fixed constant (e.g., 200–2000). You are asked to design a system/model that chooses K dynamically per request, i.e., K = f(user, context, retrieval signals, …).
You may assume the retrieval layer can return up to a configured K_max (e.g., 5000), and the system must choose an actual K (or an equivalent cutoff) for each request.
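To make the idea of a per-request cutoff concrete, here is a minimal sketch of one possible heuristic, not a prescribed answer. It assumes the retrieval layer exposes a numeric score per candidate, and it keeps the smallest prefix of candidates whose softmax weight covers a target share of the total. The function name `choose_k` and the parameters `k_min`, `mass`, and `temperature` are hypothetical; only `k_max` corresponds to the configured K_max in the prompt.

```python
import math
from typing import Sequence


def choose_k(
    retrieval_scores: Sequence[float],
    k_min: int = 200,      # hypothetical floor so the ranker always sees some candidates
    k_max: int = 5000,     # configured upper bound (K_max in the prompt)
    mass: float = 0.95,    # hypothetical share of softmax weight to cover
    temperature: float = 1.0,
) -> int:
    """Return a per-request cutoff K derived from retrieval scores."""
    # Consider at most k_max candidates, best first.
    scores = sorted(retrieval_scores, reverse=True)[:k_max]
    if not scores:
        return 0

    # Softmax weights, shifted by the top score for numerical stability.
    top = scores[0]
    weights = [math.exp((s - top) / temperature) for s in scores]
    total = sum(weights)

    # Smallest prefix whose cumulative weight reaches the target share.
    k = len(scores)
    running = 0.0
    for i, w in enumerate(weights, start=1):
        running += w
        if running >= mass * total:
            k = i
            break

    # Clamp to [k_min, number of available candidates].
    upper = len(scores)
    lower = min(k_min, upper)
    return max(lower, min(k, upper))


# Usage: a sharply peaked score list needs fewer candidates than a flat one.
peaked = [10.0, 9.0] + [0.0] * 998
flat = [0.0] * 1000
k_peaked = choose_k(peaked, k_min=1)
k_flat = choose_k(flat, k_min=1)
```

The design intuition is that when retrieval scores are concentrated on a few items, the ranker gains little from a long tail, so K can shrink and save latency and compute. When scores are flat, retrieval is uncertain and a larger K is safer. A learned f(user, context, retrieval signals, …) could replace this heuristic while keeping the same clamping to the configured bounds.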