This question evaluates understanding of ML system design for real-time query suggestion and ranking. It covers candidate generation and ranking models, data ingestion and labeling pipelines, online/nearline feature engineering, feedback and retraining loops, latency and scalability constraints, multilingual handling, and safety/policy compliance. It is commonly asked to assess the ability to balance trade-offs among model quality, latency, throughput, and guardrails in operational systems. The domain is ML System Design, and the level of abstraction combines practical application with systems-level conceptual reasoning.

You are designing a real-time system that generates and ranks search query suggestions shown to users (e.g., in a mobile app search box or other entry points). The objective is to maximize click-through rate (CTR) on these suggested queries while meeting low-latency and high-scale requirements.
Assume:
Describe an end-to-end design covering:
Discuss key trade-offs, cold-start handling, safety/guardrails, and latency budgets.
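
As one illustrative starting point (not a reference answer), the sketch below shows a two-stage flow: candidate generation by prefix match, a blocklist guardrail, and ranking by a smoothed CTR estimate so that cold-start queries with no impressions fall back to a prior. Everything in it is an assumption made for illustration: the names `QueryStats`, `smoothed_ctr`, and `suggest`, the prior values, and the in-memory dictionary standing in for a real retrieval index and feature store.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical per-query counters, e.g. aggregated from click logs."""
    impressions: int
    clicks: int


def smoothed_ctr(stats: QueryStats, prior_ctr: float = 0.05,
                 prior_strength: float = 100.0) -> float:
    """Beta-prior smoothed CTR.

    A query with no impressions scores exactly prior_ctr (cold start);
    the estimate moves toward the observed CTR as impressions accumulate.
    The prior values here are illustrative assumptions.
    """
    return (stats.clicks + prior_ctr * prior_strength) / (
        stats.impressions + prior_strength
    )


def suggest(prefix: str, stats_by_query: dict[str, QueryStats],
            blocklist: set[str], k: int = 3) -> list[str]:
    """Return up to k suggestions for a typed prefix."""
    prefix = prefix.strip().lower()
    # Candidate generation: a linear prefix scan stands in for a trie or
    # other low-latency index. The guardrail drops blocked queries here.
    candidates = [
        q for q in stats_by_query
        if q.startswith(prefix) and q not in blocklist
    ]
    # Ranking: highest smoothed CTR first, alphabetical tie-break.
    candidates.sort(key=lambda q: (-smoothed_ctr(stats_by_query[q]), q))
    return candidates[:k]


stats = {
    "running shoes": QueryStats(impressions=10_000, clicks=900),
    "running shorts": QueryStats(impressions=0, clicks=0),  # cold start
    "running scam": QueryStats(impressions=5_000, clicks=2_000),
    "red dress": QueryStats(impressions=100, clicks=50),
}
print(suggest("run", stats, blocklist={"running scam"}))
```

In this toy data the blocked query has the highest raw CTR, which shows why the guardrail has to run independently of the ranking score. A production design would replace the scan with an index sized to the latency budget and the counter-based score with a learned ranking model.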