System Design: Query-Generation to Maximize CTR
Context
You are designing a real-time system that generates and ranks search query suggestions shown to users (e.g., in a mobile app's search box or at other search entry points). The objective is to maximize click-through rate (CTR) on these suggested queries while meeting low-latency and high-scale requirements.
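To pin down the objective, here is a minimal sketch of CTR as it is commonly defined: clicks on suggested queries divided by the number of times suggestions were shown (impressions). The function name `ctr` and the choice to return 0.0 when there are no impressions are illustrative assumptions, not part of the problem statement.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions.

    Assumption: returns 0.0 when nothing was shown, rather than raising.
    """
    if clicks < 0 or impressions < 0:
        raise ValueError("counts must be non-negative")
    if impressions == 0:
        return 0.0
    return clicks / impressions


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(ctr(30, 1000))
```

A real design would also need to say what counts as an impression (for example, a suggestion that was rendered versus one that was actually visible); that choice is left open here.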
Assume:
- Real-time suggestions under 100 ms p95 latency.
- Tens to hundreds of millions of daily users, multilingual content.
- Safety and policy compliance are required.
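The latency assumption can be made concrete with a small sketch: compute the 95th percentile of observed request latencies and compare it with the 100 ms budget. This uses the nearest-rank percentile definition, which is one of several conventions; the helper names `p95` and `meets_budget` are hypothetical.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_budget(latencies_ms, budget_ms=100.0):
    """True if the p95 latency is strictly under the budget (assumed 100 ms)."""
    return p95(latencies_ms) < budget_ms


if __name__ == "__main__":
    # Hypothetical samples: 99 fast requests and one slow outlier.
    samples = [10.0] * 99 + [500.0]
    print(p95(samples), meets_budget(samples))
```

Because the target is a p95 rather than a maximum, up to 5% of requests may exceed 100 ms without violating it, which leaves room for occasional slow paths.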
Task
Describe an end-to-end design covering:
- High-level architecture (online and offline paths).
- Data ingestion and labeling pipeline.
- Feature engineering (online/nearline readiness).
- Models for candidate generation and ranking (training and serving).
- Feedback/learning loop (exploration, debiasing, retraining).
- Evaluation: key offline and online metrics (with definitions).
Discuss key trade-offs, cold-start handling, safety/guardrails, and latency budgets.