This question evaluates a candidate's competency in ML system design and engineering with emphasis on integrating large language models into large-scale recommendation pipelines, covering architecture choices, online/offline placement, latency and cost trade-offs, safety/moderation, multi-modal support, and evaluation/monitoring strategies.

You are designing enhancements for a high-throughput, mobile-first recommendation system that serves a mixed-media feed (short videos, images, text, live). The system must operate under tight latency and cost budgets, handle multi-lingual content, and meet strong safety/moderation requirements.
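Outline how to incorporate LLMs end-to-end, covering architecture choices, online/offline placement, latency and cost trade-offs, safety/moderation, multi-modal support, and evaluation/monitoring.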
Provide concrete design choices, resource estimates, and guardrails suitable for a technical screening interview.
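Resource estimates for the online/offline placement decision can start as back-of-envelope arithmetic. The sketch below compares running an LLM over every uncached candidate at request time with enriching each new item once in an offline batch. It is a hypothetical illustration only: the `Workload` and `Accelerator` fields, the function names, and every throughput, traffic, and price figure are assumptions, not part of the original question or of any specific serving stack.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    # All values are hypothetical inputs chosen by the designer.
    peak_qps: float              # feed requests per second at peak
    candidates_per_request: int  # items the LLM would score per request
    tokens_per_item: int         # prompt tokens needed per candidate
    cache_hit_rate: float        # fraction of candidates served from a precomputed cache


@dataclass
class Accelerator:
    # Hypothetical sustained throughput and price for one GPU.
    tokens_per_second: float
    cost_per_hour: float


def online_gpus(w: Workload, gpu: Accelerator, headroom: float = 0.3) -> int:
    """GPUs needed if the LLM processes every uncached candidate at request time."""
    tokens_per_s = (
        w.peak_qps * w.candidates_per_request * w.tokens_per_item * (1 - w.cache_hit_rate)
    )
    # Add headroom for traffic spikes, then round up to whole GPUs.
    return math.ceil(tokens_per_s * (1 + headroom) / gpu.tokens_per_second)


def offline_gpu_hours(new_items_per_day: int, tokens_per_item: int, gpu: Accelerator) -> float:
    """GPU-hours per day to enrich each newly uploaded item once in batch."""
    return new_items_per_day * tokens_per_item / gpu.tokens_per_second / 3600


def daily_cost(gpu_hours: float, gpu: Accelerator) -> float:
    """USD per day for a given number of GPU-hours."""
    return gpu_hours * gpu.cost_per_hour


if __name__ == "__main__":
    w = Workload(peak_qps=50_000, candidates_per_request=200,
                 tokens_per_item=256, cache_hit_rate=0.5)
    gpu = Accelerator(tokens_per_second=20_000, cost_per_hour=2.0)

    n_online = online_gpus(w, gpu, headroom=0.25)
    batch_hours = offline_gpu_hours(10_000_000, w.tokens_per_item, gpu)

    print("online GPUs at peak:", n_online)
    print("online cost/day (USD):", daily_cost(n_online * 24, gpu))
    print("offline GPU-hours/day:", round(batch_hours, 1))
    print("offline cost/day (USD):", round(daily_cost(batch_hours, gpu), 2))
```

Under these assumed numbers the online path needs tens of thousands of GPUs while the offline path needs a few dozen GPU-hours per day, which is the kind of gap that motivates moving heavy LLM work (captioning, tagging, embedding, moderation pre-screening) offline and keeping only lightweight models on the request path. Real figures depend on the model, hardware, batching, and traffic, and should replace the placeholders.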