You are asked to design an LLM-powered search system that lets users query a large corpus of documents (e.g., internal wikis, PDFs, logs, and web pages) and receive natural-language answers.
A key challenge is that both documents and user queries can be very long, often exceeding the context window (maximum token length) of the underlying large language model (LLM). For example, a user might paste multiple pages of logs or a long contract as part of their query.
Design the system with a focus on:
- Overall architecture
  - How documents are stored and indexed (a minimal ingestion-and-retrieval sketch follows this group).
  - How search queries are processed.
  - How the LLM is used to generate final answers.
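To make the expected scope concrete, here is one possible shape for ingestion and retrieval, written in Python. Everything in it is an assumption for illustration: `embed` is a toy stand-in for a real embedding model, and the in-memory `index` list stands in for a production vector store.

```python
# Illustrative sketch only: `embed` is a toy hashing "embedding" so the
# snippet runs without external dependencies; a real design would use a
# proper embedding model and a vector database instead of a Python list.
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Hash each token into a bucket, then L2-normalize the counts.
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

index: list[tuple[str, list[float]]] = []  # (chunk_text, embedding)

def ingest(document: str, chunk_size: int = 500) -> None:
    # Split the document into fixed-size chunks and index each chunk.
    for i in range(0, len(document), chunk_size):
        chunk = document[i : i + chunk_size]
        index.append((chunk, embed(chunk)))

def search(query: str, k: int = 3) -> list[str]:
    # Rank chunks by dot product (cosine, since vectors are unit-norm).
    q = embed(query)
    ranked = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [chunk for chunk, _ in ranked[:k]]
```

A strong answer would say what replaces each stand-in (embedding model, vector store, chunking policy) and why.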
- Handling large token length / context limits
  - How to handle very long documents that do not fit into the LLM context (see the chunking sketch after this group).
  - How to handle very long queries (e.g., multi-page text pasted by the user).
  - How to stay within the context window while still providing high-quality, relevant answers.
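As one illustration of the strategies this group asks for, the sketch below combines two of them: overlapping chunks, so facts that straddle a chunk boundary survive, and a hard token budget when assembling the final prompt. The limits and the whitespace "tokenizer" are assumptions; a real system would count tokens with the model's own tokenizer.

```python
# Illustrative sketch: overlapping chunking plus greedy prompt packing
# under an assumed context budget. Token counts are approximated by a
# whitespace split; use the model's real tokenizer in practice.
MODEL_CONTEXT_TOKENS = 8_000   # assumed model limit
RESERVED_FOR_ANSWER = 1_000    # leave headroom for the completion

def chunk_with_overlap(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    # Overlap keeps sentences near a boundary present in two chunks.
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start : start + size]))
        start += size - overlap
    return chunks

def assemble_prompt(query: str, retrieved_chunks: list[str]) -> str:
    # Greedily pack the highest-ranked chunks until the budget runs out.
    # A multi-page pasted query could itself be chunked or summarized first.
    budget = MODEL_CONTEXT_TOKENS - RESERVED_FOR_ANSWER - len(query.split())
    kept = []
    for chunk in retrieved_chunks:
        cost = len(chunk.split())
        if cost > budget:
            break
        kept.append(chunk)
        budget -= cost
    return "Context:\n" + "\n---\n".join(kept) + f"\n\nQuestion:\n{query}"
```

The trade-off to call out: larger overlap reduces boundary loss but inflates index size and retrieval cost.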
- Additional considerations
  - Latency and cost: how you keep response times reasonable and control token usage.
  - Quality: how you keep retrieved content relevant and avoid losing important context when chunking or summarizing.
  - Any caching or other optimizations you would introduce (a simple answer-cache sketch follows this list).
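For the caching item above, a minimal sketch of one cheap optimization: caching final answers under a hash of the normalized query. `call_llm` is a hypothetical stub standing in for the real (slow, paid) model call.

```python
# Illustrative answer cache; `call_llm` is a stub, not a real API.
import hashlib

def call_llm(prompt: str) -> str:
    return f"<answer to: {prompt[:40]}>"  # stub; the real system calls the LLM

def normalize(query: str) -> str:
    # Cheap normalization so trivially different phrasings share a key.
    return " ".join(query.lower().split())

_answer_cache: dict[str, str] = {}

def answer(query: str) -> str:
    key = hashlib.sha256(normalize(query).encode()).hexdigest()
    if key not in _answer_cache:          # miss: pay for one model call
        _answer_cache[key] = call_llm(query)
    return _answer_cache[key]             # hits cost no tokens and little latency
```

Caching chunk embeddings at ingestion time is the other common variant worth mentioning.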
Describe your design in detail:
- Draw or describe the main components and data flow (ingestion, indexing, retrieval, LLM interaction, etc.).
- Explain at least 2–3 concrete strategies for dealing with large token length/context limits, and how they fit into your architecture.
- Call out trade-offs between different design choices.