PracHub

Design LLM search handling long token inputs

Last updated: May 26, 2026

Quick Overview

This question evaluates a candidate's ability to design scalable LLM-powered search systems, including retrieval, indexing, long-context management, and integration of LLMs with document storage, and is categorized under ML system design.


Company: OpenAI

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite

Apr 6, 2025, 12:00 AM

You are asked to design an LLM-powered search system that lets users query a large corpus of documents (e.g., internal wikis, PDFs, logs, and web pages) and receive natural-language answers.

A key challenge is that both documents and user queries can be very long, often exceeding the context window (maximum token length) of the underlying large language model (LLM). For example, a user might paste multiple pages of logs or a long contract as part of their query.

Design the system with a focus on:

  1. Overall architecture
    • How documents are stored and indexed.
    • How search queries are processed.
    • How the LLM is used to generate final answers.
  2. Handling large token length / context limits
    • How to handle very long documents that do not fit into the LLM context.
    • How to handle very long queries (e.g., multi-page text pasted by the user).
    • How to avoid blowing past the context window while still providing high-quality, relevant answers.
  3. Additional considerations
    • Latency and cost: how you keep response times reasonable and control token usage.
    • Quality: how you keep retrieved content relevant and avoid missing important context when chunking or summarizing.
    • Any caching or optimizations you would introduce.

Describe your design in detail:

  • Draw or describe the main components and data flow (ingestion, indexing, retrieval, LLM interaction, etc.).
  • Explain at least 2–3 concrete strategies for dealing with large token length/context limits, and how they fit into your architecture.
  • Call out trade-offs between different design choices.
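To make the context-limit constraint concrete, here is a minimal sketch of two of the ideas the prompt names: chunking a long document and keeping the retrieved content within a token budget. It is an illustration under stated assumptions, not a reference answer. The names `count_tokens`, `chunk_with_overlap`, `ScoredChunk`, and `pack_context` are hypothetical, the whitespace word count stands in for a real tokenizer, and the relevance scores are assumed to come from a retriever that is not shown.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size words that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


@dataclass
class ScoredChunk:
    text: str
    score: float  # relevance score, assumed to be supplied by a retriever


def pack_context(chunks: list[ScoredChunk], budget: int) -> list[ScoredChunk]:
    """Greedily keep the highest-scoring chunks whose total token count fits the budget."""
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = count_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected
```

In this sketch the budget would be whatever remains of the context window after reserving room for the instructions, the (possibly condensed) user query, and the generated answer. Greedy packing is only one option. Its trade-off is that a large, high-scoring chunk can be skipped in favor of smaller, lower-scoring ones.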
