Design an ML search system
Company: OpenAI
Role: Machine Learning Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Technical Screen
Design an ML-powered enterprise document search system. Specify requirements (query latency SLA, freshness, multi-tenant access control, data privacy/PII, cost). Describe the indexing pipeline (connectors, parsing/OCR, normalization), feature generation (BM25 signals, embeddings), storage (inverted index plus vector store), and a hybrid retrieval strategy. Detail the ranking stack (learning-to-rank or neural reranker), query understanding (spelling correction, synonym expansion, query embeddings), and personalization/feedback loops. Explain offline evaluation (NDCG@k, Recall@k), online evaluation (interleaving, A/B tests), monitoring, and guardrails (sensitive-content filters). Address scalability, caching, result deduplication, multilingual support, and how you enforce ACLs at query time without leaking documents.
Quick Answer: This question evaluates system-design and information-retrieval competencies for machine-learning-powered enterprise document search, covering ingestion and indexing pipelines, hybrid lexical and semantic retrieval, feature generation and ranking, multi-tenant access control, and security/compliance.
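One way to make the query-time ACL and hybrid-retrieval parts concrete is a small in-memory sketch. It assumes a hypothetical Doc record carrying a tenant ID, a set of groups allowed to read it, raw text, and a precomputed embedding (toy 2-D vectors stand in for a real embedding model). Documents are filtered by tenant and group membership before any scoring, so unauthorized documents never enter either candidate list. Lexical scores use Lucene-style BM25 with whitespace tokenization, and the two rankings are fused with reciprocal rank fusion (RRF, k=60). RRF is one choice among several; a learned score combination or a reranker could replace it.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    # Hypothetical document record; field names are illustrative only.
    doc_id: str
    tenant: str
    allowed_groups: frozenset  # groups permitted to read this document
    text: str
    embedding: tuple  # precomputed dense vector (toy values here)


def authorized(doc, tenant, groups):
    # Tenant isolation first, then a group-based ACL check.
    return doc.tenant == tenant and bool(doc.allowed_groups & groups)


def bm25_scores(query, docs, k1=1.2, b=0.75):
    # Corpus statistics are computed over the authorized docs only, so
    # hidden documents cannot influence term weights (a simplification).
    tokens = {d.doc_id: d.text.lower().split() for d in docs}
    n = len(docs)
    avgdl = sum(len(t) for t in tokens.values()) / n
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))
    scores = {}
    for d in docs:
        tf = Counter(tokens[d.doc_id])
        dl = len(tokens[d.doc_id])
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * dl / avgdl))
        scores[d.doc_id] = s
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rrf(rankings, k=60):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank).
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query, query_vec, corpus, tenant, groups, top_k=10, depth=100):
    # ACL pre-filter: unauthorized docs are never candidates, so they cannot
    # leak through results, counts, or scores.
    visible = [d for d in corpus if authorized(d, tenant, groups)]
    if not visible:
        return []
    lex = bm25_scores(query, visible)
    lexical = [i for i in sorted(lex, key=lambda i: (-lex[i], i)) if lex[i] > 0][:depth]
    dense = {d.doc_id: cosine(query_vec, d.embedding) for d in visible}
    semantic = sorted(dense, key=lambda i: (-dense[i], i))[:depth]
    return rrf([lexical, semantic])[:top_k]


if __name__ == "__main__":
    corpus = [
        Doc("d1", "acme", frozenset({"eng"}), "quarterly revenue report", (1.0, 0.0)),
        Doc("d2", "acme", frozenset({"finance"}), "revenue forecast confidential", (0.9, 0.1)),
        Doc("d3", "globex", frozenset({"eng"}), "revenue report", (1.0, 0.0)),
        Doc("d4", "acme", frozenset({"eng"}), "onboarding guide", (0.0, 1.0)),
    ]
    # An "eng" user in tenant "acme" never sees d2 (wrong group) or d3 (other tenant).
    print(hybrid_search("revenue report", (1.0, 0.0), corpus, "acme", frozenset({"eng"})))
    # prints ['d1', 'd4']
```

Filtering before retrieval, rather than dropping unauthorized hits after ranking, also keeps the top-k list from being silently depleted. In a production index the same effect usually comes from ACL/tenant fields applied as filters inside both the inverted-index and vector-store queries.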