Design AI-Powered Document Search
Company: Workday
Role: Software Engineer
Category: ML System Design
Difficulty: Medium
Interview Round: Onsite
Design a system where users upload documents and later search them by structured fields and free-text keywords. The system should use a multi-step AI pipeline to extract metadata and keywords before indexing.
Requirements:
- Support uploads of PDFs and common office documents.
- Extract raw text, structured document fields (such as type, vendor, or date), and useful search keywords.
- Provide reliable asynchronous processing even when OCR or AI services fail intermittently.
- Support fielded queries such as `vendor = Acme AND keyword = renewal`.
- Return low-latency search results and highlight matching terms.
- Discuss data storage, indexing, orchestration, retries, reprocessing, and monitoring.
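To make the fielded-query and highlighting requirements concrete, here is a minimal in-memory sketch in Python. It is not a production design: a real system would most likely delegate this to a search engine. The names `IndexedDoc`, `parse_query`, `matches`, `highlight`, and `search` are hypothetical, and the sketch assumes that queries are only `AND`-joined `name = value` clauses, that `keyword` is a reserved clause name, and that extracted keywords are stored lowercased.

```python
import re
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    """Hypothetical record produced by the extraction pipeline."""
    doc_id: str
    fields: dict[str, str]                       # e.g. {"vendor": "Acme"}
    keywords: set[str] = field(default_factory=set)  # assumed lowercased
    text: str = ""


def parse_query(query: str) -> list[tuple[str, str]]:
    """Split 'a = x AND b = y' into [('a', 'x'), ('b', 'y')], lowercased."""
    clauses = []
    for part in re.split(r"\s+AND\s+", query.strip()):
        name, sep, value = part.partition("=")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"malformed clause: {part!r}")
        clauses.append((name.strip().lower(), value.strip().lower()))
    return clauses


def matches(doc: IndexedDoc, clauses: list[tuple[str, str]]) -> bool:
    """A document matches only if every clause is satisfied (AND semantics)."""
    for name, value in clauses:
        if name == "keyword":
            if value not in doc.keywords:
                return False
        elif doc.fields.get(name, "").lower() != value:
            return False
    return True


def highlight(text: str, terms: list[str]) -> str:
    """Wrap whole-word, case-insensitive occurrences of terms in <em> tags."""
    if not terms:
        return text
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<em>{m.group(0)}</em>", text)


def search(docs: list[IndexedDoc], query: str) -> list[tuple[str, str]]:
    """Return (doc_id, highlighted_text) for every matching document."""
    clauses = parse_query(query)
    terms = [value for name, value in clauses if name == "keyword"]
    return [(d.doc_id, highlight(d.text, terms)) for d in docs if matches(d, clauses)]
```

This linear scan only illustrates the query semantics. Meeting the low-latency requirement at scale would normally call for an inverted index with exact-match fields for metadata and analyzed fields for free text.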
Quick Answer: This question evaluates system-design and machine-learning engineering skills for building a scalable AI-enabled document ingestion pipeline, covering OCR, metadata and keyword extraction, indexing, fault-tolerant orchestration, retries, reprocessing, and monitoring.
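One way to reason about the retry and reprocessing parts of an answer is a staged pipeline with per-stage checkpoints, sketched below in Python. This is an illustration under stated assumptions rather than a reference design: `TransientError`, `run_with_retry`, and `process_document` are hypothetical names, the checkpoint store is a plain dict standing in for a durable store, and exponential backoff with jitter is just one common retry policy.

```python
import random
import time
from typing import Any, Callable


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx)."""


def run_with_retry(
    step: Callable[[Any], Any],
    payload: Any,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call step(payload), retrying transient failures with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step(payload)
        except TransientError:
            if attempt == max_attempts:
                raise  # caller can route the document to a dead-letter queue
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries


def process_document(
    doc_id: str,
    raw: Any,
    stages: list[tuple[str, Callable[[Any], Any]]],
    checkpoints: dict[str, dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run stages in order, reusing any output already checkpointed for doc_id."""
    done = checkpoints.setdefault(doc_id, {})
    data = raw
    for name, step in stages:
        if name not in done:
            done[name] = run_with_retry(step, data, sleep=sleep)
        data = done[name]
    return data
```

Because each stage's output is stored under its name, a redelivered message resumes from the last completed stage instead of repeating OCR. In this sketch, reprocessing after a model upgrade would amount to deleting the affected stage's checkpoint together with those of every later stage, then replaying the document.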