This question evaluates a candidate's ability to design and implement a retrieval-augmented generation (RAG) system on top of a large language model API. It covers document ingestion, chunking, vector indexing and retrieval, streaming completions, service interfaces (CLI and HTTP), and resilience features such as retries, persistence, and structured error handling. Commonly asked in the ML System Design domain, it assesses practical engineering and architectural reasoning: API integration, production readiness (configuration, error handling, persistence, and testing), and a conceptual understanding of retrieval trade-offs.

You have an API token and need to implement a small retrieval-augmented generation (RAG) tool in Python that can answer questions over a local folder of Markdown and PDF files using the Mistral API. The tool should support both a CLI and an HTTP server.
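As a starting point, the chunking and retrieval core of such a tool might be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the names `chunk_text`, `toy_embed`, `InMemoryIndex`, and `build_prompt` are hypothetical, and `toy_embed` is a bag-of-words stand-in so that the example runs offline. A real implementation would presumably swap in an embedding call to the Mistral API, extract text from the Markdown and PDF files first, and send the assembled prompt to a chat completion endpoint.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: token -> weight


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping character windows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(text: str) -> Vector:
    """Placeholder embedding (assumption): lowercase token counts."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {token: float(count) for token, count in Counter(tokens).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Brute-force vector index; the embedding function is injected."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed) -> None:
        self.embed = embed
        self.entries: List[Tuple[str, str, Vector]] = []

    def add_document(self, source: str, text: str,
                     chunk_size: int = 200, overlap: int = 50) -> None:
        for chunk in chunk_text(text, chunk_size, overlap):
            self.entries.append((source, chunk, self.embed(chunk)))

    def search(self, query: str, k: int = 3) -> List[Tuple[float, str, str]]:
        """Return up to k (score, source, chunk) tuples, best match first."""
        query_vec = self.embed(query)
        scored = [(cosine(query_vec, vec), source, chunk)
                  for source, chunk, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, hits: List[Tuple[float, str, str]]) -> str:
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(f"[{source}] {chunk}" for _, source, chunk in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add_document("notes.md", "Retries use exponential backoff with jitter.")
    index.add_document("faq.md", "The server exposes a JSON endpoint for queries.")
    question = "How are retries handled?"
    print(build_prompt(question, index.search(question, k=1)))
```

Injecting the embedding function keeps the index testable without network access, and the same `search` and `build_prompt` functions could back both the CLI and the HTTP handler. A brute-force scan is adequate for a local folder; a larger corpus would likely call for an approximate nearest-neighbor index and on-disk persistence.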