Build a Minimal RAG Tool Using the Mistral API
Context
You have an API token and need to implement a small retrieval-augmented generation (RAG) tool in Python that can answer questions over a local folder of Markdown and PDF files using the Mistral API. The tool should support both a CLI and an HTTP server.
Requirements
- Implement document ingestion, chunking, and an in-memory vector index for retrieval.
- Provide a CLI with commands:
  - `index <path>`
  - `ask <question>`
  - `serve` (HTTP server exposing a `/chat` endpoint)
- Call chat/completions with streaming; include the top-k retrieved chunks in the prompt and return source citations.
- Add exponential backoff and retries for HTTP 429 responses and timeouts, plus structured error handling.
- Configure the API key, model names, and ports via environment variables.
- Include a README with setup steps and minimal tests.
- Briefly explain your retrieval algorithm choices and a quick way to evaluate answer quality.
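A minimal sketch of the chunking and in-memory index requirement, to make the expected shape concrete. The hashed bag-of-words `embed` function here is a stand-in for real embeddings (e.g. from an embeddings API); the class and function names are illustrative, not prescribed:

```python
import hashlib
import math
import re


def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, max(len(words), 1), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
        if start + max_words >= len(words):
            break
    return chunks


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalized (placeholder for real embeddings)."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Flat list of (source, chunk, vector); search is brute-force cosine similarity."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, list[float]]] = []

    def add(self, source: str, text: str) -> None:
        for chunk in chunk_text(text):
            self.entries.append((source, chunk, embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        qv = embed(query)
        scored = [
            (sum(a * b for a, b in zip(qv, v)), source, chunk)
            for source, chunk, v in self.entries
        ]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]
```

Brute-force cosine over normalized vectors is fine at this scale; swapping in a vector library only pays off once the corpus is large.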
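The three CLI commands map naturally onto `argparse` subparsers. A sketch (the program name `ragtool` and the `--port` flag are assumptions, not part of the spec):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ragtool")
    sub = parser.add_subparsers(dest="command", required=True)

    p_index = sub.add_parser("index", help="ingest a folder of .md/.pdf files")
    p_index.add_argument("path")

    p_ask = sub.add_parser("ask", help="answer one question and exit")
    p_ask.add_argument("question")

    p_serve = sub.add_parser("serve", help="start the HTTP server with a /chat endpoint")
    p_serve.add_argument("--port", type=int, default=8000)

    return parser
```

Usage: `ragtool index ./docs`, `ragtool ask "What is X?"`, `ragtool serve --port 8000`.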
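One way to satisfy the backoff requirement is a small retry wrapper; this sketch assumes the HTTP layer raises a custom `RateLimited` exception on 429 and `TimeoutError` on timeouts (both names are illustrative). The injectable `sleep` makes the wrapper testable without real delays:

```python
import random
import time


class RateLimited(Exception):
    """Raised by the HTTP layer when the API returns 429 (assumed convention)."""


def with_backoff(fn, max_retries: int = 5, base_delay: float = 0.5, sleep=time.sleep):
    """Retry fn on RateLimited/TimeoutError with exponential backoff plus jitter."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries):
            try:
                return fn(*args, **kwargs)
            except (RateLimited, TimeoutError):
                if attempt == max_retries - 1:
                    raise  # exhausted: surface the error to the caller
                delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
                sleep(delay)
    return wrapper
```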
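Environment-based configuration can be centralized in one settings object read at startup. The variable names (`MISTRAL_API_KEY`, `RAG_CHAT_MODEL`, `RAG_PORT`) and the default model are assumptions for illustration; a minimal sketch:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    chat_model: str
    port: int


def load_settings(env=os.environ) -> Settings:
    """Read config from environment variables; KeyError if the API key is missing."""
    return Settings(
        api_key=env["MISTRAL_API_KEY"],  # required: fail fast at startup
        chat_model=env.get("RAG_CHAT_MODEL", "mistral-small-latest"),
        port=int(env.get("RAG_PORT", "8000")),
    )
```

Passing `env` as a parameter keeps the loader unit-testable without mutating the process environment.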
Assumptions
- Language: Python 3.10+.
- Use the Mistral HTTP API directly to avoid client-library version mismatches.
- You may persist the built index to disk so that `ask` and `serve` can reuse it across processes; the core index data structure remains in-memory while serving queries.
- Supported file types: `.md` and `.pdf`.
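Since the spec calls for hitting the Mistral HTTP API directly, the request assembly can be isolated from the transport. This sketch builds the headers and a streaming chat payload from retrieved `(source, text)` chunks; the endpoint URL and the bracketed-citation prompt convention are assumptions to verify against the current Mistral API documentation:

```python
MISTRAL_URL = "https://api.mistral.ai/v1/chat/completions"  # verify against current docs


def build_chat_request(
    api_key: str,
    model: str,
    question: str,
    chunks: list[tuple[str, str]],
) -> tuple[dict, dict]:
    """Return (headers, payload) for a streaming chat call; chunks are (source, text)."""
    context = "\n\n".join(f"[{src}]\n{text}" for src, text in chunks)
    payload = {
        "model": model,
        "stream": True,  # server responds with server-sent events
        "messages": [
            {
                "role": "system",
                "content": (
                    "Answer using only the context below and cite sources "
                    "by their [bracketed] names.\n\n" + context
                ),
            },
            {"role": "user", "content": question},
        ],
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return headers, payload
```

Keeping payload construction pure (no I/O) means the citation prompt and chunk formatting can be tested without a live API key.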