Build and design a Mistral RAG agent
Company: Mistral AI
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Technical Screen
Build and discuss an LLM-powered retrieval-augmented agent that calls the Mistral API. Implement a minimal Python tool that:
1) ingests a small local corpus (e.g., Markdown/PDF), chunks text, generates embeddings, and indexes them;
2) retrieves top-k passages for a user query;
3) composes a prompt and invokes Mistral chat completions with streaming;
4) keeps short-term conversation memory;
5) reads the API token from an environment variable;
6) handles errors, retries, and rate limits; and
7) includes a brief README and smoke tests.
Then explain your system design choices: chunking strategy, embedding model choice, vector index selection, latency and cost budgeting, caching, prompt templating, safety/PII handling, logging/monitoring, offline RAG evaluation, and fallbacks when retrieval fails.
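A minimal sketch of the ingestion and retrieval steps (1–2): overlapping word-window chunking, embedding, and top-k cosine-similarity lookup. The hashing embedder here is a deterministic toy stand-in; in a real system you would call Mistral's embeddings endpoint and likely use a proper vector index (e.g., FAISS) instead of a linear scan.

```python
import math
import re
import zlib

def chunk_text(text, max_words=100, overlap=20):
    """Split text into overlapping fixed-size word windows."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap
    return chunks

def embed(text, dim=64):
    """Toy bag-of-words hashing embedder (deterministic via CRC32);
    a placeholder for a real embedding model."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def top_k(query, index, k=2):
    """Rank indexed chunks by cosine similarity to the query
    (vectors are unit-normalised, so a dot product suffices)."""
    q = embed(query)
    scored = sorted(index, key=lambda item: -sum(a * b for a, b in zip(q, item[1])))
    return [text for text, _ in scored[:k]]

docs = [
    "Mistral models support streaming chat completions over the API.",
    "Vector indexes store embeddings for fast nearest neighbour search.",
]
index = [(chunk, embed(chunk)) for doc in docs for chunk in chunk_text(doc)]
print(top_k("search stored embeddings", index, k=1)[0])
```

The overlap between chunks preserves context that would otherwise be cut at window boundaries; normalising the vectors at embed time keeps retrieval a cheap dot product.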
Quick Answer: This question evaluates a candidate's ability to build a retrieval-augmented generation (RAG) system end to end: document ingestion, chunking, embedding generation, vector indexing, retrieval, streaming LLM completions via the Mistral API, short-term conversation memory, and operational reliability features such as retries and rate-limit handling.
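For the reliability requirements (steps 5–6), a generic sketch: the token is read from an environment variable (MISTRAL_API_KEY is assumed here as the variable name) and API calls are wrapped in exponential backoff with jitter. The official `mistralai` client raises its own exception types; the `RateLimitError` below is a stand-in used to keep the example self-contained.

```python
import os
import random
import time

class RateLimitError(Exception):
    """Stand-in for the 429 error a real API client would raise."""

def with_retries(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RateLimitError, back off exponentially (with
    jitter) and retry, re-raising after max_attempts failures."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# The API token comes from the environment, never from source code.
api_key = os.environ.get("MISTRAL_API_KEY")

# Usage: a stub call that is rate-limited twice, then succeeds.
calls = {"n": 0}
def flaky_completion():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "streamed answer"

print(with_retries(flaky_completion, sleep=lambda d: None))
```

Injecting `sleep` as a parameter makes the backoff testable without real delays, which is exactly what the question's "smoke tests" requirement calls for.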