Build a Mistral-powered RAG agent

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design and implement a retrieval-augmented generation (RAG) system on top of a large language model API. It covers document ingestion, chunking, vector indexing and retrieval, streaming completions, service interfaces (CLI and HTTP), and resilience features such as retries, persistence, and structured error handling. It is commonly asked in ML System Design rounds to assess practical engineering and architectural reasoning: it tests API integration and production-readiness (configuration, error handling, persistence, and testing), and it expects a working implementation backed by a conceptual understanding of retrieval trade-offs.

Company: Meta

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Onsite

You are given an API token. Build a small retrieval-augmented generation (RAG) tool that can answer questions over a local folder of Markdown/PDF files using the Mistral API. Requirements: (1) Implement document ingestion, chunking, and an in-memory vector index for retrieval; (2) Provide a CLI with the commands index <path>, ask <question>, and serve, where serve starts an HTTP server exposing a /chat endpoint; (3) Call chat/completions with streaming, include the top-k retrieved chunks in the prompt, and return source citations; (4) Add exponential backoff/retries for 429 responses and timeouts, plus structured error handling; (5) Configure the API key, model, and ports via environment variables (e.g., loaded from a .env file); (6) Include a README with setup steps and minimal tests; (7) Briefly explain your retrieval algorithm choices and a quick method to evaluate answer quality.
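One way to approach the first requirement is sketched below: overlapping word-window chunking plus a brute-force cosine-similarity index held in memory. This is an illustration under stated assumptions, not a reference solution. The names `Chunk`, `chunk_text`, `toy_embed`, and `InMemoryIndex` are hypothetical, the window sizes are arbitrary, and `toy_embed` is a hashed bag-of-words stand-in so the example runs offline; a real tool would swap in vectors from an embeddings model.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # originating file path, kept so answers can cite it
    text: str


def chunk_text(text, source, size=200, overlap=50):
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(Chunk(source, " ".join(words[start:start + size])))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text, dim=256):
    """Placeholder embedding (assumption): hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryIndex:
    """Brute-force index: stores (vector, chunk) pairs and scans all of them."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.items = []

    def add(self, chunks):
        for chunk in chunks:
            self.items.append((self.embed(chunk.text), chunk))

    def search(self, query, k=3):
        """Return the k highest-scoring (score, chunk) pairs."""
        q = self.embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk)
                  for vec, chunk in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

A linear scan is usually adequate for a folder of a few thousand chunks, and the `source` field on each hit is what the citation list would be built from.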

Sep 6, 2025, 12:00 AM

Build a Minimal RAG Tool Using the Mistral API

Context

You have an API token and need to implement a small retrieval-augmented generation (RAG) tool in Python that can answer questions over a local folder of Markdown and PDF files using the Mistral API. The tool should support both a CLI and an HTTP server.
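The CLI surface described here could be laid out with `argparse` subcommands, as in the minimal sketch below. The program name `rag`, the `--top-k` and `--port` options, and their defaults are assumptions made for illustration; the handlers only print what they would do, since the real work belongs to the indexing, retrieval, and server code.

```python
import argparse


def build_parser():
    """Build a parser with the three subcommands: index, ask, serve."""
    parser = argparse.ArgumentParser(prog="rag", description="Minimal RAG tool (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    p_index = sub.add_parser("index", help="ingest and index a folder of .md/.pdf files")
    p_index.add_argument("path")

    p_ask = sub.add_parser("ask", help="answer one question from the built index")
    p_ask.add_argument("question")
    p_ask.add_argument("-k", "--top-k", type=int, default=4)  # assumed default

    p_serve = sub.add_parser("serve", help="start an HTTP server exposing /chat")
    p_serve.add_argument("--port", type=int, default=8000)  # assumed default
    return parser


def main(argv=None):
    """Dispatch on the chosen subcommand; handlers are placeholders."""
    args = build_parser().parse_args(argv)
    if args.command == "index":
        print(f"would index files under {args.path}")
    elif args.command == "ask":
        print(f"would retrieve top {args.top_k} chunks and answer: {args.question}")
    elif args.command == "serve":
        print(f"would serve /chat on port {args.port}")
    return 0
```

Calling `main(["ask", "What is chunking?"])` exercises the `ask` path without a shell; passing `argv` explicitly also makes the parser easy to cover in the minimal tests the task asks for.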

Requirements

  1. Implement document ingestion, chunking, and an in-memory vector index for retrieval.
  2. Provide a CLI with commands:
    • index <path>
    • ask <question>
    • serve (HTTP server exposing a /chat endpoint)
  3. Call chat/completions with streaming; include the top-k retrieved chunks in the prompt and return source citations.
  4. Add exponential backoff/retries for 429 and timeouts, plus structured error handling.
  5. Configure via environment variables for API key, model names, and ports.
  6. Include a README with setup steps and minimal tests.
  7. Briefly explain your retrieval algorithm choices and a quick way to evaluate answer quality.
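Requirement 4 might be sketched as a small retry wrapper like the one below. It is a hedged illustration: `ApiError`, `is_retryable`, and `with_retries` are hypothetical names, the attempt count and delays are arbitrary defaults, and the wrapped `call` is assumed to raise `ApiError` with the HTTP status on a failed response and `TimeoutError` on a timeout.

```python
import random
import time
from typing import Optional


class ApiError(Exception):
    """Structured error carrying a machine-readable code and the HTTP status."""

    def __init__(self, code: str, message: str, status: Optional[int] = None):
        super().__init__(message)
        self.code = code
        self.status = status

    def to_dict(self) -> dict:
        """Shape suitable for a JSON error body from the /chat endpoint."""
        return {"error": {"code": self.code, "message": str(self), "status": self.status}}


def is_retryable(exc: Exception) -> bool:
    """Retry only rate limits (429) and timeouts; everything else fails fast."""
    if isinstance(exc, TimeoutError):
        return True
    return isinstance(exc, ApiError) and exc.status == 429


def with_retries(call, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Run `call`, retrying retryable failures with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            out_of_attempts = attempt == max_attempts - 1
            if out_of_attempts or not is_retryable(exc):
                raise
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out clients that were rate-limited together.
            sleep(delay + random.uniform(0, delay / 2))
```

Injecting `sleep` keeps the tests fast: a test can pass a function that records delays instead of waiting. If the API returns a `Retry-After` header on 429, honouring it in place of the computed delay would be a reasonable refinement.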

Assumptions

  • Language: Python 3.10+.
  • Use the Mistral HTTP API directly to avoid client-library version mismatch.
  • You may persist the built index to disk so that ask and serve can reuse it across processes, while the core index data structure remains in-memory when serving queries.
  • Supported file types: .md and .pdf.
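The persistence assumption above, together with environment-based configuration, could look roughly like the following sketch. Everything named here is an assumption for illustration: the variable names `MISTRAL_API_KEY`, `RAG_CHAT_MODEL`, `RAG_PORT`, and `RAG_INDEX_PATH`, the default model alias, the default port, and the JSON layout of the saved index.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    api_key: str
    chat_model: str
    port: int
    index_path: Path


def load_settings(env=os.environ):
    """Read configuration from environment variables (names are hypothetical)."""
    api_key = env.get("MISTRAL_API_KEY", "")
    if not api_key:
        raise RuntimeError("MISTRAL_API_KEY is not set")
    return Settings(
        api_key=api_key,
        chat_model=env.get("RAG_CHAT_MODEL", "mistral-small-latest"),  # assumed default
        port=int(env.get("RAG_PORT", "8000")),
        index_path=Path(env.get("RAG_INDEX_PATH", "index.json")),
    )


def save_index(records, path):
    """Write records (dicts with source, text, vector) to disk as JSON."""
    path = Path(path)
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps({"version": 1, "records": records}), encoding="utf-8")
    os.replace(tmp, path)  # rename into place so readers never see a partial file


def load_index(path):
    """Load records back into memory; `ask` and `serve` would call this at startup."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if payload.get("version") != 1:
        raise ValueError("unsupported index file version")
    return payload["records"]
```

With this split, `index` builds the records and calls `save_index`, while `ask` and `serve` call `load_index` once and then query the in-memory structure. JSON is easy to inspect but bulky for large vector sets; a binary format would be a natural follow-up if the index grows.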
