Design a multimodal RAG assistant
Company: Apple
Role: Machine Learning Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
## Prompt
Design a **Retrieval-Augmented Generation (RAG)** system that can answer user questions using an internal knowledge base containing **multiple modalities** (at least text and images; optionally PDFs/tables).
### Requirements
- Users ask natural-language questions and want grounded answers with citations.
- Knowledge base items may include:
- Plain text docs (wiki pages, tickets)
- PDFs (mixed text + images)
- Images (diagrams/screenshots) with minimal surrounding metadata
- The system should retrieve relevant evidence across modalities and use an LLM to generate an answer.
### What to cover
1. Data ingestion and preprocessing for each modality
2. Indexing strategy (vector, keyword, hybrid) and how you would store metadata
3. Retrieval at query time (including cross-modal retrieval)
4. How you would handle chunking, embeddings, and re-ranking
5. Prompting / grounding strategy and citation generation
6. Quality evaluation (offline + online), latency, and cost considerations
7. Failure modes (hallucinations, stale data, missing modality) and mitigations
You may make reasonable assumptions and state them clearly.
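Items 2–4 above (indexing, retrieval, chunking/embeddings/re-ranking) are usually where a screen-level answer takes shape. As one possible illustration only, the sketch below fuses dense cosine similarity with a crude keyword-overlap signal over a single cross-modal chunk index; the `Chunk` type, the score fusion, and the `alpha` weight are assumptions made for this sketch, not a reference design, and the embeddings are assumed to come from a shared text–image encoder (e.g. a CLIP-style model).

```python
# Minimal sketch (not a production design): one flat index of chunks across modalities,
# dense cosine similarity fused with a rough keyword-overlap score.
from dataclasses import dataclass

import numpy as np


@dataclass
class Chunk:
    doc_id: str
    modality: str          # "text", "image", or "pdf_page"
    content: str           # raw text, OCR output, or an image caption
    embedding: np.ndarray  # unit-normalised dense vector from a shared encoder (assumed)


def dense_scores(query_vec: np.ndarray, chunks: list[Chunk]) -> np.ndarray:
    """Cosine similarity, assuming all vectors are already unit-normalised."""
    return np.stack([c.embedding for c in chunks]) @ query_vec


def keyword_scores(query: str, chunks: list[Chunk]) -> np.ndarray:
    """Very rough lexical signal: shared-term count (a real system would use BM25)."""
    terms = set(query.lower().split())
    return np.array(
        [len(terms & set(c.content.lower().split())) for c in chunks], dtype=float
    )


def hybrid_retrieve(query: str, query_vec: np.ndarray, chunks: list[Chunk],
                    k: int = 5, alpha: float = 0.7) -> list[tuple[Chunk, float]]:
    """Fuse dense and lexical scores; return top-k chunks for re-ranking/grounding."""
    dense = dense_scores(query_vec, chunks)
    sparse = keyword_scores(query, chunks)
    if sparse.max() > 0:
        sparse = sparse / sparse.max()   # scale lexical scores into [0, 1]
    fused = alpha * dense + (1 - alpha) * sparse
    top = np.argsort(-fused)[:k]
    return [(chunks[i], float(fused[i])) for i in top]
```

A stronger answer would typically follow this first-stage retrieval with a cross-encoder re-ranker over the top candidates before prompting the LLM.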
Quick Answer: This question evaluates system design and machine learning engineering skills for building a multimodal Retrieval-Augmented Generation (RAG) assistant: data ingestion and preprocessing across modalities, indexing and retrieval strategy, embeddings and re-ranking, grounding and prompting with citations, and evaluation with failure-mode mitigation. It is commonly asked to gauge whether an engineer can architect a scalable, grounded QA pipeline that balances retrieval quality, latency, and cost; it sits in the System Design domain and tests both high-level architectural reasoning and practical implementation trade-offs.
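For the grounding and citation piece in particular, a screen-level answer often comes down to numbering the retrieved evidence and constraining the model to cite it, with an explicit insufficient-evidence escape hatch to reduce hallucination risk. A minimal sketch, reusing the hypothetical `Chunk` type from the retrieval sketch above:

```python
# Minimal sketch of the grounding/citation step: number the evidence, force inline
# citations, and allow the model to decline when the evidence does not cover the question.
def build_grounded_prompt(question: str, retrieved: list[tuple["Chunk", float]]) -> str:
    evidence = "\n".join(
        f"[{i}] ({chunk.modality}, doc={chunk.doc_id}) {chunk.content[:500]}"
        for i, (chunk, _score) in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using ONLY the evidence below. "
        "Cite evidence inline as [n]. If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The bracketed indices in the generated answer can then be mapped back to `doc_id` values to render user-facing citations and to audit groundedness offline.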