Prompt
Design a Retrieval-Augmented Generation (RAG) system that can answer user questions using an internal knowledge base containing multiple modalities (at least text and images; optionally PDFs/tables).
Requirements
- Users ask natural-language questions and want grounded answers with citations.
- Knowledge base items may include (a unified record sketch follows this list):
  - Plain text docs (wiki pages, tickets)
  - PDFs (mixed text + images)
  - Images (diagrams/screenshots) with minimal surrounding metadata
- The system should retrieve relevant evidence across modalities and use an LLM to generate an answer.
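
For concreteness, ingestion could normalize every modality into one retrievable chunk record. The sketch below is a minimal Python illustration under assumed names (`KBChunk`, `image_uri`, and friends), not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KBChunk:
    """One retrievable unit, whatever its source modality.

    All field names here are illustrative assumptions.
    """
    chunk_id: str                     # stable ID, reused for citations
    doc_id: str                       # parent document (wiki page, PDF, image)
    modality: str                     # e.g. "text", "pdf_text", "pdf_image", "image"
    text: Optional[str] = None        # extracted / OCR'd text, if any
    image_uri: Optional[str] = None   # pointer to the raw image, if any
    page: Optional[int] = None        # page number for PDF-derived chunks
    source_url: Optional[str] = None  # where a citation should point
    updated_at: Optional[str] = None  # freshness signal for staleness checks
    tags: list[str] = field(default_factory=list)
```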
What to cover
- Data ingestion and preprocessing for each modality
- Indexing strategy (vector, keyword, hybrid) and how you would store metadata
- Retrieval at query time (including cross-modal retrieval)
- How you would handle chunking, embeddings, and re-ranking (a rank-fusion sketch follows this list)
- Prompting / grounding strategy and citation generation (an illustrative template also follows)
- Quality evaluation (offline + online), latency, and cost considerations
- Failure modes (hallucinations, stale data, missing modality) and mitigations
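
As one concrete baseline for the hybrid-indexing and re-ranking bullets, reciprocal rank fusion (RRF) merges a keyword ranking (e.g., BM25) and a dense-vector ranking without calibrating their scores against each other. This is a minimal sketch; k=60 is the conventional damping constant from the original RRF paper, not a requirement.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several best-first ranked lists of chunk IDs into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)  # later ranks contribute less
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Example: fuse a keyword ranking with a dense-vector ranking.
fused = reciprocal_rank_fusion([
    ["c3", "c1", "c7"],  # keyword (BM25) top hits
    ["c1", "c9", "c3"],  # dense-vector top hits
])
print(fused)  # ['c1', 'c3', 'c9', 'c7'] -- items on both lists rise
```

A cross-encoder re-ranker would typically rescore the fused top-k before the chunks are handed to the LLM.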
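
For the prompting / grounding bullet, one illustrative shape of the generation step is to label the retrieved chunks, restrict the model to that evidence, and require inline bracketed citations. The template wording and the `[chunk_id]` citation convention below are assumptions for illustration only.

```python
GROUNDED_ANSWER_TEMPLATE = """\
Answer the question using ONLY the evidence below.
Cite evidence inline as [chunk_id]. If the evidence is
insufficient, say so instead of guessing.

Evidence:
{evidence}

Question: {question}
Answer:"""

def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """chunks: (chunk_id, text) pairs returned by the retriever."""
    evidence = "\n".join(f"[{cid}] {text}" for cid, text in chunks)
    return GROUNDED_ANSWER_TEMPLATE.format(evidence=evidence, question=question)
```

Cited chunk IDs can then be mapped back to source_url / page metadata to render human-readable citations.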
You may make reasonable assumptions and state them clearly.