Prompt
Design a Retrieval-Augmented Generation (RAG) system that answers user questions using an organization’s internal documents (PDFs, wiki pages, tickets, and policies) while minimizing hallucinations.
Requirements
- Inputs: user natural-language query; a continuously updated document corpus.
- Outputs: a grounded answer with citations (snippets + document links/IDs).
- Quality goals:
  - High answer correctness and groundedness.
  - Handle ambiguous questions by asking clarifying questions when needed.
- System goals:
  - Low latency (interactive).
  - Scalable to millions of documents.
  - Support frequent document updates (new/edited/deleted docs).
  - Security: enforce document-level access control (per user/role) and prevent data leakage.
  - Observability: logging, monitoring, evaluation, and iterative improvement.
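One way to read the access-control requirement is that permission checks run on retrieved chunks before any text reaches the LLM. The sketch below illustrates that idea only; the `Chunk` record, its `allowed_roles` field, and the `filter_by_access` helper are hypothetical names, not part of the prompt.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieved chunk; field names are illustrative assumptions."""
    doc_id: str
    text: str
    allowed_roles: frozenset = field(default_factory=frozenset)


def filter_by_access(chunks, user_roles):
    """Keep only chunks whose document ACL shares at least one role with the user."""
    roles = frozenset(user_roles)
    return [c for c in chunks if c.allowed_roles & roles]


# Example data (made up for illustration).
retrieved = [
    Chunk("policy-1", "Expense policy text", frozenset({"employee", "finance"})),
    Chunk("hr-7", "Salary bands", frozenset({"hr"})),
]
visible = filter_by_access(retrieved, {"employee"})
print([c.doc_id for c in visible])
```

A chunk with an empty `allowed_roles` set is never returned, so the sketch fails closed. In a real system the same predicate would often be pushed into the index query as a metadata filter, so that restricted text is never fetched in the first place.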
What to cover
Explain the end-to-end architecture including:
- Ingestion + preprocessing (chunking, metadata, dedup).
- Embedding generation and indexing.
- Retrieval (vector + keyword), reranking, and context construction.
- LLM prompting and citation generation.
- Caching, rate limiting, and fallbacks.
- Offline/online evaluation and A/B testing.
- Failure modes and mitigations (hallucinations, stale data, prompt injection).
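For the "Retrieval (vector + keyword)" item, one common way to merge the two ranked lists is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) over the lists it appears in. The prompt does not prescribe a fusion method, so this is a sketch under that assumption; the document IDs are made up, and k = 60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into a single ranking.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant; larger values flatten the gap between ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a keyword (e.g. BM25) index.
vector_hits = ["doc-a", "doc-b", "doc-c"]
keyword_hits = ["doc-b", "doc-d", "doc-a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

RRF uses only ranks, so it sidesteps the problem that vector similarity scores and keyword scores are on different scales. The fused list would then typically go to a reranker before context construction.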