You are building a Retrieval-Augmented Generation (RAG) system that uses an LLM plus a vector database. Before creating embeddings and indexing documents, you must split long documents into chunks.
Describe how you would design the chunking strategy. In your answer, discuss:
- How you would choose chunk size and overlap, and the trade-offs involved (recall vs. context size, latency, etc.).
- How you would use document structure (e.g., headings, paragraphs, sections) vs. naive fixed-length splits.
- When you might use more advanced methods like semantic chunking or dynamic chunk sizes.
- How you would evaluate and iterate on your chunking strategy in a real system.
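
To make the chunk size and overlap parameters concrete, here is a minimal sketch of the naive fixed-length baseline that the question contrasts with structure-aware splitting. The function name `chunk_with_overlap` is hypothetical, and whitespace splitting is only a stand-in for a real tokenizer; a production system would count tokens with the embedding model's own tokenizer.

```python
from typing import List, Sequence


def chunk_with_overlap(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most chunk_size items.

    Consecutive windows share `overlap` items. Hypothetical helper for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end, so no trailing chunk
        # consisting only of already-covered overlap is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Assumption: whitespace-separated words approximate tokens.
text = "one two three four five six seven eight nine ten"
chunks = chunk_with_overlap(text.split(), chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

With `chunk_size=4` and `overlap=1`, the ten words yield three chunks, each starting with the last word of the previous one. Raising the overlap reduces the chance that a relevant passage is cut at a boundary, but it also increases the number of chunks to embed, store, and search.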