System Design: Add Semantic Search to an Existing CRUD Service Using Chroma
Context
You own a document CRUD service (create/read/update/delete) that stores documents, each with an id, a text body, and optional metadata. Extend this service by integrating a Chroma vector store to support semantic search over documents. Assume you can add a background worker if needed, but aim for a minimal, production-ready design.
Tasks
- Define the vector collection schema and explain how text embeddings are produced (model choice, dimensionality, normalization, and how metadata is stored).
- Implement API endpoints to:
  - Upsert documents into the vector store.
  - Delete documents (by id and/or by metadata filter).
  - Query by vector similarity (semantic search) with optional metadata filters.
- Implement a search_query function that returns a response object with a required results list (may be empty; never None). Include per-result scores and metadata.
- Describe how you would handle:
  - Indexing strategy and parameters.
  - Pagination for vector search results.
  - Filtering by metadata.
  - Eventual consistency between the CRUD store and the vector index.
- Discuss production concerns: latency targets, batching strategies, and error handling (including retries, timeouts, and idempotency).
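For the schema task above, one possible record shape is sketched below. It assumes a single collection (hypothetically named "documents") keyed by the CRUD document id, cosine distance, and a 384-dimensional model such as all-MiniLM-L6-v2 (the model behind Chroma's default embedding function). Vectors are L2-normalized before writing so cosine similarity reduces to a dot product, and nested metadata is flattened because Chroma metadata values have been restricted to scalars (str, int, float, bool). The "hnsw:space" key reflects the collection-metadata style of Chroma 0.4/0.5 and may differ in newer releases; VectorRecord, the helper functions, and the doc_version field are illustrative names, not part of Chroma's API.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

Scalar = Union[str, int, float, bool]

# Assumed settings. "hnsw:space" is how Chroma 0.4/0.5 selects the distance
# function via collection metadata; newer releases may configure this differently.
COLLECTION_NAME = "documents"  # hypothetical
COLLECTION_METADATA = {"hnsw:space": "cosine"}
EMBEDDING_DIM = 384  # e.g. all-MiniLM-L6-v2; must match the deployed model


@dataclass
class VectorRecord:
    id: str  # same id as the CRUD row, so upserts and deletes line up 1:1
    embedding: List[float]  # unit-length vector of EMBEDDING_DIM floats
    document: str  # text body that was embedded
    metadata: Dict[str, Scalar] = field(default_factory=dict)


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale to unit length so cosine similarity equals the dot product."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def flatten_metadata(meta: Dict[str, Any], prefix: str = "") -> Dict[str, Scalar]:
    """Flatten nested dicts into dotted keys, join lists into strings
    (an assumption; one boolean key per tag is an alternative), drop None."""
    flat: Dict[str, Scalar] = {}
    for key, value in meta.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_metadata(value, prefix=f"{name}."))
        elif isinstance(value, (str, int, float, bool)):
            flat[name] = value
        elif isinstance(value, (list, tuple)):
            flat[name] = ",".join(str(v) for v in value)
    return flat


def to_vector_record(doc_id: str, text: str, raw_embedding: List[float],
                     metadata: Dict[str, Any], version: int) -> VectorRecord:
    """Validate dimensionality, normalize, and attach the CRUD version."""
    if len(raw_embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(raw_embedding)}")
    meta = flatten_metadata(metadata)
    meta["doc_version"] = version  # hypothetical field used to reject stale writes
    return VectorRecord(id=doc_id, embedding=l2_normalize(raw_embedding),
                        document=text, metadata=meta)
```

Keeping the CRUD id as the vector id means updates and deletes never need a lookup table, and validating the dimension at write time catches a model mismatch before it corrupts the index.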
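For the upsert endpoint, a sketch of the planning step, assuming the service stores a content hash for each record (for example as a metadata field) so that retried or repeated upserts skip re-embedding unchanged text. The hash also covers the model name, so switching models forces re-embedding. The model string, the batch size of 64, and the helper names are placeholders to tune against the embedding model's throughput and the store's request limits.

```python
import hashlib
from typing import Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # assumed; part of the hash on purpose


def content_hash(text: str, model: str = EMBEDDING_MODEL) -> str:
    """Deterministic fingerprint of everything the embedding depends on."""
    return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_upserts(docs: Sequence[Dict[str, str]], stored_hashes: Dict[str, str],
                 batch_size: int = 64) -> List[List[Dict[str, str]]]:
    """Return batches of documents that actually need (re-)embedding.

    `docs` are {"id": ..., "text": ...} dicts; `stored_hashes` maps id -> hash
    previously written alongside the vector. Unchanged documents are skipped.
    """
    pending = []
    for doc in docs:
        h = content_hash(doc["text"])
        if stored_hashes.get(doc["id"]) != h:
            pending.append({**doc, "content_hash": h})
    return list(batched(pending, batch_size))
```

Each returned batch would then be embedded and written in one call, e.g. collection.upsert(ids=..., embeddings=..., documents=..., metadatas=...). Because upsert is keyed by id, replaying a batch after a partial failure is idempotent.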
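For the delete endpoint and the metadata-filtering item, a small translation layer keeps the public API independent of Chroma's filter syntax. The sketch assumes API filters arrive as a flat dict: a scalar means equality, a list means "any of", and a dict holds explicit operators. It emits Chroma-style where clauses ($eq, $in, $and, and so on); wrapping several conditions in $and reflects our understanding that Chroma expects a single top-level key per clause. build_where and build_delete_args are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional

# Comparison operators accepted in Chroma metadata filters.
COMPARISON_OPS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"}


def build_where(filters: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Translate API-level filters into a Chroma-style where clause.

    A scalar means equality, a list means membership ($in), and a dict of
    operators is split into one clause per operator. Multiple clauses are
    wrapped in $and so the top level has a single key.
    """
    if not filters:
        return None
    clauses: List[Dict[str, Any]] = []
    for key, value in sorted(filters.items()):
        if isinstance(value, (list, dict)) and not value:
            raise ValueError(f"empty filter value for {key!r}")
        if isinstance(value, list):
            clauses.append({key: {"$in": value}})
        elif isinstance(value, dict):
            for op, operand in sorted(value.items()):
                if op not in COMPARISON_OPS:
                    raise ValueError(f"unsupported operator {op!r} for {key!r}")
                clauses.append({key: {op: operand}})
        else:
            clauses.append({key: {"$eq": value}})
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


def build_delete_args(ids: Optional[List[str]] = None,
                      filters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build keyword arguments for a delete call, refusing an unscoped delete."""
    where = build_where(filters)
    if not ids and where is None:
        raise ValueError("delete requires ids, a metadata filter, or both")
    args: Dict[str, Any] = {}
    if ids:
        args["ids"] = list(ids)
    if where is not None:
        args["where"] = where
    return args
```

The results plug into calls such as collection.query(..., where=build_where(filters)) or collection.delete(**build_delete_args(ids, filters)). Refusing a delete with neither ids nor a filter is a deliberate guard so a malformed request cannot become an unscoped delete.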
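For the search_query task, a minimal sketch, assuming the collection uses the cosine space (where Chroma reports distance as 1 minus cosine similarity) and that the caller has already embedded the query text. Chroma's query() returns one inner list per query embedding under "ids", "distances", "metadatas", and "documents"; the converter reads only the first row and always returns a list, so an empty or missing result becomes results=[] rather than None. SearchResult, SearchResponse, and the score definition are illustrative choices, not a fixed contract.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SearchResult:
    id: str
    score: float  # similarity derived from the distance; higher is better
    distance: float  # raw distance as reported by the index
    metadata: Dict[str, Any] = field(default_factory=dict)
    document: Optional[str] = None


@dataclass
class SearchResponse:
    # Required and never None: an empty search yields an empty list.
    results: List[SearchResult] = field(default_factory=list)


def _first_row(raw: Dict[str, Any], key: str, n: int) -> List[Any]:
    """Return the row for the first (only) query, or n placeholders if absent."""
    value = raw.get(key)
    if not value or value[0] is None:
        return [None] * n
    return list(value[0])


def to_search_response(raw: Optional[Dict[str, Any]]) -> SearchResponse:
    """Convert a Chroma-style single-query result into a SearchResponse.

    Assumes the cosine space, where distance = 1 - cosine similarity.
    """
    if not raw or not raw.get("ids") or not raw["ids"][0]:
        return SearchResponse(results=[])
    ids = raw["ids"][0]
    distances = _first_row(raw, "distances", len(ids))
    metadatas = _first_row(raw, "metadatas", len(ids))
    documents = _first_row(raw, "documents", len(ids))
    results = []
    for doc_id, dist, meta, doc in zip(ids, distances, metadatas, documents):
        if dist is None:
            raise ValueError("query must include distances to compute scores")
        results.append(SearchResult(id=doc_id, score=1.0 - dist, distance=dist,
                                    metadata=dict(meta or {}), document=doc))
    return SearchResponse(results=results)


def search_query(collection: Any, query_embedding: List[float], k: int = 10,
                 where: Optional[Dict[str, Any]] = None) -> SearchResponse:
    """Run a single-vector query against a Chroma collection (or any object with .query)."""
    raw = collection.query(query_embeddings=[query_embedding], n_results=k, where=where,
                           include=["metadatas", "documents", "distances"])
    return to_search_response(raw)
```

Reporting both the raw distance and a derived score lets clients sort by score while keeping the index's native number available for debugging or threshold tuning.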
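For pagination, Chroma's query() returns the top n_results hits and, as far as we know, takes no offset argument. One common approach is therefore to over-fetch offset + limit + 1 hits and slice out the requested page, with an opaque cursor that encodes the offset and a fingerprint of the query (for example a hash of the query text and filters) so a cursor cannot be replayed against a different search. MAX_DEPTH, the cursor format, and all function names here are assumptions.

```python
import base64
import json
from typing import Any, Callable, List, Optional, Tuple

MAX_DEPTH = 500  # assumed cap on offset + limit; deep pages cost more index work


def encode_cursor(offset: int, query_fingerprint: str) -> str:
    """Opaque, URL-safe cursor binding an offset to one specific query."""
    payload = json.dumps({"o": offset, "q": query_fingerprint}).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_cursor(cursor: Optional[str], query_fingerprint: str) -> int:
    """Return the offset stored in the cursor, or 0 for the first page."""
    if cursor is None:
        return 0
    try:
        data = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    except ValueError as exc:  # covers bad base64, bad UTF-8, and bad JSON
        raise ValueError("malformed cursor") from exc
    if (not isinstance(data, dict) or data.get("q") != query_fingerprint
            or not isinstance(data.get("o"), int) or data["o"] < 0):
        raise ValueError("cursor is invalid for this query")
    return data["o"]


def paginate(fetch_top_k: Callable[[int], List[Any]], query_fingerprint: str,
             limit: int, cursor: Optional[str] = None) -> Tuple[List[Any], Optional[str]]:
    """Over-fetch offset + limit + 1 hits, slice out one page, and build the next cursor."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = decode_cursor(cursor, query_fingerprint)
    if offset + limit > MAX_DEPTH:
        raise ValueError("requested page is beyond the maximum search depth")
    hits = fetch_top_k(offset + limit + 1)  # e.g. wraps collection.query(n_results=...)
    page = hits[offset:offset + limit]
    has_more = len(hits) > offset + limit
    next_cursor = encode_cursor(offset + limit, query_fingerprint) if has_more else None
    return page, next_cursor
```

Because each page re-runs the query, results can shift if the index changes between requests; if stable paging matters, the ranked id list for a fingerprint can be cached briefly and later pages served from that cache.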
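For the eventual-consistency item, one common approach (a transactional outbox, not something Chroma provides) is sketched below: the CRUD write and an outbox event carrying the new document version commit in the same database transaction, and a background worker replays events into the vector index. The in-memory dict stands in for the Chroma collection, and OutboxEvent and IndexSyncWorker are hypothetical names. Version checks make replays and duplicate deliveries harmless, and remembering the version of a delete keeps a late, older upsert from resurrecting a deleted document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class OutboxEvent:
    event_id: int  # monotonically increasing, assigned by the CRUD database
    doc_id: str
    version: int  # document version after the CRUD write
    op: str  # "upsert" or "delete"
    text: Optional[str] = None


class IndexSyncWorker:
    """Applies outbox events to an index, tolerating duplicates and stale events."""

    def __init__(self) -> None:
        self.index: Dict[str, Tuple[int, Optional[str]]] = {}  # stand-in for the vector store
        self.applied_versions: Dict[str, int] = {}  # includes deletes (tombstones)
        self.last_event_id = 0  # checkpoint; persist it in production

    def apply(self, event: OutboxEvent) -> bool:
        """Apply one event; return False if it was a duplicate or stale."""
        if event.version <= self.applied_versions.get(event.doc_id, 0):
            return False  # safe to acknowledge and skip
        if event.op == "upsert":
            self.index[event.doc_id] = (event.version, event.text)
        elif event.op == "delete":
            self.index.pop(event.doc_id, None)
        else:
            raise ValueError(f"unknown op {event.op!r}")
        self.applied_versions[event.doc_id] = event.version
        return True

    def drain(self, events: List[OutboxEvent]) -> int:
        """Apply events in outbox order and advance the checkpoint."""
        applied = 0
        for event in sorted(events, key=lambda e: e.event_id):
            if self.apply(event):
                applied += 1
            self.last_event_id = max(self.last_event_id, event.event_id)
        return applied
```

In production the checkpoint and applied versions would be persisted (the version can live in the record's metadata, while delete versions need a small tombstone table), and a single worker per document partition avoids racing read-check-write sequences. Between the CRUD commit and the worker catching up, searches may miss or return stale documents; that lag is the consistency window worth monitoring.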
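For the error-handling concerns, a generic retry helper, sketched under the assumption that transient store failures surface as TimeoutError or as a hypothetical RetryableError that the client wrapper raises for responses such as 429 or 503. It uses capped exponential backoff with full jitter and gives up once the next sleep would exceed an overall deadline, so retries cannot blow the request's latency budget. The default numbers are placeholders, not recommended values.

```python
import random
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for transient failures (e.g. 429/503 from the store)."""


def call_with_retries(fn: Callable[[], T], *, max_attempts: int = 4,
                      base_delay: float = 0.05, max_delay: float = 1.0,
                      deadline_s: float = 2.0,
                      retry_on: Tuple[Type[BaseException], ...] = (RetryableError, TimeoutError),
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic,
                      rng: Optional[random.Random] = None) -> T:
    """Retry `fn` with capped exponential backoff and full jitter within a deadline.

    Only safe for idempotent operations (upsert/delete by id, read-only queries).
    """
    rng = rng or random.Random()
    start = clock()
    attempt = 0
    while True:
        attempt += 1
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = rng.uniform(0, min(max_delay, base_delay * (2 ** (attempt - 1))))
            if clock() - start + delay > deadline_s:
                raise  # sleeping again would exceed the overall deadline
            sleep(delay)
```

Only idempotent operations should go through it: upserts and deletes keyed by document id and read-only queries qualify, while anything that creates new server-side ids would not. Per-call timeouts still belong in the client itself; this helper only bounds the total time spent retrying.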