System Design: Scalable Tagging Service
Context
You are designing a multi-tenant tagging service for a large-scale SaaS product. Items (e.g., documents, issues, pages) can have multiple tags. The system must support high read/write throughput and efficient queries by one or multiple tags.
Assume scale on the order of:
- 100M items, 10M distinct tags, average 5–20 tags per item
- Peak 10k read QPS, 2k write QPS (tag add/remove)
- Multi-tenant isolation is required
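As a rough sanity check on these numbers, the sketch below works out how many item-tag associations the stated scale implies. The 16 bytes per association (two 8-byte IDs, no index or replication overhead) is an assumption for illustration only, not part of the problem statement.

```python
# Back-of-envelope sizing from the stated scale assumptions.
ITEMS = 100_000_000
DISTINCT_TAGS = 10_000_000
TAGS_PER_ITEM_RANGE = (5, 20)
PEAK_READ_QPS = 10_000
PEAK_WRITE_QPS = 2_000

# Assumption: one association = 8-byte item ID + 8-byte tag ID, raw.
BYTES_PER_ASSOCIATION = 16


def estimate(tags_per_item):
    """Return association count, raw size, and mean items per tag."""
    associations = ITEMS * tags_per_item
    return {
        "associations": associations,
        "raw_bytes": associations * BYTES_PER_ASSOCIATION,
        "avg_items_per_tag": associations / DISTINCT_TAGS,
    }


low, high = (estimate(t) for t in TAGS_PER_ITEM_RANGE)
read_write_ratio = PEAK_READ_QPS / PEAK_WRITE_QPS

for label, est in (("low", low), ("high", high)):
    print(label, est)
print("read/write ratio:", read_write_ratio)
```

Under these assumptions the association table holds between 500M and 2B rows. The mean of 50 to 200 items per tag says nothing about skew, so a small number of tags may still have far longer posting lists.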
Requirements
- Create and manage tags
- Associate/dissociate multiple tags with an item
- Query items:
  - by a single tag
  - by a combination of tags (AND/OR), with pagination and sort (e.g., recency)
- High availability and low latency for reads; scalable writes
- Consistency model: clearly define and implement (eventual vs read-your-writes)
- Sharding, caching, and indexing strategies for scale
- Operability: monitoring, reindexing, backfills
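To make the query requirements concrete, here is a minimal in-memory sketch of an inverted index that answers single-tag and AND/OR queries with recency sort and cursor pagination. The class `TagIndex`, its method names, and the `(updated_at, item_id)` cursor shape are all hypothetical; a real design would keep the posting lists in a sharded store rather than in process memory.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: (tenant, tag) -> set of item IDs."""

    def __init__(self):
        self._postings = defaultdict(set)  # (tenant, tag) -> {item_id}
        self._updated = {}                 # (tenant, item_id) -> timestamp

    def add(self, tenant, item_id, tag, updated_at):
        self._postings[(tenant, tag)].add(item_id)
        self._updated[(tenant, item_id)] = updated_at

    def remove(self, tenant, item_id, tag):
        self._postings[(tenant, tag)].discard(item_id)

    def query(self, tenant, tags, mode="AND", limit=10, cursor=None):
        """Return (item_ids, next_cursor), newest first.

        The tenant is part of every key, so one tenant can never
        read another tenant's postings.
        """
        postings = [self._postings.get((tenant, t), set()) for t in tags]
        if not postings:
            return [], None
        combine = set.intersection if mode == "AND" else set.union
        ids = combine(*postings)
        # Sort by (timestamp, item_id) descending; item_id breaks ties
        # so the cursor identifies a unique position.
        keys = sorted(
            ((self._updated[(tenant, i)], i) for i in ids), reverse=True
        )
        if cursor is not None:
            keys = [k for k in keys if k < cursor]
        page = keys[:limit]
        next_cursor = page[-1] if len(keys) > limit else None
        return [item_id for _, item_id in page], next_cursor


idx = TagIndex()
idx.add("t1", "a", "red", 1)
idx.add("t1", "a", "blue", 1)
idx.add("t1", "b", "red", 2)
idx.add("t2", "c", "red", 3)  # different tenant, never visible to t1

print(idx.query("t1", ["red", "blue"], mode="AND"))
print(idx.query("t1", ["red", "blue"], mode="OR", limit=1))
```

Paginating on a `(timestamp, item_id)` cursor rather than an offset keeps pages stable when new items arrive between requests, which is one reason it is a common choice for recency-sorted feeds.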
Deliverables
- Data model and storage choices
- Read and write paths (including APIs)
- Indexing strategy for tag queries (single and combined)
- Sharding/partitioning plan
- Caching strategy and invalidation
- Consistency trade-offs and mechanisms
- Handling hot tags, skew, and growth
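One possible starting point for the sharding and hot-tag deliverables is sketched below: posting lists are routed by a hash of a tenant-scoped key, and tags flagged as hot are split into sub-buckets by item ID so that writes spread over several shards. The shard count, bucket count, function names, and the idea of a precomputed `hot_tags` set are all assumptions for illustration, not a prescribed answer.

```python
import hashlib

NUM_SHARDS = 64        # assumption: fixed shard count
HOT_TAG_BUCKETS = 8    # assumption: sub-buckets per hot tag


def _hash(key):
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def shard_for_write(tenant, tag, item_id, hot_tags):
    """Pick the shard that stores the (tag, item) posting."""
    if (tenant, tag) in hot_tags:
        # Spread a hot tag's postings across buckets by item ID.
        bucket = _hash(item_id) % HOT_TAG_BUCKETS
        return _hash(f"{tenant}:{tag}:{bucket}") % NUM_SHARDS
    return _hash(f"{tenant}:{tag}") % NUM_SHARDS


def shards_for_read(tenant, tag, hot_tags):
    """List every shard a reader must consult for this tag."""
    if (tenant, tag) in hot_tags:
        return sorted(
            {
                _hash(f"{tenant}:{tag}:{b}") % NUM_SHARDS
                for b in range(HOT_TAG_BUCKETS)
            }
        )
    return [_hash(f"{tenant}:{tag}") % NUM_SHARDS]


hot = {("acme", "bug")}
print(shard_for_write("acme", "bug", "item-42", hot))
print(shards_for_read("acme", "bug", hot))
print(shards_for_read("acme", "docs", hot))
```

The trade-off in this sketch is that reads on a hot tag fan out to up to `HOT_TAG_BUCKETS` shards and must merge the results, while reads on ordinary tags stay single-shard. Changing a tag's hot status also changes where its postings live, so it would require a backfill.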