System Design Prompt: Globally Distributed, Read-Optimized Key-Value Store
Context and minimal assumptions
Design a globally distributed key-value (KV) store optimized for read-heavy workloads. Assume:
- Workload: 90–95% reads, 5–10% writes; point lookups dominate, range scans are rare.
- Data model: opaque values by key; keys up to ~1 KB; median value ~1 KB (max 1 MB).
- Scale: tens of TB per region; multi-tenant; multi-region across continents.
- Latency targets: in-region p99 read ≤ 5–10 ms; global p99 read ≤ 150 ms via nearest region.
- Availability/durability: ≥ 99.99% availability; RPO 0 within a region; cross-region replication may be asynchronous by default.
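
To make the assumptions above concrete, the following is a rough back-of-envelope sketch in Python, not a sizing recommendation. The 20 TB figure (one reading of "tens of TB") and the 64-byte average key size are illustrative assumptions that do not come from the prompt; the 1 KB median value and the 99.99% target do.

```python
# Back-of-envelope arithmetic from the stated assumptions.
# Values marked ASSUMED are illustrative and are not part of the prompt.

KB = 1024
TB = 1024 ** 4

data_per_region_bytes = 20 * TB   # ASSUMED: one reading of "tens of TB per region"
median_value_bytes = 1 * KB       # from the prompt: median value ~1 KB
avg_key_bytes = 64                # ASSUMED: the prompt only caps keys at ~1 KB
availability = 0.9999             # from the prompt: >= 99.99% availability

# Approximate entry count, treating the median value size as the mean
# and ignoring index, metadata, and replication overhead.
entries_per_region = data_per_region_bytes // (median_value_bytes + avg_key_bytes)

# Downtime allowed by the availability target.
minutes_per_month = 30 * 24 * 60   # 30-day month
minutes_per_year = 365 * 24 * 60   # 365-day year
downtime_budget_month = (1 - availability) * minutes_per_month
downtime_budget_year = (1 - availability) * minutes_per_year

print(f"entries per region: ~{entries_per_region / 1e9:.1f} billion")
print(f"downtime budget: {downtime_budget_month:.2f} min/month, "
      f"{downtime_budget_year:.2f} min/year")
```

Under these assumptions a region holds on the order of 20 billion entries, and the 99.99% target leaves roughly 4.3 minutes of downtime per 30-day month. Changing either assumed value shifts the entry count proportionally, so a design answer should state its own figures.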
Task
Propose an end-to-end design and justify your choices. Address the following areas with performance, complexity, and resource trade-offs:
- OS-level performance considerations (threads vs async I/O, context switching, memory management, filesystem and kernel tuning).
- Storage layout and indexing (on-disk format, compaction strategy, write amplification trade-offs).
- Partitioning and sharding (key distribution, shard sizing, rebalancing strategy).
- Replication and caching layers (write/read paths, coherence, TTLs, invalidation strategies).
- Consistency models and CAP/PACELC trade-offs (client-visible guarantees and tunable options).
- Failure detection, fault isolation, leader election, recovery and repair.
- Hotspot mitigation, backpressure, and rate limiting (per-tenant fairness and overload control).
- Capacity planning, SLAs/SLOs, and observability (metrics, tracing, alerting; how to validate the design).
Be explicit about assumptions, call out pitfalls/edge cases, and use small numeric examples where helpful.