Design a distributed key-value store at scale

This is a System Design interview question from Confluent for Software Engineer roles.

Question

System Design Prompt: Globally Distributed, Read-Optimized Key-Value Store

Context and minimal assumptions

Design a globally distributed key-value (KV) store optimized for read-heavy workloads. Assume:

Workload: 90–95% reads, 5–10% writes; point lookups dominate, range scans are rare.
Data model: opaque values addressed by key; keys up to ~1 KB; median value ~1 KB (max 1 MB).
Scale: tens of TB per region; multi-tenant; multi-region across continents.
Latency targets: in-region p99 read ≤ 5–10 ms; global p99 read ≤ 150 ms via nearest region.
Availability/durability: ≥ 99.99% availability; RPO 0 within a region; cross-region replication may be asynchronous by default.
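The assumptions above already imply a few sizing numbers worth deriving before choosing a design. The sketch below is a back-of-envelope calculation under hypothetical inputs: it takes "tens of TB per region" as 20 TB and approximates each entry as ~1 KB (the median value size, ignoring key and index overhead), then converts the 99.99% availability target into a downtime budget.

```python
# Back-of-envelope numbers derived from the prompt's stated assumptions.
# ASSUMPTION (hypothetical): "tens of TB per region" is taken as 20 TB, and
# each entry is approximated as ~1 KB (median value size; key/index bytes ignored).
TB = 10**12  # decimal units keep the arithmetic easy to check by hand
KB = 10**3


def entries_per_region(data_bytes: int, entry_bytes: int) -> int:
    """Approximate number of key-value entries that fit in data_bytes."""
    return data_bytes // entry_bytes


def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Allowed unavailability, in minutes, over a period for an availability target."""
    return (1.0 - availability) * period_days * 24 * 60


n_entries = entries_per_region(20 * TB, 1 * KB)
monthly = downtime_budget_minutes(0.9999, 30)
yearly = downtime_budget_minutes(0.9999, 365)

print(f"entries/region ~ {n_entries:.1e}")
print(f"99.99% budget: {monthly:.2f} min per 30 days, {yearly:.1f} min per year")
```

With these inputs a region holds on the order of 2×10^10 entries, and 99.99% availability leaves roughly 4.3 minutes of unavailability per 30 days (about 52.6 minutes per year), which is the budget that failure detection, leader election, and recovery have to fit within.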

Task

Propose an end-to-end design and justify your choices. For each of the following areas, discuss the performance, complexity, and resource trade-offs:

OS-level performance considerations (threads vs async I/O, context switching, memory management, filesystem and kernel tuning).
Storage layout and indexing (on-disk format, compaction strategy, write amplification trade-offs).
Partitioning and sharding (key distribution, shard sizing, rebalancing strategy).
Replication and caching layers (write/read paths, coherence, TTLs, invalidation strategies).
Consistency models and CAP/PACELC trade-offs (client-visible guarantees and tunable options).
Failure detection, fault isolation, leader election, recovery and repair.
Hotspot mitigation, backpressure, and rate limiting (per-tenant fairness and overload control).
Capacity planning, SLAs/SLOs, and observability (metrics, tracing, alerting; how to validate the design).
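For the consistency item, one common way to expose tunable guarantees (Dynamo-style quorums; the prompt does not prescribe this technique) is to let clients choose a read quorum R and a write quorum W out of N replicas. A read is guaranteed to contact at least one replica that acknowledged the latest completed write only when R + W > N. The sketch below checks that rule against brute-force enumeration; the N = 3 setting and the option labels are hypothetical.

```python
from itertools import combinations


def quorums_intersect(n: int, r: int, w: int) -> bool:
    """Quorum rule: every read quorum overlaps every write quorum iff R + W > N."""
    return r + w > n


def brute_force_intersect(n: int, r: int, w: int) -> bool:
    """Exhaustively check whether every R-subset shares a replica with every W-subset."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)  # empty set (no shared replica) is falsy
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


# Hypothetical read/write options for N = 3 replicas (labels are illustrative only).
N = 3
options = {"read ONE / write ONE": (1, 1),
           "read QUORUM / write QUORUM": (2, 2),
           "read ONE / write ALL": (1, 3)}
for name, (r, w) in options.items():
    # The closed-form rule and the enumeration should always agree.
    assert quorums_intersect(N, r, w) == brute_force_intersect(N, r, w)
    print(f"{name}: R={r} W={w} overlap={quorums_intersect(N, r, w)}")
```

For a 90–95% read workload, an option such as R = 1, W = N keeps reads cheap while still overlapping, at the cost of writes that stall if any replica is unavailable; R = W = 2 balances the two. The overlap property assumes strict quorums over a fixed replica set and does not, by itself, resolve concurrent writes or sloppy quorums during failover.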

Be explicit about assumptions, call out pitfalls/edge cases, and use small numeric examples where helpful.
