System Design: Scale a Key–Value Store with Consistent Hashing
Context
You have a single-server key–value store that must scale horizontally across many servers while maintaining high availability and low latency. Assume point lookups/updates by key (no multi-key transactions), and tunable consistency is acceptable.
Tasks
-
Partition keys with consistent hashing, using virtual nodes (vnodes). Explain the hash ring, token assignment, and how clients/gateways route requests.
-
Describe node joins and leaves: how they trigger minimal data movement and how rebalancing works.
-
Define replication and quorum choices: pick N (replication factor), R (read quorum), W (write quorum), and justify trade-offs.
-
Specify read/write paths: coordinator behavior, conflict/version handling, and read repair.
-
Explain failure detection, re-replication, and recovery: membership, hinted handoff, anti-entropy, and rebuild.
-
Address hot keys, skew, and load balancing.
-
Ensure data migration safety during ownership changes; describe monitoring and rolling upgrades.
-
Compare alternatives (e.g., directory service, range partitioning) and justify trade-offs.
Deliver a diagram-free written design that covers the above with clear assumptions, pitfalls, and metrics/SLOs.