System Design: Scalable Key–Value Store with Range Scans
You are asked to design a distributed key–value (KV) store that supports the following operations at scale:
-
get(key)
-
put(key, value)
-
delete(key)
-
range_scan(start_key, end_key, limit)
Assume billions of keys, multi-terabyte datasets, horizontal scalability, and high availability across multiple availability zones. Low p99 latency is desired for point lookups and sequential range scans.
Discuss and justify design choices for the following topics:
-
Data model and API semantics (keys, values, ordering, TTLs, versioning, deletes/tombstones, range scans)
-
Partitioning scheme and consistent hashing (how to scale out, balance load, and support efficient range scans; splitting/merging; hot-spot mitigation)
-
Replication (topology, placement, quorum vs consensus, replica count, cross-AZ)
-
Write path: LSM-tree versus B-Tree (trade-offs and your choice)
-
Compaction strategy (policies, amplification, tombstones, TTL expiry, backpressure)
-
Read path and caching (Bloom filters, block/row cache, readahead/prefetch for scans)
-
Consistency levels (point reads/writes and range scans; options and guarantees)
-
Failure handling (node/zone/network failures, recovery, rebalancing, data integrity)
Provide a high-level architecture and call out key trade-offs, pitfalls, and how you will validate the design under load.