Design: Scale a Single-Node LRU Cache to a Distributed Cache
Assume you are upgrading a single-node, in-memory LRU cache to a distributed cache to support high read throughput (10^5–10^6 QPS), moderate writes, millions of keys, and sub-millisecond p50 latency within a single region. Keys and values are arbitrary byte strings; items have TTLs and are evicted by LRU when memory is full. Availability target is 99.9%+.
Design the system and address the following:
- Sharding: Describe how to shard keys across nodes using consistent hashing with virtual nodes (vnodes). Include how you map a key to its primary node and to its replicas, and how you handle weighted nodes.
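A strong answer can anchor this in a concrete mapping. Below is a minimal sketch of a consistent-hash ring with vnodes, where weight simply scales a node's vnode count and replicas are the next distinct nodes clockwise; the class name, vnode count, and MD5-based hash are illustrative choices, not a prescribed implementation.

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    # Stable 64-bit position on the ring (any well-mixed hash works).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Hash ring with virtual nodes; a node's weight scales its vnode count."""

    def __init__(self, vnodes_per_weight: int = 100):
        self.vnodes_per_weight = vnodes_per_weight
        self._hashes = []   # sorted vnode positions, for bisect
        self._ring = []     # parallel list of (position, node)

    def add_node(self, node: str, weight: int = 1) -> None:
        for i in range(self.vnodes_per_weight * weight):
            h = _hash(f"{node}#vnode{i}")
            idx = bisect.bisect(self._hashes, h)
            self._hashes.insert(idx, h)
            self._ring.insert(idx, (h, node))

    def remove_node(self, node: str) -> None:
        kept = [(h, n) for h, n in self._ring if n != node]
        self._ring = kept
        self._hashes = [h for h, _ in kept]

    def nodes_for_key(self, key: str, replicas: int = 1) -> list:
        """First distinct node clockwise of the key is the primary;
        subsequent distinct nodes are its replicas."""
        if not self._ring:
            return []
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        owners = []
        for i in range(len(self._ring)):
            _, node = self._ring[(idx + i) % len(self._ring)]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners
```

A useful property to call out: removing a node reassigns only the keys that node owned, leaving every other key's primary unchanged.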
- Rebalancing: Explain what happens during node joins and departures, both graceful and due to failure. Include data movement, ownership changes, and how to rate-limit rebalancing traffic to protect tail latency.
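The rate-limiting part is often hand-waved; one concrete option is a token bucket capping rebalancing transfer bandwidth so migrations cannot crowd out foreground traffic. A minimal sketch, with illustrative names and an injectable clock for testing:

```python
import time

class TokenBucket:
    """Caps rebalancing transfer rate (bytes/sec) so background data
    movement yields to foreground requests when the budget is spent."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float,
                 clock=time.monotonic):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes     # start with a full burst allowance
        self.clock = clock
        self.last = clock()

    def try_send(self, nbytes: float) -> bool:
        # Refill proportionally to elapsed time, capped at burst size.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False    # caller should back off and retry later
```

In practice the sender would sleep until the deficit refills rather than spin on `try_send`.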
- Hot Keys: Propose techniques to mitigate hot keys and request stampedes (e.g., heavy-hitter detection, thundering-herd protection).
-
Replication
-
Specify replication factor, placement, write propagation (sync/async), conflict resolution/versioning, and replica read preferences.
-
Failure Detection
-
Describe how nodes detect membership changes (e.g., gossip, heartbeats, thresholds) and how that integrates with routing.
-
Request Routing
-
Compare client-side routing vs proxy/sidecar. Explain how a client locates the right node(s) and handles failures/blacklisting.
-
Read/Write Paths
-
Provide read and write flows for: cache hit, miss + fill, and write-through/write-back/write-around policies.
-
Consistency & Invalidation
-
Discuss TTL handling (soft/hard TTL), write-through/back, cache invalidations (push vs pull), and dogpile prevention.
-
Fault Tolerance & Partitions
-
Explain behavior under node failures and network partitions. State the consistency-availability trade-offs you choose and why.
-
Monitoring & Capacity Planning
-
List the key metrics, SLOs, and an approach to plan capacity (memory, network, QPS). Include simple sizing formulas.
-
Trade-offs & Performance
-
Summarize trade-offs among consistency, availability, latency, and cost. Provide expected latency/throughput numbers and rebalancing cost at scale.