PracHub

Scale the cache to a distributed system

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's ability to design and scale a distributed caching system, testing competencies in sharding and consistent hashing, replication and consistency models, request routing, failure detection, cache invalidation, capacity planning, and operational trade-offs for high-throughput, low-latency workloads.

Company: DoorDash

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Scale your LRU cache to a distributed cache. Describe how you would shard keys across nodes using consistent hashing with virtual nodes, how rebalancing works during node joins/leaves, and how you would mitigate hot keys. Specify replication strategy, failure detection, request routing, and read/write paths. Discuss cache consistency (TTL, write-through/back, invalidation), fault tolerance under partitions, and monitoring/capacity planning. Provide trade-offs and expected performance.

Related Interview Questions

  • Design a Food Rating System - DoorDash (medium)
  • Design a resilient bootstrap API - DoorDash (medium)
  • Design Real-Time Driver Pay Aggregation - DoorDash (hard)
  • Design Food Ratings and Driver Payouts - DoorDash (medium)
  • Design personalized restaurant search and recommendations - DoorDash (medium)

DoorDash · Software Engineer · Technical Screen · System Design · Sep 6, 2025

Design: Scale a Single-Node LRU Cache to a Distributed Cache

Assume you are upgrading a single-node, in-memory LRU cache to a distributed cache that must support high read throughput (10^5–10^6 QPS), moderate write traffic, millions of keys, and sub-millisecond p50 latency within a single region. Keys and values are arbitrary byte strings; items carry TTLs and are evicted in LRU order when memory is full. The availability target is 99.9% or higher.
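
For reference, the single-node baseline being upgraded can be sketched as follows. This is a minimal sketch, assuming capacity is counted in items rather than bytes and that expiry is checked lazily on read; the class name `LRUCacheTTL` is hypothetical.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple

# Sketch of the single-node baseline; LRUCacheTTL is a hypothetical name and
# capacity is measured in item count (an assumption) instead of bytes.


class LRUCacheTTL:
    def __init__(self, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self._clock = clock
        # key -> (value, expires_at); iteration order = recency, last = most recent
        self._data: "OrderedDict[bytes, Tuple[bytes, float]]" = OrderedDict()

    def get(self, key: bytes) -> Optional[bytes]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries on access
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: bytes, value: bytes, ttl: float) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, self._clock() + ttl)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used item
```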

Design the system and address the following:

  1. Sharding
     • Describe how to shard keys across nodes using consistent hashing with virtual nodes (vnodes). Include how you map a key to its primary node and to its replicas, and how you handle weighted nodes.
  2. Rebalancing
     • Explain what happens when nodes join and leave (both gracefully and through failure). Include data movement, ownership changes, and how to rate-limit rebalancing to protect tail latency.
  3. Hot Keys
     • Propose techniques to mitigate hot keys and cache stampedes (e.g., heavy-hitter detection, thundering-herd prevention).
  4. Replication
     • Specify the replication factor, replica placement, write propagation (sync vs. async), conflict resolution/versioning, and replica read preferences.
  5. Failure Detection
     • Describe how nodes detect membership changes (e.g., gossip, heartbeats, thresholds) and how that integrates with routing.
  6. Request Routing
     • Compare client-side routing with a proxy/sidecar. Explain how a client locates the right node(s) and handles failures and blacklisting.
  7. Read/Write Paths
     • Provide read and write flows for a cache hit, a miss plus fill, and the write-through, write-back, and write-around policies.
  8. Consistency & Invalidation
     • Discuss TTL handling (soft vs. hard TTL), write-through/write-back, cache invalidation (push vs. pull), and dogpile prevention.
  9. Fault Tolerance & Partitions
     • Explain behavior under node failures and network partitions. State the consistency-availability trade-offs you choose and why.
  10. Monitoring & Capacity Planning
     • List the key metrics, SLOs, and an approach to capacity planning (memory, network, QPS). Include simple sizing formulas.
  11. Trade-offs & Performance
     • Summarize the trade-offs among consistency, availability, latency, and cost. Provide expected latency/throughput numbers and the rebalancing cost at scale.
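
For the Sharding item, a minimal sketch of a consistent-hash ring with virtual nodes, assuming 64-bit BLAKE2b digests as ring positions and modeling a node's weight as an integer multiplier on its vnode count. The names `HashRing`, `ring_position`, `preference_list`, and `vnodes_per_weight` are hypothetical. A key's primary is the owner of the first vnode clockwise from the key's hash; replicas are the next distinct physical nodes found by continuing clockwise, so two copies never land on the same machine.

```python
import bisect
import hashlib
from typing import Dict, List

# Sketch: hypothetical names; assumes BLAKE2b-64 ring positions and
# weight = integer multiplier on the number of vnodes.


def ring_position(data: str) -> int:
    """64-bit ring position (stable across processes, unlike built-in hash())."""
    digest = hashlib.blake2b(data.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HashRing:
    def __init__(self, vnodes_per_weight: int = 100):
        self.vnodes_per_weight = vnodes_per_weight
        self._owner: Dict[int, str] = {}  # vnode position -> physical node
        self._tokens: List[int] = []      # sorted vnode positions
        self._nodes: Dict[str, int] = {}  # physical node -> weight

    def add_node(self, node: str, weight: int = 1) -> None:
        # weight w -> w * vnodes_per_weight tokens, so the node owns roughly
        # w / (sum of weights) of the key space.
        self._nodes[node] = weight
        for i in range(weight * self.vnodes_per_weight):
            # setdefault keeps the existing owner if two positions ever collide
            self._owner.setdefault(ring_position(f"{node}#vn{i}"), node)
        self._tokens = sorted(self._owner)

    def remove_node(self, node: str) -> None:
        # Only keys whose primary was this node get a new primary.
        self._nodes.pop(node, None)
        self._owner = {t: n for t, n in self._owner.items() if n != node}
        self._tokens = sorted(self._owner)

    def preference_list(self, key: str, replicas: int = 3) -> List[str]:
        """Primary first, then the next distinct physical nodes clockwise."""
        if not self._tokens or replicas <= 0:
            return []
        wanted = min(replicas, len(self._nodes))
        start = bisect.bisect_right(self._tokens, ring_position(key))
        result: List[str] = []
        for step in range(len(self._tokens)):
            node = self._owner[self._tokens[(start + step) % len(self._tokens)]]
            if node not in result:
                result.append(node)
                if len(result) == wanted:
                    break
        return result
```

More vnodes per node narrow the spread of per-node load, at the cost of a larger token table that every router must hold.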

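For the Rebalancing item, one common way to keep data movement from hurting tail latency is to stream moved keys through a token bucket that caps migration bytes per second. This is a sketch under assumptions: the cost of an item is `len(key) + len(value)`, the `send` callback is a hypothetical stand-in for the transfer to the new owner, and `TokenBucket` and `migrate` are hypothetical names with an injectable clock and sleep for testing.

```python
import time
from typing import Callable, Iterable, Tuple

# Sketch: hypothetical names; cost = key bytes + value bytes (an assumption).


class TokenBucket:
    """Classic token bucket: refills at `rate` bytes/s, holds at most `burst` bytes."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self.tokens = min(self.burst, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self, n: float) -> None:
        """Block until n tokens are available, then spend them."""
        if n > self.burst:
            raise ValueError("request exceeds burst size; raise burst")
        self._refill()
        while self.tokens < n:
            self._sleep((n - self.tokens) / self.rate)
            self._refill()
        self.tokens -= n


def migrate(items: Iterable[Tuple[bytes, bytes]], bucket: TokenBucket,
            send: Callable[[bytes, bytes], None]) -> int:
    """Copy (key, value) pairs to a new owner without exceeding the bucket's byte rate."""
    moved = 0
    for key, value in items:
        bucket.acquire(len(key) + len(value))
        send(key, value)
        moved += 1
    return moved
```

In this design the previous owners keep serving a range until its copy completes, and the cap trades a longer rebalance for lower interference with live traffic.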

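For the Hot Keys item, a sketch of two complementary techniques: the Space-Saving heavy-hitter algorithm, which tracks approximate top keys in bounded memory, and key salting, which spreads a detected hot key over several suffixed copies that hash to different ring positions. The names `SpaceSaving`, `salted_key`, and `all_copies`, and the `#n` suffix format, are hypothetical.

```python
import random
from typing import Dict, List, Optional

# Sketch: hypothetical names; the "#n" copy suffix is an assumed convention.


class SpaceSaving:
    """Space-Saving heavy-hitter summary with at most k counters.

    Any key whose true frequency exceeds total / k is guaranteed to be tracked;
    tracked counts may overestimate by at most total / k.
    """

    def __init__(self, k: int):
        self.k = k
        self.total = 0
        self.counts: Dict[str, int] = {}

    def offer(self, key: str) -> None:
        self.total += 1
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.k:
            self.counts[key] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            # (O(k) scan for brevity; efficient versions use a stream-summary list.)
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[key] = self.counts.pop(victim) + 1

    def hot_keys(self, min_share: float) -> List[str]:
        """Keys whose estimated share of all offers is at least min_share."""
        return [k for k, c in self.counts.items() if c >= min_share * self.total]


def salted_key(key: str, hot: set, copies: int, rng: Optional[random.Random] = None) -> str:
    """Send a read of a hot key to one of `copies` suffixed copies (likely on different shards)."""
    if key not in hot:
        return key
    return f"{key}#{(rng or random).randrange(copies)}"


def all_copies(key: str, copies: int) -> List[str]:
    """Every copy that a write or invalidation of a hot key must update."""
    return [f"{key}#{i}" for i in range(copies)]
```

Reads pick one copy at random; writes and invalidations must touch every copy, which is the main cost of salting.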

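For stampedes (Hot Keys) and dogpile prevention (Consistency & Invalidation), a minimal single-flight sketch: concurrent misses on the same key wait for one in-flight load instead of each hitting the backing store. It assumes a thread-per-request model, an in-process dict as the cache, and no TTLs; `SingleFlightCache` and `loader` are hypothetical names.

```python
import threading
from typing import Callable, Dict

# Sketch: hypothetical names; thread-per-request model and no TTL handling assumed.


class SingleFlightCache:
    """Cache whose concurrent misses on one key share a single backing load."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader
        self._lock = threading.Lock()
        self._data: Dict[str, object] = {}
        self._inflight: Dict[str, threading.Event] = {}

    def get(self, key: str) -> object:
        while True:
            with self._lock:
                if key in self._data:
                    return self._data[key]  # hit
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._inflight[key] = event
            if not leader:
                # Another request is loading this key: wait, then re-check the cache.
                # If the leader failed, the loop lets one waiter become the new leader.
                event.wait()
                continue
            try:
                value = self._loader(key)
                with self._lock:
                    self._data[key] = value  # publish before clearing the in-flight marker
                return value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()
```

In a distributed deployment the same idea applies per cache node, or through a short-lived lock key, so that only one fill reaches the database per key.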

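For the Replication item, a sketch of quorum writes and reads over N replicas with last-writer-wins versioning and read repair. Assumptions: versions are `(counter, writer_id)` tuples compared lexicographically (multiple writers would need a shared clock scheme such as hybrid logical clocks), replicas are in-memory stand-ins with an `up` flag, and the write loop is synchronous. `Replica` and `QuorumClient` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Sketch: hypothetical names; LWW on (counter, writer_id) versions is an assumption.
Version = Tuple[int, str]


@dataclass
class Replica:
    name: str
    up: bool = True
    store: Dict[str, Tuple[Version, bytes]] = field(default_factory=dict)


class QuorumClient:
    """Writes to every reachable replica (success needs >= w acks); reads r replicas."""

    def __init__(self, replicas: List[Replica], w: int, r: int, writer_id: str):
        self.replicas, self.w, self.r, self.writer_id = replicas, w, r, writer_id
        self._counter = 0

    def put(self, key: str, value: bytes) -> bool:
        self._counter += 1
        version: Version = (self._counter, self.writer_id)
        acks = 0
        for rep in self.replicas:
            if not rep.up:
                continue  # a real client would time out here
            current = rep.store.get(key)
            if current is None or current[0] < version:  # last-writer-wins
                rep.store[key] = (version, value)
            acks += 1
        return acks >= self.w

    def get(self, key: str) -> Optional[bytes]:
        live = [rep for rep in self.replicas if rep.up][: self.r]
        if len(live) < self.r:
            raise RuntimeError("read quorum unavailable")
        answers = [(rep, rep.store.get(key)) for rep in live]
        found = [a for _, a in answers if a is not None]
        if not found:
            return None
        newest = max(found, key=lambda a: a[0])
        for rep, a in answers:
            if a is None or a[0] < newest[0]:
                rep.store[key] = newest  # read repair of stale replicas
        return newest[1]
```

Choosing W + R > N (here 2 + 2 > 3) makes every read quorum overlap the last successful write quorum. A write reported as failed may still have been applied on some replicas.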

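For the Failure Detection item, a sketch of a phi accrual failure detector. Instead of a fixed timeout, it reports a suspicion level phi derived from the time since a node's last heartbeat relative to its observed heartbeat interval. This version models inter-arrival gaps as exponential, so phi = (t / mean_gap) · log10(e); that distribution, the threshold of 8, and the name `PhiAccrualDetector` are assumptions. In a gossip design, each node would feed the heartbeats it learns about into such a detector and remove nodes above the threshold from its routing view.

```python
import math
from collections import deque
from typing import Deque, Dict, List

# Sketch: hypothetical class; exponential model of heartbeat gaps is an assumption.


class PhiAccrualDetector:
    """phi = -log10(P(gap > t)) = (t / mean_gap) * log10(e) under an exponential model."""

    def __init__(self, threshold: float = 8.0, window: int = 100):
        self.threshold = threshold
        self.window = window
        self._last: Dict[str, float] = {}
        self._gaps: Dict[str, Deque[float]] = {}

    def heartbeat(self, node: str, now: float) -> None:
        if node in self._last:
            gaps = self._gaps.setdefault(node, deque(maxlen=self.window))
            gaps.append(now - self._last[node])
        self._last[node] = now

    def phi(self, node: str, now: float) -> float:
        gaps = self._gaps.get(node)
        if not gaps:
            return 0.0  # not enough history yet
        mean_gap = sum(gaps) / len(gaps)
        return (now - self._last[node]) / mean_gap * math.log10(math.e)

    def live_nodes(self, now: float) -> List[str]:
        """Nodes the router should keep as candidates."""
        return [n for n in self._last if self.phi(n, now) < self.threshold]
```

With 1-second heartbeats, phi crosses 8 after roughly 18.4 seconds of silence, so the threshold directly sets the detection-time versus false-positive trade-off.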

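For the Request Routing item, a sketch of client-side routing with temporary ejection (blacklisting): the client walks the key's preference list, counts consecutive failures per node, and skips a node for a cooldown once it crosses a threshold. `preference_list` could be backed by the consistent-hash ring described in the Sharding item; `ClientRouter`, `NodeError`, the transport callables, and the default thresholds are hypothetical stand-ins for a real client and its network calls.

```python
import time
from typing import Callable, Dict, List, Optional

# Sketch: hypothetical names; transports stand in for real network calls.


class NodeError(Exception):
    """Raised by a transport when a node times out or refuses the request."""


class ClientRouter:
    def __init__(self, preference_list: Callable[[str], List[str]],
                 transports: Dict[str, Callable[[str], Optional[bytes]]],
                 max_failures: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.preference_list = preference_list
        self.transports = transports
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures: Dict[str, int] = {}         # consecutive failures per node
        self._ejected_until: Dict[str, float] = {}  # node -> earliest retry time

    def _available(self, node: str) -> bool:
        return self._clock() >= self._ejected_until.get(node, float("-inf"))

    def get(self, key: str) -> Optional[bytes]:
        """Value from the first healthy replica (None means a cache miss)."""
        last_error: Optional[NodeError] = None
        for node in self.preference_list(key):
            if not self._available(node):
                continue  # ejected: skip without spending a timeout
            try:
                value = self.transports[node](key)
            except NodeError as exc:
                last_error = exc
                self._failures[node] = self._failures.get(node, 0) + 1
                if self._failures[node] >= self.max_failures:
                    self._ejected_until[node] = self._clock() + self.cooldown_s
                    self._failures[node] = 0
                continue
            self._failures[node] = 0  # success resets the failure streak
            return value
        raise NodeError(f"no healthy replica for {key!r}") from last_error
```

A proxy or sidecar would run the same loop on behalf of thin clients, trading an extra network hop for centralized ring updates and connection pooling.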

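For the Read/Write Paths item, a sketch of the hit, miss-plus-fill, and three write-policy flows, using two in-memory dicts as stand-ins for the cache tier and the source-of-truth database. `CachedStore` and `flush` are hypothetical; a real write-back cache would also have to persist or pin dirty entries before evicting them.

```python
from typing import Dict, Optional, Set

# Sketch: hypothetical class; two dicts stand in for the cache tier and the database.


class CachedStore:
    POLICIES = ("write-through", "write-back", "write-around")

    def __init__(self, policy: str):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy {policy!r}")
        self.policy = policy
        self.cache: Dict[str, bytes] = {}
        self.db: Dict[str, bytes] = {}
        self._dirty: Set[str] = set()  # write-back keys not yet persisted

    def read(self, key: str) -> Optional[bytes]:
        if key in self.cache:          # hit
            return self.cache[key]
        value = self.db.get(key)       # miss: go to the source of truth
        if value is not None:
            self.cache[key] = value    # fill
        return value

    def write(self, key: str, value: bytes) -> None:
        if self.policy == "write-through":
            self.db[key] = value       # persist, then update the cache
            self.cache[key] = value
        elif self.policy == "write-back":
            self.cache[key] = value    # acknowledge after the cache write only
            self._dirty.add(key)       # persisted later by flush()
        else:                          # write-around
            self.db[key] = value
            self.cache.pop(key, None)  # invalidate; the next read refills

    def flush(self) -> int:
        """Persist dirty write-back entries; returns how many were written."""
        for key in self._dirty:
            self.db[key] = self.cache[key]
        written = len(self._dirty)
        self._dirty.clear()
        return written
```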

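For the Consistency & Invalidation item, a sketch of soft and hard TTLs (stale-while-revalidate): between the soft and hard expiry the cache serves the stale value and queues one refresh; past the hard expiry it reloads synchronously. `invalidate` models a push invalidation. The class name, the `pending` set, and `run_refreshes` (a stand-in for a background worker) are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Sketch: hypothetical names; run_refreshes stands in for an async refresh worker.


@dataclass
class Entry:
    value: bytes
    soft_expiry: float  # after this: serve stale and refresh in the background
    hard_expiry: float  # after this: never serve; reload synchronously


class SoftHardTTLCache:
    def __init__(self, loader: Callable[[str], bytes], soft_ttl: float, hard_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        if soft_ttl > hard_ttl:
            raise ValueError("soft TTL must not exceed hard TTL")
        self._loader = loader
        self.soft_ttl, self.hard_ttl = soft_ttl, hard_ttl
        self._clock = clock
        self._entries: Dict[str, Entry] = {}
        self.pending: Set[str] = set()  # keys queued for background refresh

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry.hard_expiry:
            if now >= entry.soft_expiry:
                self.pending.add(key)  # set membership dedupes: one refresh per key
            return entry.value
        return self._load(key, now)  # missing or past the hard TTL

    def invalidate(self, key: str) -> None:
        """Push-style invalidation: drop the entry so the next read reloads."""
        self._entries.pop(key, None)
        self.pending.discard(key)

    def run_refreshes(self) -> None:
        """Drain the refresh queue (a background worker in a real system)."""
        now = self._clock()
        for key in list(self.pending):
            self._load(key, now)
        self.pending.clear()

    def _load(self, key: str, now: float) -> bytes:
        value = self._loader(key)
        self._entries[key] = Entry(value, now + self.soft_ttl, now + self.hard_ttl)
        return value
```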

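For the Monitoring & Capacity Planning and Trade-offs & Performance items, the sizing formulas can be written as a small function: the node count is the maximum of the memory-bound, QPS-bound, and network-bound counts, each computed against capacity scaled by a headroom factor, and the data moved when one node joins an N-node ring is roughly total replicated bytes / (N + 1). `plan`, `Workload`, and `NodeSpec` are hypothetical; the QPS bound counts reads only and ignores write fan-out to replicas and hot-key skew (assumptions).

```python
import math
from dataclasses import dataclass
from typing import Dict

# Sketch: hypothetical helper; counts reads only and assumes evenly spread load.


@dataclass
class Workload:
    keys: int
    avg_key_bytes: int
    avg_value_bytes: int
    overhead_bytes_per_item: int  # pointers, LRU links, TTL, allocator slack
    replication_factor: int
    peak_qps: float
    avg_response_bytes: int


@dataclass
class NodeSpec:
    usable_memory_bytes: float
    max_qps: float
    nic_bytes_per_s: float


def plan(w: Workload, n: NodeSpec, headroom: float = 0.7,
         migration_bytes_per_s: float = 50e6) -> Dict[str, float]:
    # memory_total = keys * (key + value + overhead) * RF
    item_bytes = w.avg_key_bytes + w.avg_value_bytes + w.overhead_bytes_per_item
    memory_total = w.keys * item_bytes * w.replication_factor
    # Each resource is planned to run at no more than `headroom` of capacity.
    by_memory = math.ceil(memory_total / (n.usable_memory_bytes * headroom))
    by_qps = math.ceil(w.peak_qps / (n.max_qps * headroom))
    by_network = math.ceil(w.peak_qps * w.avg_response_bytes / (n.nic_bytes_per_s * headroom))
    nodes = max(by_memory, by_qps, by_network)
    # With vnodes, a joining node takes about 1/(N+1) of all stored (replicated) bytes.
    join_bytes = memory_total / (nodes + 1)
    return {
        "item_bytes": item_bytes,
        "memory_total_bytes": memory_total,
        "nodes_by_memory": by_memory,
        "nodes_by_qps": by_qps,
        "nodes_by_network": by_network,
        "nodes": nodes,
        "join_bytes": join_bytes,
        "join_seconds": join_bytes / migration_bytes_per_s,
    }
```

As an illustration with assumed inputs (not measurements): 10 million keys at about 1,150 bytes each including overhead, RF = 2, a 10^6 QPS peak with 1,000-byte responses, and nodes offering 64 GB of usable memory, 200k QPS, and 10 Gbit/s. Memory then needs 1 node, network 2, and QPS 8, so throughput is the binding constraint; a join moves about 23 GB / 9 ≈ 2.6 GB, roughly 51 s at a 50 MB/s migration cap.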