PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/System Design/Microsoft

Design a Distributed Key-Value Store

Last updated: Jul 1, 2026

Quick Overview

This system design question evaluates the ability to architect a horizontally scalable distributed key-value store, covering data partitioning, replication, and failure handling. It tests conceptual understanding of consistent hashing, quorum-based consistency, and rebalancing, skills commonly probed in system design interviews to assess architectural reasoning about scalability and fault tolerance.

  • medium
  • Microsoft
  • System Design
  • Software Engineer

Design a Distributed Key-Value Store

Company: Microsoft

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

Design a horizontally scalable distributed key-value store. The system stores opaque values keyed by a string key and supports `get`, `put`, and `delete` on a single key. It must scale across many commodity nodes, stay available when individual nodes fail, and redistribute data smoothly as nodes are added or removed. Focus your design on data partitioning (sharding), replication, rebalancing, and failure handling. ### Constraints & Assumptions - Billions of keys; values range from a few bytes up to a few megabytes. - Core operations are single-key `get`, `put`, and `delete`; range scans are out of scope unless asked. - Target low single-digit-millisecond p99 latency for single-key operations and high availability (think 99.9%+). - Runs on many commodity nodes spread across racks; multi-datacenter is an extension, not the baseline. - Workload is read-heavy but with a meaningful write rate; data must be durable. ### Clarifying Questions to Ask - What consistency model is required: strong consistency, or eventual consistency with read-your-writes? How much staleness is tolerable? - What is the value-size distribution, and are accesses strictly point lookups or do we need range queries? - What durability and replication guarantees are required? Single datacenter or multi-DC? - What are the throughput and latency targets, and the read/write ratio? - Do we need multi-key atomic operations or transactions, or is single-key atomicity enough? ### Part 1 — Partitioning How do you map billions of keys across many nodes so that adding or removing a node moves as little data as possible, and how does a client find the node responsible for a key? ```hint Avoid full remaps Plain `hash(key) % N` remaps almost everything when `N` changes. Use a scheme where membership changes only affect a small slice of the keyspace — consistent hashing. ``` ```hint Smooth out the ring A single hash position per node creates uneven load and lumpy data movement. Give each physical node many positions on the ring (virtual nodes) so load and rebalancing are evenly distributed. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### Part 2 — Replication and consistency Each key is stored on more than one node for durability and availability. How do you replicate, keep replicas consistent, and resolve concurrent or conflicting writes? ```hint Quorums With `N` replicas, require `R` nodes to confirm a read and `W` to confirm a write. Choosing `R + W > N` guarantees a read overlaps the latest write. Tuning `R` and `W` trades latency against consistency. ``` ```hint Reconciling concurrent writes Decide up front how two concurrent writes to the same key are reconciled: last-writer-wins by timestamp (simple, can lose updates) or version vectors / vector clocks that detect conflicts for the application to merge. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### Part 3 — Rebalancing and failure handling What happens when a node is added, removed, or fails? Cover both transient and permanent failures, and how the cluster heals itself. ```hint Only the neighbors move On the hash ring, adding or removing a node should only shift data between that node and its ring neighbors — not the whole cluster. Quantify which ranges move. ``` ```hint Transient vs permanent A node that is briefly unreachable is different from one that is gone for good. Use hinted handoff to absorb temporary failures, and a background anti-entropy process to repair permanent ones. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### What a Strong Answer Covers ```premium-lock What a Strong Answer Covers ``` ### Follow-up Questions - Walk through a `put` with `N = 3`, `W = 2` while one of the three replicas is down. What happens to the write, and what will a later `R = 2` read observe? - How do you handle a hotspot where a single key (or a small range) receives a disproportionate share of traffic? - How would you add multi-key transactions or atomic batches, and what does that cost in latency and availability? - How would you treat a 1 MB value differently from a 10-byte value to keep latency predictable?

Quick Answer: This system design question evaluates the ability to architect a horizontally scalable distributed key-value store, covering data partitioning, replication, and failure handling. It tests conceptual understanding of consistent hashing, quorum-based consistency, and rebalancing, skills commonly probed in system design interviews to assess architectural reasoning about scalability and fault tolerance.

Related Interview Questions

  • Design a URL Shortener (High-Level and Low-Level Design) - Microsoft (medium)
  • Externally Sort a 500 GB CSV by One Column with 16 GB of RAM - Microsoft (medium)
  • Design a Metrics Ingestion Pipeline - Microsoft (medium)
  • Design a To-Do List Service (CRUD, Auth, Rate Limiting, Caching & API Versioning) - Microsoft (medium)
  • Design A Scalable Web Crawler - Microsoft (medium)
|Home/System Design/Microsoft

Design a Distributed Key-Value Store

Microsoft logo
Microsoft
Jun 11, 2026, 12:00 AM
mediumSoftware EngineerOnsiteSystem Design
0
0

Design a horizontally scalable distributed key-value store. The system stores opaque values keyed by a string key and supports get, put, and delete on a single key. It must scale across many commodity nodes, stay available when individual nodes fail, and redistribute data smoothly as nodes are added or removed. Focus your design on data partitioning (sharding), replication, rebalancing, and failure handling.

Constraints & Assumptions

  • Billions of keys; values range from a few bytes up to a few megabytes.
  • Core operations are single-key get , put , and delete ; range scans are out of scope unless asked.
  • Target low single-digit-millisecond p99 latency for single-key operations and high availability (think 99.9%+).
  • Runs on many commodity nodes spread across racks; multi-datacenter is an extension, not the baseline.
  • Workload is read-heavy but with a meaningful write rate; data must be durable.

Clarifying Questions to Ask

  • What consistency model is required: strong consistency, or eventual consistency with read-your-writes? How much staleness is tolerable?
  • What is the value-size distribution, and are accesses strictly point lookups or do we need range queries?
  • What durability and replication guarantees are required? Single datacenter or multi-DC?
  • What are the throughput and latency targets, and the read/write ratio?
  • Do we need multi-key atomic operations or transactions, or is single-key atomicity enough?

Part 1 — Partitioning

How do you map billions of keys across many nodes so that adding or removing a node moves as little data as possible, and how does a client find the node responsible for a key?

What This Part Should Cover Premium

Part 2 — Replication and consistency

Each key is stored on more than one node for durability and availability. How do you replicate, keep replicas consistent, and resolve concurrent or conflicting writes?

What This Part Should Cover Premium

Part 3 — Rebalancing and failure handling

What happens when a node is added, removed, or fails? Cover both transient and permanent failures, and how the cluster heals itself.

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

  • Walk through a put with N = 3 , W = 2 while one of the three replicas is down. What happens to the write, and what will a later R = 2 read observe?
  • How do you handle a hotspot where a single key (or a small range) receives a disproportionate share of traffic?
  • How would you add multi-key transactions or atomic batches, and what does that cost in latency and availability?
  • How would you treat a 1 MB value differently from a 10-byte value to keep latency predictable?

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Microsoft•More Software Engineer•Microsoft Software Engineer•Microsoft System Design•Software Engineer System Design

Your design canvas — auto-saved

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.