How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Onsite rounds at Microsoft.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Microsoft during technical interviews.

Design a Distributed Key-Value Store | Microsoft Interview Question

Q: Design a Distributed Key-Value Store

This system design question evaluates the ability to architect a horizontally scalable distributed key-value store, covering data partitioning, replication, and failure handling. It tests conceptual understanding of consistent hashing, quorum-based consistency, and rebalancing, skills commonly probed in system design interviews to assess architectural reasoning about scalability and fault tolerance.

Design a horizontally scalable distributed key-value store. The system stores opaque values keyed by a string key and supports get, put, and delete on a single key. It must scale across many commodity nodes, stay available when individual nodes fail, and redistribute data smoothly as nodes are added or removed. Focus your design on data partitioning (sharding), replication, rebalancing, and failure handling.

Constraints & Assumptions

Billions of keys; values range from a few bytes up to a few megabytes.
Core operations are single-key get , put , and delete ; range scans are out of scope unless asked.
Target low single-digit-millisecond p99 latency for single-key operations and high availability (think 99.9%+).
Runs on many commodity nodes spread across racks; multi-datacenter is an extension, not the baseline.
Workload is read-heavy but with a meaningful write rate; data must be durable.

Clarifying Questions to Ask

What consistency model is required: strong consistency, or eventual consistency with read-your-writes? How much staleness is tolerable?
What is the value-size distribution, and are accesses strictly point lookups or do we need range queries?
What durability and replication guarantees are required? Single datacenter or multi-DC?
What are the throughput and latency targets, and the read/write ratio?
Do we need multi-key atomic operations or transactions, or is single-key atomicity enough?

Part 1 — Partitioning

How do you map billions of keys across many nodes so that adding or removing a node moves as little data as possible, and how does a client find the node responsible for a key?

What This Part Should Cover Premium

Part 2 — Replication and consistency

Each key is stored on more than one node for durability and availability. How do you replicate, keep replicas consistent, and resolve concurrent or conflicting writes?

What This Part Should Cover Premium

Part 3 — Rebalancing and failure handling

What happens when a node is added, removed, or fails? Cover both transient and permanent failures, and how the cluster heals itself.

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

Walk through a put with N = 3 , W = 2 while one of the three replicas is down. What happens to the write, and what will a later R = 2 read observe?
How do you handle a hotspot where a single key (or a small range) receives a disproportionate share of traffic?
How would you add multi-key transactions or atomic batches, and what does that cost in latency and availability?
How would you treat a 1 MB value differently from a 10-byte value to keep latency predictable?

Constraints & Assumptions

Billions of keys; values range from a few bytes up to a few megabytes.
Core operations are single-key get , put , and delete ; range scans are out of scope unless asked.
Target low single-digit-millisecond p99 latency for single-key operations and high availability (think 99.9%+).
Runs on many commodity nodes spread across racks; multi-datacenter is an extension, not the baseline.
Workload is read-heavy but with a meaningful write rate; data must be durable.

Clarifying Questions to Ask

What consistency model is required: strong consistency, or eventual consistency with read-your-writes? How much staleness is tolerable?
What is the value-size distribution, and are accesses strictly point lookups or do we need range queries?
What durability and replication guarantees are required? Single datacenter or multi-DC?
What are the throughput and latency targets, and the read/write ratio?
Do we need multi-key atomic operations or transactions, or is single-key atomicity enough?

Part 1 — Partitioning

How do you map billions of keys across many nodes so that adding or removing a node moves as little data as possible, and how does a client find the node responsible for a key?

What This Part Should Cover Premium

Part 2 — Replication and consistency

Each key is stored on more than one node for durability and availability. How do you replicate, keep replicas consistent, and resolve concurrent or conflicting writes?

What This Part Should Cover Premium

Part 3 — Rebalancing and failure handling

What happens when a node is added, removed, or fails? Cover both transient and permanent failures, and how the cluster heals itself.

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

Walk through a put with N = 3 , W = 2 while one of the three replicas is down. What happens to the write, and what will a later R = 2 read observe?
How do you handle a hotspot where a single key (or a small range) receives a disproportionate share of traffic?
How would you add multi-key transactions or atomic batches, and what does that cost in latency and availability?
How would you treat a 1 MB value differently from a 10-byte value to keep latency predictable?

Design a Distributed Key-Value Store

Quick Overview

Design a Distributed Key-Value Store

Constraints & Assumptions

Clarifying Questions to Ask

Part 1 — Partitioning

What This Part Should Cover Premium

Part 2 — Replication and consistency

What This Part Should Cover Premium

Part 3 — Rebalancing and failure handling

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

Submit Your Answer to Earn 20XP

Design a Distributed Key-Value Store

Quick Overview

Design a Distributed Key-Value Store

Constraints & Assumptions

Clarifying Questions to Ask

Part 1 — Partitioning

What This Part Should Cover Premium

Part 2 — Replication and consistency

What This Part Should Cover Premium

Part 3 — Rebalancing and failure handling

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

Submit Your Answer to Earn 20XP