Design a key-value store

Q: Design a key-value store

This question evaluates understanding of durable storage and crash-consistency, including on-disk data layout, filesystem durability semantics, concurrency control, and performance trade-offs for single-node embeddable key–value engines.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Q: What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Onsite rounds at Databricks.

Q: What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Databricks during technical interviews.

Question

System Design: Durable Key–Value Store

Design a single-node, embeddable key–value store library with a simple API that must remain correct and durable across process crashes and power failures.

This is a single, cohesive system-design problem. Treat the numbered items under What to Cover as the areas your design must address, not as independent sub-problems — they all describe one storage engine.

Environment & Assumptions

A POSIX-like filesystem on SSD/NVMe with 4 KiB sectors .
Single process, embedded as a library (no network/RPC layer to design).

API

put(key, value)
get(key)
delete(key)

Keys and values are byte arrays .
No secondary indexes and no transactions beyond single-key operations.

Clarifying Questions to Ask

Before designing, scope the problem with the interviewer:

Durability contract — must every acknowledged put survive a power loss (synchronous fsync-before-ack), or is a small, bounded loss window acceptable for higher throughput?
Workload shape — is this write-heavy, read-heavy, or mixed? What is the read:write ratio, and is the access pattern point lookups only or do we anticipate range scans later?
Data sizes — typical and maximum key/value sizes, total dataset size, and whether values can exceed available RAM (forcing on-disk structures vs. an in-memory map).
Concurrency expectations — single application thread, or multiple threads issuing operations concurrently? Is multi-writer required, or is single-writer/multi-reader sufficient?
Consistency requirement — is read-after-write (your own writes visible immediately) enough, or is full linearizability across threads expected?
Operational limits — target throughput/latency SLOs, available disk headroom for compaction, and whether crashes/restarts are frequent enough that recovery time matters.

Constraints & Assumptions

Anchor the design with concrete numbers (state these as your assumptions if the interviewer leaves them open):

Storage: SSD/NVMe, 4 KiB sector size; sequential writes are far cheaper than random writes.
Scale to size for: order of $10^7$ – $10^8$ keys, ~16 B keys, ~1 KB values, dataset larger than RAM.
Throughput target: on the order of tens of thousands of durable put /s (e.g. ~50k/s) on a single node.
Durability: an acknowledged write in SYNC mode must survive process crash and power failure; any relaxed mode must make its loss window explicit and bounded.
Single node, single process: no replication, no network layer, no distributed consensus in scope.

What to Cover

Walk through your design across the following areas. Hints sit under the areas where most candidates get stuck — open them only if you want a nudge.

Durability via a write-ahead log (WAL) — record/segment format, rotation, and fsync policy.

On-disk data structures and layout — how data files and metadata are organized.

Crash recovery — how you detect and recover from incomplete operations.

Compaction / garbage collection — reclaiming stale versions and tombstones.

fsync and batching strategy — the throughput vs. latency trade-offs.

Torn writes and partial files — how you detect and handle them.

Concurrency model — single-writer/multi-reader vs. multi-writer; lock-based vs. optimistic/lock-free; and how you ensure correctness under concurrent access.

Consistency guarantees — e.g., read-after-write, and any linearizability options.

Performance analysis — latency/throughput trade-offs and back-of-the-envelope resource estimates.

Minimal testing plan — including failure injection and recovery validation.

What a Strong Answer Covers

The interviewer is checking for these dimensions (this is a checklist of signals, not the answers):

Durability discipline — a clear "nothing acked until durable" contract, the right fsync semantics, and an explicit (bounded) loss window if a relaxed mode is offered.
Crash-safety reasoning — a crash-point-by-crash-point argument that a consistent state is always recoverable, anchored on atomic file installs and a single authoritative metadata version.
Data-structure fit — a layout chosen to match SSD economics and the stated workload, with a justification (and awareness of the alternative, e.g. LSM vs. B+ tree, and when each wins).
Correctness under concurrency — a defensible read path and write path, explicit ordering/sequence-number scheme, and safe memory reclamation; honesty about lock-based vs. lock-free trade-offs.
Quantitative grounding — back-of-the-envelope throughput, write amplification, and memory estimates that identify the real bottleneck.
Testability — a failure-injection and recovery-validation plan, not just happy-path unit tests.
Trade-off communication — recommending a sensible default and naming the escape hatch, rather than over-engineering.

Follow-up Questions

Expect the interviewer to probe deeper after the main design:

Scaling writes: "Your single-committer design is fsync-bound. How would you scale concurrent writers — and why might sharding the keyspace beat a lock-free multi-writer inside one engine?"
What breaks first at 100×? "Ingest 100× higher: what saturates first — fsync rate, compaction write amplification, or block-cache misses — and how do you apply backpressure before L0 explodes?"
Range scans: "The API is point-only today. What changes if we must support ordered range scans? Walk through a merging iterator over memtable + on-disk files and what it costs."
Adversarial durability: "Prove a deleted key can never resurrect after a crash mid-compaction. Which exact invariant guarantees it?"

Design a key-value store

Quick Overview

Design a key-value store

System Design: Durable Key–Value Store

Environment & Assumptions

API

Clarifying Questions to Ask

Constraints & Assumptions

What to Cover

What a Strong Answer Covers

Follow-up Questions

Submit Your Answer to Earn 20XP

Design a key-value store

Quick Overview

Design a key-value store

System Design: Durable Key–Value Store

Environment & Assumptions

API

Clarifying Questions to Ask

Constraints & Assumptions

What to Cover

What a Strong Answer Covers

Follow-up Questions

Submit Your Answer to Earn 20XP