PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/System Design/Databricks

Design a key-value store

Last updated: Jun 27, 2026

Quick Overview

This question evaluates understanding of durable storage and crash-consistency, including on-disk data layout, filesystem durability semantics, concurrency control, and performance trade-offs for single-node embeddable key–value engines.

  • hard
  • Databricks
  • System Design
  • Software Engineer

Design a key-value store

Company: Databricks

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Onsite

Design a durable key-value store that supports put(key, value), get(key), and delete(key). Specify how you will achieve durability with a write-ahead log (WAL), lay out on-disk data structures, and perform crash recovery. Describe compaction/garbage collection, fsync/batching strategy, and handling of torn writes. Explain your concurrency model: single-writer/multi-reader vs. multi-writer, lock-based vs. optimistic/lock-free approaches, and how you ensure correctness under concurrent access. Analyze consistency guarantees (e.g., read-after-write), latency/throughput trade-offs, and back-of-the-envelope resource estimates. Provide a minimal testing plan covering failure injection and recovery.

Quick Answer: This question evaluates understanding of durable storage and crash-consistency, including on-disk data layout, filesystem durability semantics, concurrency control, and performance trade-offs for single-node embeddable key–value engines.

Related Interview Questions

  • Design a Slack-Like Messaging System - Databricks (medium)
  • Design a Book Price Aggregator - Databricks (medium)
  • Design a Distributed File System - Databricks (medium)
  • Design a stock order manager - Databricks (medium)
  • Design an Online Bookstore - Databricks (hard)
|Home/System Design/Databricks

Design a key-value store

Databricks logo
Databricks
Aug 11, 2025, 12:00 AM
hardSoftware EngineerOnsiteSystem Design
126
0

System Design: Durable Key–Value Store

Design a single-node, embeddable key–value store library with a simple API that must remain correct and durable across process crashes and power failures.

This is a single, cohesive system-design problem. Treat the numbered items under What to Cover as the areas your design must address, not as independent sub-problems — they all describe one storage engine.

Environment & Assumptions

  • A POSIX-like filesystem on SSD/NVMe with 4 KiB sectors .
  • Single process, embedded as a library (no network/RPC layer to design).

API

put(key, value)
get(key)
delete(key)
  • Keys and values are byte arrays .
  • No secondary indexes and no transactions beyond single-key operations.

Clarifying Questions to Ask

Before designing, scope the problem with the interviewer:

  1. Durability contract — must every acknowledged put survive a power loss (synchronous fsync-before-ack), or is a small, bounded loss window acceptable for higher throughput?
  2. Workload shape — is this write-heavy, read-heavy, or mixed? What is the read:write ratio, and is the access pattern point lookups only or do we anticipate range scans later?
  3. Data sizes — typical and maximum key/value sizes, total dataset size, and whether values can exceed available RAM (forcing on-disk structures vs. an in-memory map).
  4. Concurrency expectations — single application thread, or multiple threads issuing operations concurrently? Is multi-writer required, or is single-writer/multi-reader sufficient?
  5. Consistency requirement — is read-after-write (your own writes visible immediately) enough, or is full linearizability across threads expected?
  6. Operational limits — target throughput/latency SLOs, available disk headroom for compaction, and whether crashes/restarts are frequent enough that recovery time matters.

Constraints & Assumptions

Anchor the design with concrete numbers (state these as your assumptions if the interviewer leaves them open):

  • Storage: SSD/NVMe, 4 KiB sector size; sequential writes are far cheaper than random writes.
  • Scale to size for: order of 10710^7107 – 10810^8108 keys, ~16 B keys, ~1 KB values, dataset larger than RAM.
  • Throughput target: on the order of tens of thousands of durable put /s (e.g. ~50k/s) on a single node.
  • Durability: an acknowledged write in SYNC mode must survive process crash and power failure; any relaxed mode must make its loss window explicit and bounded.
  • Single node, single process: no replication, no network layer, no distributed consensus in scope.

What to Cover

Walk through your design across the following areas. Hints sit under the areas where most candidates get stuck — open them only if you want a nudge.

  1. Durability via a write-ahead log (WAL) — record/segment format, rotation, and fsync policy.
  1. On-disk data structures and layout — how data files and metadata are organized.
  1. Crash recovery — how you detect and recover from incomplete operations.
  1. Compaction / garbage collection — reclaiming stale versions and tombstones.
  1. fsync and batching strategy — the throughput vs. latency trade-offs.
  1. Torn writes and partial files — how you detect and handle them.
  1. Concurrency model — single-writer/multi-reader vs. multi-writer; lock-based vs. optimistic/lock-free; and how you ensure correctness under concurrent access.
  1. Consistency guarantees — e.g., read-after-write, and any linearizability options.
  1. Performance analysis — latency/throughput trade-offs and back-of-the-envelope resource estimates.
  1. Minimal testing plan — including failure injection and recovery validation.

What a Strong Answer Covers

The interviewer is checking for these dimensions (this is a checklist of signals, not the answers):

  • Durability discipline — a clear "nothing acked until durable" contract, the right fsync semantics, and an explicit (bounded) loss window if a relaxed mode is offered.
  • Crash-safety reasoning — a crash-point-by-crash-point argument that a consistent state is always recoverable, anchored on atomic file installs and a single authoritative metadata version.
  • Data-structure fit — a layout chosen to match SSD economics and the stated workload, with a justification (and awareness of the alternative, e.g. LSM vs. B+ tree, and when each wins).
  • Correctness under concurrency — a defensible read path and write path, explicit ordering/sequence-number scheme, and safe memory reclamation; honesty about lock-based vs. lock-free trade-offs.
  • Quantitative grounding — back-of-the-envelope throughput, write amplification, and memory estimates that identify the real bottleneck.
  • Testability — a failure-injection and recovery-validation plan, not just happy-path unit tests.
  • Trade-off communication — recommending a sensible default and naming the escape hatch, rather than over-engineering.

Follow-up Questions

Expect the interviewer to probe deeper after the main design:

  1. Scaling writes: "Your single-committer design is fsync-bound. How would you scale concurrent writers — and why might sharding the keyspace beat a lock-free multi-writer inside one engine?"
  2. What breaks first at 100×? "Ingest 100× higher: what saturates first — fsync rate, compaction write amplification, or block-cache misses — and how do you apply backpressure before L0 explodes?"
  3. Range scans: "The API is point-only today. What changes if we must support ordered range scans? Walk through a merging iterator over memtable + on-disk files and what it costs."
  4. Adversarial durability: "Prove a deleted key can never resurrect after a crash mid-compaction. Which exact invariant guarantees it?"

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Databricks•More Software Engineer•Databricks Software Engineer•Databricks System Design•Software Engineer System Design

Your design canvas — auto-saved

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.