PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/System Design/Snowflake

Design a disk-backed KV store under contention

Last updated: Jun 18, 2026

Quick Overview

This question evaluates systems-design and concurrency engineering skills, specifically the ability to reason about durable on-disk storage, indexing and caching for large values, and contention-tolerant concurrency control for hot keys.

  • easy
  • Snowflake
  • System Design
  • Software Engineer

Design a disk-backed KV store under contention

Company: Snowflake

Role: Software Engineer

Category: System Design

Difficulty: easy

Interview Round: Onsite

## Design a Disk-Backed Key–Value Store Under Contention Design a single-node **key–value store** with the following two defining characteristics: - **Large values.** Values can be very large (think hundreds of KB to multiple MB each), so the value data must live primarily on **disk** rather than in RAM. - **High contention.** Many threads/clients call the store concurrently, and the access pattern includes **hot keys** (a small number of keys receiving a disproportionate share of reads and writes). The interviewer cares most about *how you keep this design correct and fast under contention*, including how you would concretely implement `acquire`/`release` of locks (or argue for an alternative). Your design should specify and justify: - **APIs:** `Put(key, value)`, `Get(key)`, `Delete(key)`, and optionally `CompareAndSet(key, expectedVersion, value)` (CAS). - **Durability guarantees:** precisely what is guaranteed once `Put` returns success. - **Correctness under concurrency:** what a client may observe when many threads operate on the same key simultaneously. ### Constraints & Assumptions - **Single node** to start; the design should sketch how it would scale out, but the core problem is local concurrency, not distributed consensus. - **Per-key linearizability** is the target consistency model (a `Get` reflects the most recently acknowledged `Put`/`Delete` for that key) unless you explicitly argue for something weaker. - **Durability:** once `Put` returns success, the value must survive a process crash (and a power loss, if `fsync` is honored). - **Workload skew:** assume reads outnumber writes, but a handful of keys are extremely hot for both. Value sizes range from small to multi-MB. - You control the storage format on local disk (SSD); you may assume POSIX file APIs (`pread`, `pwrite`, `fsync`). ### The Problem Walk through the full design: on-disk layout and indexing, caching, the read and write paths (including durability), and—most importantly—the concurrency control that keeps it correct and contention-tolerant. ```hint Where to start Values are large; loading them fully into memory to look up a key would be wasteful. Think about what the *minimal* in-memory structure is that tells you *where* on disk to find a given key's value — and whether that structure and the raw value bytes need the same concurrency protection. ``` ```hint Durability When `Put` returns success, the write must survive a crash. Think about what you need to persist *before* acknowledging, and what the cost of that persistence step is at high write throughput. Is there a way to amortize that cost across concurrent writers? ``` ```hint Attacking contention A single lock around all shared state is the obvious starting point — but it limits you to one operation at a time system-wide. What properties of the keyspace could you exploit to let unrelated keys proceed independently? And once you've done that, what additional techniques address the case where *related* operations on the same key still contend? ``` ```hint Acquire/release sketch Be ready to actually write `acquire(key)` / `release(key)`. Think carefully about which operations must be **atomic with respect to each other** — specifically, what could go wrong if two concurrent writers each read the current state of a key and then independently try to update it? What is the minimum the critical section must include to prevent that? ``` ### Clarifying Questions to Ask - Single node only, or do I need to design for sharding across machines and replication? - What consistency does the caller expect—per-key linearizability, or is eventual consistency acceptable for higher throughput? - On `Put` success, must the data survive power loss (real `fsync`), or just a process crash? - What is the read/write ratio, the value-size distribution, and how skewed is the hot-key access? - Is there a maximum value size, or must I support arbitrarily large blobs (streaming)? - Do I need range scans / ordered iteration, or is point access (get by exact key) sufficient? ### What a Strong Answer Covers - A clear separation of **in-memory index** from **on-disk value storage**, with a concrete record format (length-prefixed, checksummed, versioned, tombstones for deletes). - A **durability story** that names WAL + an explicit `fsync` policy and explains the crash-recovery path (replay WAL / rebuild index). - The **read path** using positional reads (`pread`) plus a layered cache strategy (OS page cache + an application cache for hot values). - A **contention plan** that goes beyond one lock: sharding, lock striping, reader-writer locks, optimistic/versioned CAS, and a specific mechanism for the single-hot-key case. - A correct **acquire/release locking** sketch with the critical section drawn precisely, and a justified choice of whether the disk append sits inside or outside the lock. - **Compaction/GC** for reclaiming space from overwritten values without blocking readers (epoch/RCU or refcounting). - Honest **trade-offs** (latency vs. throughput, simplicity vs. concurrency) and a sketch of how to scale out. ### Follow-up Questions - A single key is being hammered by thousands of concurrent writers. Striping doesn't help because they all hash to the same stripe—how do you keep that key's throughput high without losing per-key linearizability? - The background compactor is rewriting a segment while a reader holds an offset into it. How do you guarantee the reader never reads freed or relocated bytes? - You crash *after* the WAL `fsync` but *before* the value segment append and index update. Walk through exactly how recovery reconstructs correct state. - How would you support a value larger than RAM (e.g., a 5 GB blob) on both the write and read paths?

Quick Answer: This question evaluates systems-design and concurrency engineering skills, specifically the ability to reason about durable on-disk storage, indexing and caching for large values, and contention-tolerant concurrency control for hot keys.

Related Interview Questions

  • Design a Cron Job Scheduler - Snowflake (medium)
  • Design a REST API Abstraction Layer - Snowflake (hard)
  • Design an ACL authorization checking service - Snowflake (hard)
  • Design an object store with deduplication - Snowflake (medium)
  • Design a concurrent web crawler - Snowflake (hard)
Snowflake logo
Snowflake
Mar 1, 2026, 12:00 AM
Software Engineer
Onsite
System Design
51
0
Loading...

Design a Disk-Backed Key–Value Store Under Contention

Design a single-node key–value store with the following two defining characteristics:

  • Large values. Values can be very large (think hundreds of KB to multiple MB each), so the value data must live primarily on disk rather than in RAM.
  • High contention. Many threads/clients call the store concurrently, and the access pattern includes hot keys (a small number of keys receiving a disproportionate share of reads and writes). The interviewer cares most about how you keep this design correct and fast under contention , including how you would concretely implement acquire / release of locks (or argue for an alternative).

Your design should specify and justify:

  • APIs: Put(key, value) , Get(key) , Delete(key) , and optionally CompareAndSet(key, expectedVersion, value) (CAS).
  • Durability guarantees: precisely what is guaranteed once Put returns success.
  • Correctness under concurrency: what a client may observe when many threads operate on the same key simultaneously.

Constraints & Assumptions

  • Single node to start; the design should sketch how it would scale out, but the core problem is local concurrency, not distributed consensus.
  • Per-key linearizability is the target consistency model (a Get reflects the most recently acknowledged Put / Delete for that key) unless you explicitly argue for something weaker.
  • Durability: once Put returns success, the value must survive a process crash (and a power loss, if fsync is honored).
  • Workload skew: assume reads outnumber writes, but a handful of keys are extremely hot for both. Value sizes range from small to multi-MB.
  • You control the storage format on local disk (SSD); you may assume POSIX file APIs ( pread , pwrite , fsync ).

The Problem

Walk through the full design: on-disk layout and indexing, caching, the read and write paths (including durability), and—most importantly—the concurrency control that keeps it correct and contention-tolerant.

Clarifying Questions to Ask

  • Single node only, or do I need to design for sharding across machines and replication?
  • What consistency does the caller expect—per-key linearizability, or is eventual consistency acceptable for higher throughput?
  • On Put success, must the data survive power loss (real fsync ), or just a process crash?
  • What is the read/write ratio, the value-size distribution, and how skewed is the hot-key access?
  • Is there a maximum value size, or must I support arbitrarily large blobs (streaming)?
  • Do I need range scans / ordered iteration, or is point access (get by exact key) sufficient?

What a Strong Answer Covers

  • A clear separation of in-memory index from on-disk value storage , with a concrete record format (length-prefixed, checksummed, versioned, tombstones for deletes).
  • A durability story that names WAL + an explicit fsync policy and explains the crash-recovery path (replay WAL / rebuild index).
  • The read path using positional reads ( pread ) plus a layered cache strategy (OS page cache + an application cache for hot values).
  • A contention plan that goes beyond one lock: sharding, lock striping, reader-writer locks, optimistic/versioned CAS, and a specific mechanism for the single-hot-key case.
  • A correct acquire/release locking sketch with the critical section drawn precisely, and a justified choice of whether the disk append sits inside or outside the lock.
  • Compaction/GC for reclaiming space from overwritten values without blocking readers (epoch/RCU or refcounting).
  • Honest trade-offs (latency vs. throughput, simplicity vs. concurrency) and a sketch of how to scale out.

Follow-up Questions

  • A single key is being hammered by thousands of concurrent writers. Striping doesn't help because they all hash to the same stripe—how do you keep that key's throughput high without losing per-key linearizability?
  • The background compactor is rewriting a segment while a reader holds an offset into it. How do you guarantee the reader never reads freed or relocated bytes?
  • You crash after the WAL fsync but before the value segment append and index update. Walk through exactly how recovery reconstructs correct state.
  • How would you support a value larger than RAM (e.g., a 5 GB blob) on both the write and read paths?

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Snowflake•More Software Engineer•Snowflake Software Engineer•Snowflake System Design•Software Engineer System Design
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.