Design a Durable Key-Value Cache with File-System Persistence
Company: xAI
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
Design and implement (at a discussion + light-code level — you will sketch the key functions but not run them) a **durable in-memory key-value cache**.
The cache lives in a single process on a single machine and exposes two operations:
- `put(key, value)` — store or overwrite the value for a key.
- `get(key)` — return the current value for a key (or a miss).
**Durable** means: if the cache process crashes or is restarted, a fresh instance must be able to **recover the cached data from persistent storage** (the local file system) during initialization, so that no acknowledged `put` is lost.
### Constraints & Assumptions
- Single node, single process; the local file system is the only persistent storage available (no external database).
- Reads must stay fast — `get` should be served from memory, not from disk.
- A `put` is only considered successful once it will survive a process crash.
- The keyspace is small enough that all keys (and, initially, all values) fit in memory.
- Crash model: the process can die at any instant, including mid-write; the disk itself is assumed healthy.
- Interviewer expectation: sketch the data structures and the `put` / `get` / `init` (recovery) code paths in simple code or pseudocode, then reason about trade-offs out loud.
### Clarifying Questions to Ask
- What is the durability requirement per write — must every single `put` survive a crash, or is a small window of loss acceptable?
- What are the expected sizes: number of distinct keys, typical value size, and total data volume?
- What is the read/write ratio, and how frequent are overwrites of the same key?
- Is the cache accessed by a single thread, or must `put`/`get` be safe under concurrency?
- How fast must recovery (restart) be — is a full scan of the persisted data on startup acceptable?
- Do we need delete/expiry (TTL) semantics, or only `get`/`put`?
### Part 1
Design the core cache and its persistence strategy. Describe the in-memory data structure, what exactly gets written to disk on each `put`, and how `init()` reconstructs the cache after a crash. Sketch the code for `put`, `get`, and the recovery path.
```hint Persistence pattern
Think **write-through**: apply the write to the in-memory map and *also* persist it to the file system before acknowledging. The disk copy is the source of truth for recovery; memory is just the fast serving layer.
```
```hint On-disk layout
An **append-only log** of `(key, value)` records is far friendlier to crash-safety and write latency than rewriting a data file in place. Recovery is then "replay the log from the beginning; later records win."
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### Part 2
Analyze the trade-offs of your design. In particular: what does write-through cost you on the write path, and what happens to the persisted data over time when keys are overwritten many times — how do you handle a large accumulation of **stale key-value records**?
```hint Latency vs. durability
Every durable `put` pays disk I/O — and *true* durability requires flushing (`fsync`), not just a buffered write. Consider what you can offer if the caller accepts group/periodic flushing instead of per-write flushing.
```
```hint Stale data
In an append-only log, an overwritten key's old records are dead weight: they inflate the file and slow recovery. The classic remedy is **compaction** — periodically rewrite a new log containing only the latest record per key, then atomically swap it in.
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### Part 3
Now suppose the number of keys is small, but each value is **very large** (e.g., many megabytes). The naive design rewrites or re-appends the whole value on every `put`, and recovery replays huge amounts of data. How would you optimize the design for this workload?
```hint Per-key isolation
With few keys and huge values, a **file per key** stops one key's churn from bloating a shared log — and makes "find the latest value of key K" a single-file problem instead of a full-log scan.
```
```hint Identifying the freshest version
If each write of a key appends (or writes a new version file) tagged with a **timestamp or monotonically increasing version number**, then both `get`-after-recovery and cleanup reduce to "take the entry with the highest version; everything older is stale."
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### What a Strong Answer Covers
```premium-lock What a Strong Answer Covers
```
### Follow-up Questions
- How would you make `put` and `get` safe under concurrent access from multiple threads, and what does that do to your write-path latency?
- If a machine crash can corrupt the tail of a file mid-write, how does your recovery distinguish a torn final record from valid data?
- When and how would you trigger compaction (size threshold, stale ratio, background thread), and how do you serve reads and writes *during* compaction?
- If this cache had to survive whole-machine loss rather than just process crashes, what would you change (replication, remote storage), and what new consistency questions appear?
Quick Answer: This question evaluates a candidate's ability to design durable in-memory data structures and persistence strategies, including reasoning about crash recovery, durability guarantees, and performance trade-offs.