Design a Generic, Type-Safe Key-Value Store API
Context
You're asked to design a reusable key-value store with a generic, user-friendly API. The store should start as a simple in-memory implementation suitable for local use, but the design must be able to evolve into a persistent implementation without breaking callers.
Assume Java for the typed API so that generics and type safety can be discussed concretely, but keep the design decisions language-agnostic where possible. The interviewer cares less about distributed-systems scaling and more about the quality of your interface, your design reasoning (analyzing tradeoffs out loud), and your ability to produce high-quality, reusable code — including correct handling of concurrency.
Your design must address:
-
The
data model
and core operations:
put
,
get
,
delete
,
exists
, and an atomic
batch
.
-
How to
expose generics
in the interface (e.g., in Java) while maintaining
type safety
.
-
Reusability and layering
— e.g., pluggable serialization and pluggable storage engines.
-
An
error-handling
strategy.
-
Versioning and TTL
options (optimistic concurrency control, expirations).
-
How to
evolve an in-memory prototype into a persistent implementation
without changing the public API.
Constraints & Assumptions
-
Single node, single process.
Reads and writes should be linearizable
per key
within the process. Cross-node replication, sharding, and distributed transactions are out of scope unless explicitly raised as an extension.
-
Rough scale to design for (informs the in-memory footprint and engine choice, not a hard limit): on the order of
107
keys, average key
≈32
B, average value
≈256
B, with a single-node throughput goal of roughly
105
ops/s.
-
The storage engine treats keys and values as
opaque bytes
; typing lives only at the API edge.
-
Out of scope unless asked: range scans, secondary indexes, transactions spanning multiple store instances.
Clarifying Questions to Ask
A candidate should scope the problem up front before designing. Reasonable clarifications:
-
Is this a
single-process library
or a networked service? (Drives whether errors are local exceptions or remote/timeout failures, and whether the API is sync or async.)
-
Is one store instance fixed to
one key type and one value type
, or must a single instance hold heterogeneous types? (Drives the generic signature
KeyValueStore<K, V>
vs. a per-call type token.)
-
What durability/consistency guarantees are required — is "lose the last few writes on crash" acceptable, or must every write be durable before it returns?
-
Do we need
atomic multi-key
operations (batch), or is per-key atomicity enough?
-
Are TTL and optimistic-concurrency
required
features or
nice-to-have
extensions to design for?
-
What are the rejection rules — max key/value size, are
null
keys/values legal, are empty keys legal?
Part 1 — Data Model & Core Operations
Define the logical record stored per key and specify the precise semantics of put, get, delete, exists, and atomic batch. State exactly what each operation returns on the missing/expired/deleted cases, and what metadata the record must carry to later support versioning and TTL.
What This Part Should Cover
-
A clear per-record schema (value bytes, version, expiry, tombstone/delete marker, optional audit timestamp) and
why
each field exists.
-
Precise return semantics for each op, especially the missing / expired / just-deleted cases.
-
The choice to model absence as a return value (e.g.
Optional.empty()
) rather than an exception, with justification.
-
How
batch
is scoped — atomic (all-or-nothing) vs. best-effort — and why.
Part 2 — Generics & Type Safety
Show the Java interface signature and explain how you expose generics to callers while keeping the API type-safe. Crucially, explain what guarantees compile-time generics give you, what they do not give you at runtime, and how you close that gap.
What This Part Should Cover
-
The generic interface signature (
KeyValueStore<K, V>
) and how it removes casts from caller code.
-
A clear explanation of type erasure and why compile-time generics alone do not guarantee runtime type safety.
-
The mechanism that provides
runtime
type safety (e.g. a
Serializer<K>
/
Serializer<V>
bound at construction).
-
Correct handling of the "static factory inside a generic interface" problem (a method-generic factory, not a static referencing the interface's type parameters).
Part 3 — Reusability, Layering & Error Handling
Design the layering that makes this store reusable: how do you make the serialization and the storage engine pluggable behind one stable public API? Separately, describe your error-handling strategy — the exception/error model and how a caller knows whether a failure is worth retrying.
What This Part Should Cover
-
Explicit layers/interfaces (
Serializer<T>
, a
StorageEngine
abstraction) and how each is swapped independently.
-
How the public
KeyValueStore<K, V>
contract stays stable across codec/engine swaps.
-
An intent-revealing error hierarchy that separates retryable from terminal failures.
-
The decision to model absence and CAS-mismatch as ordinary return values, reserving exceptions for genuinely exceptional operational failures.
Part 4 — Versioning & TTL
Add optimistic concurrency control and time-to-live. Specify how a per-key version enables compare-and-set, the exact compareAndSet contract, and how TTL is stored and enforced so that callers never read an expired value. Then handle the concurrency this introduces.
What This Part Should Cover
-
A correct, monotonic per-key version and a precise
compareAndSet
contract (including the deleted-key / version-retention subtlety).
-
TTL stored as an absolute instant, enforced both lazily on read and eagerly by a sweeper, with the honest guarantee (exact on visibility, best-effort on eviction timing).
-
Identification of the read-modify-write race and a concrete fix (the version check and write under the same lock).
-
Lock-granularity reasoning: per-key striped locks over a global lock, and deterministic lock ordering to keep batches deadlock-free.
Part 5 — Evolving In-Memory → Persistent
Describe the in-memory prototype, then show how the same public API evolves to durable storage without changing the caller-facing contract. Define the StorageEngine boundary that makes this possible, and walk through a concrete persistent design's write path, read path, delete, recovery, and space reclamation.
What This Part Should Cover
-
A concrete in-memory prototype (e.g. a concurrent map + striped locks + an expiry structure + injected clock) with operation complexity.
-
A
StorageEngine
interface that preserves version/TTL/tombstone metadata across the boundary, so the public API is unchanged.
-
A coherent persistent design (e.g. WAL + LSM/memtable + SSTables + Bloom filters), including write path, read path, delete-as-tombstone, recovery, and compaction/space reclamation.
-
The durability knob exposed via write options, and the option to wrap an existing engine (e.g. RocksDB) behind the same interface.
What a Strong Answer Covers
Across all parts, the interviewer is weighing design judgment and code quality more than raw breadth. A strong answer:
-
Frames the problem as
API/library design
(interface ergonomics, layering, concurrency) rather than reaching for distributed-systems scaling, and states scope decisions explicitly.
-
Keeps the
public
KeyValueStore<K, V> contract stable
end-to-end — the same surface serves the in-memory prototype and the persistent engine, which is the whole reusability payoff.
-
Connects the pieces coherently: the per-key version powers both CAS and the persistent engine's newest-wins reads; tombstones serve both delete semantics and log-structured compaction; serializers provide both ergonomics and runtime type safety.
-
Reasons about
tradeoffs out loud
(Optional vs. throwing; striped locks vs. lock-free; LSM write-amplification vs. B-tree; TTL best-effort eviction vs. exact visibility) instead of asserting one true answer.
-
Produces clean, compilable-looking interface code and correctly navigates the Java generics/erasure subtleties.
Follow-up Questions
-
Implement a
hit counter
on top of this API. Why does it need no new primitives, and what does the retry-on-CAS loop look like? What bounds the retries?
-
How would
writeBatch
acquire locks across multiple keys without deadlocking, and what exactly aborts the batch?
-
For
schema evolution
, how does your serializer choice (JSON vs. Protobuf/Avro) affect whether old persisted data stays readable after a value-type change?
-
If you later had to scale beyond one node, what would partitioning, replication, and routing look like — and what guarantee does
writeBatch
lose across shards?