Design a persistent key-value store
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to design persistent storage and in-memory data structures, covering serialization of arbitrary values, medium/interface design for byte-level persistence, crash-consistent flushing, error handling, and performance and complexity analysis.
Constraints
- 0 <= len(operations) <= 10^4
- All keys are strings with length between 1 and 100
- All values are JSON-serializable Python literals
- Total serialized bytes written across all shutdown/crash operations is at most 10^6
Examples
Input: [('set', 'a', 1), ('get', 'a'), ('shutdown',), ('set', 'a', 2), ('get', 'a'), ('restore',), ('get', 'a')]
Expected Output: [None, 1, True, None, 2, True, 1]
Explanation: The value 1 is persisted on shutdown. The later in-memory change to 2 is lost after restore because it was never flushed.
Input: [('set', 'user', {'name': 'Ada', 'tags': ['math', True], 'meta': {'age': 36}}), ('shutdown',), ('set', 'user', {'name': 'Grace'}), ('shutdown',), ('restore',), ('get', 'user')]
Expected Output: [None, True, None, True, True, {'name': 'Grace'}]
Explanation: Two valid snapshots are written. restore() loads the most recent valid one.
Input: [('set', 'x', 10), ('shutdown',), ('set', 'x', 99), ('crash',), ('set', 'y', 5), ('restore',), ('get', 'x'), ('get', 'y')]
Expected Output: [None, True, None, None, None, True, 10, None]
Explanation: The crash writes only half of the newer frame, so restore() ignores that invalid tail and falls back to the earlier valid snapshot where x = 10.
Input: [('restore',), ('get', 'missing')]
Expected Output: [False, None]
Explanation: With no valid snapshot in the medium, restore() returns False and the store remains empty.
Input: []
Expected Output: []
Explanation: No operations means no results.
Solution
def solution(operations):
import json
import hashlib
MAGIC = b"KVS1"
HEADER_LEN = 4 + 12 + 64
store = {}
medium = bytearray()
results = []
def build_frame(snapshot):
payload = json.dumps(
snapshot,
sort_keys=True,
separators=(",", ":"),
ensure_ascii=False,
).encode("utf-8")
length = f"{len(payload):012d}".encode("ascii")
checksum = hashlib.sha256(payload).hexdigest().encode("ascii")
return MAGIC + length + checksum + payload
def read_last_valid_snapshot():
data = bytes(medium)
i = 0
last_valid = None
while i + HEADER_LEN <= len(data):
if data[i:i + 4] != MAGIC:
break
length_bytes = data[i + 4:i + 16]
if not length_bytes.isdigit():
break
payload_len = int(length_bytes.decode("ascii"))
checksum = data[i + 16:i + 80]
end = i + HEADER_LEN + payload_len
if end > len(data):
break
payload = data[i + HEADER_LEN:end]
actual_checksum = hashlib.sha256(payload).hexdigest().encode("ascii")
if actual_checksum != checksum:
break
try:
snapshot = json.loads(payload.decode("utf-8"))
except Exception:
break
if not isinstance(snapshot, dict):
break
last_valid = snapshot
i = end
return last_valid
for op in operations:
cmd = op[0]
if cmd == "set":
_, key, value = op
store[key] = value
results.append(None)
elif cmd == "get":
_, key = op
results.append(store.get(key))
elif cmd == "shutdown":
medium.extend(build_frame(store))
results.append(True)
elif cmd == "restore":
snapshot = read_last_valid_snapshot()
if snapshot is None:
store = {}
results.append(False)
else:
store = snapshot
results.append(True)
elif cmd == "crash":
frame = build_frame(store)
medium.extend(frame[: len(frame) // 2])
results.append(None)
else:
raise ValueError(f"Unknown operation: {cmd}")
return results
Time complexity: set/get: O(1) average, shutdown: O(S), restore: O(B), where S is the serialized size of the current store and B is the total number of bytes in the medium. Space complexity: O(K + B), where K is the in-memory store size and B is the medium size.
Hints
- Use a self-describing frame: fixed magic bytes, a fixed-width length field, and a checksum before the payload.
- During restore, keep track of the last valid snapshot you parsed so that a truncated final write does not destroy previously persisted state.