Implement a serializable key-value store
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates competency in implementing robust serialization and in-memory data structures, covering binary format design, handling arbitrary bytes and Unicode, correctness across edge cases (empty state, large keys/values, overwrites, deletions), input validation and failure recovery, complexity analysis, and unit testing.
Constraints
- 1 <= len(operations) <= 200000
- The total number of bytes appearing in all keys, values, and serialized blobs is at most 10^6
- Each key and value length is between 0 and 2^32 - 1 bytes
- Average-case O(1) map operations are acceptable; each serialize/deserialize must be linear in the blob size
Examples
Input: [('put', b'name', b'alice'), ('put', b'lang', b'py'), ('get', b'name'), ('serialize',), ('erase', b'name'), ('get', b'name'), ('deserialize', b'KVS1\x00\x00\x00\x02\x00\x00\x00\x04name\x00\x00\x00\x05alice\x00\x00\x00\x04lang\x00\x00\x00\x02py'), ('get', b'name'), ('get', b'lang')]
Expected Output: [b'alice', b'KVS1\x00\x00\x00\x02\x00\x00\x00\x04name\x00\x00\x00\x05alice\x00\x00\x00\x04lang\x00\x00\x00\x02py', None, True, b'alice', b'py']
Explanation: The store is serialized with two entries, then one key is deleted. Deserializing the earlier blob restores both pairs.
Input: [('serialize',), ('deserialize', b'KVS1\x00\x00\x00\x00'), ('get', b'missing')]
Expected Output: [b'KVS1\x00\x00\x00\x00', True, None]
Explanation: An empty store serializes to just the header and a zero entry count. Deserializing it succeeds and the store remains empty.
Input: [('put', b'a', b'1'), ('put', b'a', b'2'), ('serialize',), ('erase', b'a'), ('get', b'a'), ('deserialize', b'KVS1\x00\x00\x00\x01\x00\x00\x00\x01a\x00\x00\x00\x012'), ('get', b'a')]
Expected Output: [b'KVS1\x00\x00\x00\x01\x00\x00\x00\x01a\x00\x00\x00\x012', None, True, b'2']
Explanation: Overwriting the same key updates the value instead of creating a second copy. The serialized blob contains only one entry for key b'a'.
Input: [('put', b'\xcf\x80', b'\x00\xffA'), ('get', b'\xcf\x80'), ('serialize',)]
Expected Output: [b'\x00\xffA', b'KVS1\x00\x00\x00\x01\x00\x00\x00\x02\xcf\x80\x00\x00\x00\x03\x00\xffA']
Explanation: The format supports arbitrary bytes in both keys and values, including non-ASCII bytes and embedded zero bytes.
Input: [('put', b'x', b'1'), ('deserialize', b'KVS1\x00\x00\x00\x01\x00\x00\x00\x01x'), ('get', b'x')]
Expected Output: [False, b'1']
Explanation: The blob is truncated after the key, so deserialization fails. The original store must remain unchanged.
Solution
def solution(operations):
MAGIC = b'KVS1'
MAX_U32 = (1 << 32) - 1
def to_bytes(x):
if isinstance(x, bytes):
return x
if isinstance(x, bytearray):
return bytes(x)
if isinstance(x, str):
return x.encode('utf-8')
raise TypeError('Expected bytes, bytearray, or str')
def write_u32(n):
return bytes([
(n >> 24) & 0xFF,
(n >> 16) & 0xFF,
(n >> 8) & 0xFF,
n & 0xFF,
])
def read_u32(data, pos):
if pos + 4 > len(data):
return None, pos
value = (
(data[pos] << 24)
| (data[pos + 1] << 16)
| (data[pos + 2] << 8)
| data[pos + 3]
)
return value, pos + 4
def serialize_store(store):
if len(store) > MAX_U32:
raise ValueError('Too many entries to serialize')
parts = [MAGIC, write_u32(len(store))]
for key, value in store.items():
if len(key) > MAX_U32 or len(value) > MAX_U32:
raise ValueError('Key or value too large to serialize')
parts.append(write_u32(len(key)))
parts.append(key)
parts.append(write_u32(len(value)))
parts.append(value)
return b''.join(parts)
def try_deserialize(blob):
try:
data = to_bytes(blob)
except TypeError:
return False, None
if len(data) < 8:
return False, None
if data[:4] != MAGIC:
return False, None
pos = 4
count, pos = read_u32(data, pos)
if count is None:
return False, None
new_store = {}
for _ in range(count):
key_len, pos = read_u32(data, pos)
if key_len is None or pos + key_len > len(data):
return False, None
key = data[pos:pos + key_len]
pos += key_len
value_len, pos = read_u32(data, pos)
if value_len is None or pos + value_len > len(data):
return False, None
value = data[pos:pos + value_len]
pos += value_len
if key in new_store:
return False, None
new_store[key] = value
if pos != len(data):
return False, None
return True, new_store
store = {}
result = []
for op in operations:
cmd = op[0]
if cmd == 'put':
key = to_bytes(op[1])
value = to_bytes(op[2])
if len(key) > MAX_U32 or len(value) > MAX_U32:
raise ValueError('Key or value too large')
store[key] = value
elif cmd == 'get':
key = to_bytes(op[1])
result.append(store.get(key))
elif cmd == 'erase':
key = to_bytes(op[1])
store.pop(key, None)
elif cmd == 'serialize':
result.append(serialize_store(store))
elif cmd == 'deserialize':
ok, parsed = try_deserialize(op[1])
result.append(ok)
if ok:
store = parsed
else:
raise ValueError('Unknown operation: {}'.format(cmd))
return resultTime complexity: O(Q + B) average, where Q is the number of map operations and B is the total number of bytes serialized or deserialized. Space complexity: O(M + S), where M is the current store contents and S is the size of the largest blob being parsed or produced.
Hints
- A delimiter like ':' is unsafe because keys and values may contain arbitrary bytes. Store a fixed-size length before each raw byte sequence.
- For failure recovery, do not modify the live store while parsing a blob. Decode into a temporary dictionary first, then replace the store only if parsing finishes exactly at the end.