Implement a persistent sharded key-value store
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of persistent storage and file-based sharding, including on-disk layout, record serialization, indexing strategies for locating the latest value per key, shard rollover, and recovery from partial or corrupt tails.
Constraints
- 0 <= len(operations) <= 20000
- 1 <= shard_size <= 10000
- Keys and values are non-empty ASCII strings containing only letters and digits, so they never contain '|' or ';'
- The encoded length of every single put or delete record is at most shard_size
Examples
Input: (10, [('put', 'a', '1'), ('put', 'b', '22'), ('get', 'a'), ('put', 'a', '333'), ('get', 'a'), ('restore',), ('get', 'b')])
Expected Output: (['1', '333', '22'], ['P|a|1;', 'P|b|22;', 'P|a|333;'])
Explanation: Each write would overflow the current shard, so three shards are created. After restore, the latest value of 'a' is '333' and 'b' is still '22'.
Input: (20, [('put', 'ab', 'x'), ('put', 'c', 'yz'), ('delete', 'ab'), ('get', 'ab'), ('restore',), ('get', 'c'), ('get', 'ab')])
Expected Output: ([None, 'yz', None], ['P|ab|x;P|c|yz;D|ab;'])
Explanation: All records fit in one shard. The delete operation writes a tombstone for 'ab', so it is missing both before and after restore.
Input: (10, [('restore',), ('get', 'missing'), ('shutdown',)])
Expected Output: ([None], [])
Explanation: Restoring an empty store leaves it empty, and shutdown does not change anything in this simulation.
Input: (13, [('put', 'a', '1'), ('put', 'b', '22'), ('put', 'c', '3'), ('get', 'c'), ('restore',), ('get', 'a')])
Expected Output: (['3', '1'], ['P|a|1;P|b|22;', 'P|c|3;'])
Explanation: The first two records exactly fill the first shard: 6 + 7 = 13. The third record starts a new shard. Restore rebuilds the same state.
Input: (25, [('delete', 'ghost'), ('put', 'z', '0'), ('restore',), ('get', 'ghost'), ('get', 'z')])
Expected Output: ([None, '0'], ['P|z|0;'])
Explanation: Deleting a missing key is a no-op and does not create a record. Only the put for 'z' is stored.
Solution
def solution(shard_size, operations):
shards = []
index = {}
get_results = []
def append_record(record):
if not shards or len(shards[-1]) + len(record) > shard_size:
shards.append(record)
else:
shards[-1] += record
def rebuild_index():
restored = {}
for shard in shards:
for part in shard.split(';')[:-1]:
if not part:
continue
if part.startswith('P|'):
pieces = part.split('|', 2)
if len(pieces) == 3:
_, key, value = pieces
restored[key] = value
elif part.startswith('D|'):
pieces = part.split('|', 1)
if len(pieces) == 2:
_, key = pieces
restored.pop(key, None)
return restored
for op in operations:
kind = op[0]
if kind == 'put':
_, key, value = op
record = f'P|{key}|{value};'
append_record(record)
index[key] = value
elif kind == 'get':
_, key = op
get_results.append(index.get(key))
elif kind == 'delete':
_, key = op
if key in index:
record = f'D|{key};'
append_record(record)
index.pop(key, None)
elif kind == 'shutdown':
pass
elif kind == 'restore':
index = rebuild_index()
else:
raise ValueError('Unknown operation: ' + str(kind))
return (get_results, shards)
Time complexity: O(total_bytes_written + total_bytes_scanned_on_restore). Space complexity: O(total_bytes_in_shards + number_of_live_keys).
Hints
- Use an in-memory hash map for the current state so that each get is O(1) on average.
- Treat the shard contents as an append-only log. On restore, replay records from the first shard to the last; the latest record for a key wins.