How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at OpenAI.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at OpenAI during technical interviews.

Implement Persistent KV Store Serialization

Quick Overview

This question evaluates understanding of persistent storage design, binary-safe serialization, exact state restoration, and mutation logging for a key-value store, testing file I/O, data structures, and storage formats within the Coding & Algorithms domain.

Implement Persistent KV Store Serialization

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

Implement a persistent in-memory key-value store. Requirements: - Keys and values are arbitrary byte strings or UTF-8 strings; do not rely on delimiters that may appear in the data. - Support put(key, value), get(key), delete(key), serialize(path), and deserialize(path). - Deserialization must restore exactly the state that was serialized. Follow-ups: 1. The data may not fit into a single file. Extend the persistence format to split data across multiple files or segments and restore from them. 2. A full rewrite on shutdown is too expensive after the store has been restored and modified. Add an append-only log for mutations, implement log replay during startup, and define when to compact the log into a snapshot.

Quick Answer: This question evaluates understanding of persistent storage design, binary-safe serialization, exact state restoration, and mutation logging for a key-value store, testing file I/O, data structures, and storage formats within the Coding & Algorithms domain.

Part 1: Serialize and Deserialize a Persistent In-Memory KV Store

Implement a persistent in-memory key-value store using a fake disk inside the function. The store supports `put`, `get`, `delete`, `serialize(path)`, and `deserialize(path)`. Each key and value is either a Python `bytes` object or a `str`. Your persistence format must be binary-safe: do not rely on delimiters that may appear inside the data. When `deserialize(path)` is called, the current in-memory store must become exactly the state that was previously written by `serialize(path)`. Because an online judge should not depend on real file I/O, treat `path` as a label in an internal dictionary that represents disk.

Constraints

`0 <= len(operations) <= 10^4`
Keys and values are only of type `bytes` or `str`
Total size of all live keys and values at any moment is at most `10^6` bytes
Every `deserialize(path)` refers to a path that was previously serialized

Examples

Input: ([('put', 'a', '1'), ('get', 'a'), ('serialize', 'snap1'), ('put', 'a', '2'), ('deserialize', 'snap1'), ('get', 'a')],)

Expected Output: ['1', '1']

Explanation: After deserializing `snap1`, the old value `'1'` is restored.

Input: ([('put', b'a|b', b'v:1'), ('serialize', 'p'), ('delete', b'a|b'), ('get', b'a|b'), ('deserialize', 'p'), ('get', b'a|b')],)

Expected Output: [None, b'v:1']

Explanation: The format must not break when keys or values contain delimiter-like bytes.

Input: ([('serialize', 'empty'), ('put', b'x', b'y'), ('deserialize', 'empty'), ('get', b'x')],)

Expected Output: [None]

Explanation: Edge case: serializing and restoring an empty store should remove later changes.

Input: ([('put', 'ключ', b'value'), ('put', b'bin', 'тест'), ('serialize', 'mix'), ('delete', 'ключ'), ('deserialize', 'mix'), ('get', 'ключ'), ('get', b'bin')],)

Expected Output: [b'value', 'тест']

Explanation: Mixed `str` and `bytes` entries must be restored with the same types.

Hints

Store lengths explicitly. A length-prefixed binary format is safe even when the data contains `:`, `|`, or other separator characters.
If you support both `bytes` and `str`, add a small type tag before each encoded field so deserialization can restore the exact type.

Part 2: Persist the KV Store Across Multiple Segments

Extend the persistent key-value store so a snapshot can be split across multiple segment files. Implement the same store operations as in Part 1, but now `serialize(prefix)` must save the snapshot as a list of segment blobs, where each individual segment has size at most `max_segment_size` bytes. Segments may later be returned in a different order, so every segment must contain enough metadata to reconstruct the original snapshot correctly. Use a binary-safe encoding for keys and values; do not rely on delimiters. For testing, the fake disk stores a list of segment blobs under each `prefix`.

Constraints

`9 <= max_segment_size <= 10^6`
`0 <= len(operations) <= 10^4`
Total serialized snapshot size at any moment is at most `10^6` bytes
Every `deserialize(prefix)` and `segment_count(prefix)` refers to a prefix that was previously serialized
Every `reorder(prefix, order)` uses a valid permutation of the existing segment indices

Examples

Input: (15, [('put', b'a', b'12345'), ('put', b'b', b'67890'), ('serialize', 'p'), ('segment_count', 'p'), ('delete', b'a'), ('deserialize', 'p'), ('get', b'a'), ('get', b'b')])

Expected Output: [6, b'12345', b'67890']

Explanation: The snapshot spans multiple segments when the per-segment size limit is small.

Input: (20, [('put', b'aa', b'xx'), ('put', b'bb', b'yy'), ('serialize', 'snap'), ('segment_count', 'snap'), ('reorder', 'snap', [2, 0, 1]), ('delete', b'aa'), ('deserialize', 'snap'), ('get', b'aa'), ('get', b'bb')])

Expected Output: [3, b'xx', b'yy']

Explanation: Even after reordering the saved segments, deserialization must rebuild the correct store.

Input: (12, [('serialize', 'empty'), ('segment_count', 'empty'), ('put', b'a', b'b'), ('deserialize', 'empty'), ('get', b'a')])

Expected Output: [1, None]

Explanation: Edge case: an empty snapshot is still a valid serialized snapshot and fits in one segment.

Input: (13, [('put', b'k', b'abcdefghij'), ('serialize', 'big'), ('segment_count', 'big'), ('deserialize', 'big'), ('get', b'k')])

Expected Output: [5, b'abcdefghij']

Explanation: A single large value may need many segments; the restored value must be exact.

Hints

A clean approach is to serialize the whole snapshot into one binary stream first, then split that stream into fixed-size payload chunks.
Add a fixed-size header to each segment containing at least the segment index and the total number of segments.

Part 3: Add an Append-Only Mutation Log with Replay and Compaction

Implement a persistent key-value store that keeps two persisted artifacts on a fake disk: 1. a full snapshot of the store 2. an append-only log of later mutations The store starts empty. Every `put` and `delete` must update the in-memory map and also append a mutation record to the persisted log. A `restart` simulates process shutdown and startup: throw away the in-memory state, reload the snapshot, and replay the log in order. A full rewrite after every mutation is too expensive, so snapshot compaction should happen only when the pending log becomes large enough. In this problem, define the compaction policy as follows: if the number of pending log records becomes greater than or equal to `compact_threshold` immediately after a mutation, write a fresh snapshot of the current store and clear the log. Keys and values are either `bytes` or `str`, and the serialization format must be binary-safe.

Constraints

`1 <= compact_threshold <= 10^4`
`0 <= len(operations) <= 10^4`
Keys and values are only of type `bytes` or `str`
Total size of snapshot plus log at any moment is at most `10^6` bytes

Examples

Input: (3, [('put', 'a', '1'), ('put', 'b', '2'), ('restart',), ('get', 'a'), ('get', 'b'), ('status',)])

Expected Output: ['1', '2', (2, 2, 0)]

Explanation: With threshold 3, two mutations stay only in the log; restart must replay them correctly.

Input: (2, [('put', 'x', '1'), ('status',), ('put', 'y', '2'), ('status',), ('restart',), ('get', 'x'), ('get', 'y'), ('status',)])

Expected Output: [(1, 1, 0), (2, 0, 1), '1', '2', (2, 0, 1)]

Explanation: The second mutation reaches the threshold, so the log is compacted into a snapshot and cleared.

Input: (10, [('put', 'a', '1'), ('put', 'b', '2'), ('delete', 'a'), ('restart',), ('get', 'a'), ('get', 'b'), ('status',)])

Expected Output: [None, '2', (1, 3, 0)]

Explanation: Replay must apply the delete after the earlier put, leaving only `'b'`.

Input: (1, [('restart',), ('get', 'missing'), ('put', b'k', b'v'), ('status',), ('delete', b'k'), ('restart',), ('get', b'k'), ('status',)])

Expected Output: [None, (1, 0, 1), None, (0, 0, 2)]

Explanation: Edge case: threshold 1 means every mutation compacts immediately.

Hints

Keep snapshot encoding and log encoding separate. The snapshot stores full state, while the log stores only mutations.
On `restart`, rebuild by loading the snapshot first and then replaying each log record in order. After each new mutation, check whether it is time to compact.

Quick Overview

Implement Persistent KV Store Serialization

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

Part 1: Serialize and Deserialize a Persistent In-Memory KV Store

Constraints

`0 <= len(operations) <= 10^4`
Keys and values are only of type `bytes` or `str`
Total size of all live keys and values at any moment is at most `10^6` bytes
Every `deserialize(path)` refers to a path that was previously serialized

Examples

Input: ([('put', 'a', '1'), ('get', 'a'), ('serialize', 'snap1'), ('put', 'a', '2'), ('deserialize', 'snap1'), ('get', 'a')],)

Expected Output: ['1', '1']

Explanation: After deserializing `snap1`, the old value `'1'` is restored.

Input: ([('put', b'a|b', b'v:1'), ('serialize', 'p'), ('delete', b'a|b'), ('get', b'a|b'), ('deserialize', 'p'), ('get', b'a|b')],)

Expected Output: [None, b'v:1']

Explanation: The format must not break when keys or values contain delimiter-like bytes.

Input: ([('serialize', 'empty'), ('put', b'x', b'y'), ('deserialize', 'empty'), ('get', b'x')],)

Expected Output: [None]

Explanation: Edge case: serializing and restoring an empty store should remove later changes.

Input: ([('put', 'ключ', b'value'), ('put', b'bin', 'тест'), ('serialize', 'mix'), ('delete', 'ключ'), ('deserialize', 'mix'), ('get', 'ключ'), ('get', b'bin')],)

Expected Output: [b'value', 'тест']

Explanation: Mixed `str` and `bytes` entries must be restored with the same types.

Hints

Store lengths explicitly. A length-prefixed binary format is safe even when the data contains `:`, `|`, or other separator characters.
If you support both `bytes` and `str`, add a small type tag before each encoded field so deserialization can restore the exact type.

Part 2: Persist the KV Store Across Multiple Segments

Constraints

`9 <= max_segment_size <= 10^6`
`0 <= len(operations) <= 10^4`
Total serialized snapshot size at any moment is at most `10^6` bytes
Every `deserialize(prefix)` and `segment_count(prefix)` refers to a prefix that was previously serialized
Every `reorder(prefix, order)` uses a valid permutation of the existing segment indices

Examples

Input: (15, [('put', b'a', b'12345'), ('put', b'b', b'67890'), ('serialize', 'p'), ('segment_count', 'p'), ('delete', b'a'), ('deserialize', 'p'), ('get', b'a'), ('get', b'b')])

Expected Output: [6, b'12345', b'67890']

Explanation: The snapshot spans multiple segments when the per-segment size limit is small.

Expected Output: [3, b'xx', b'yy']

Explanation: Even after reordering the saved segments, deserialization must rebuild the correct store.

Input: (12, [('serialize', 'empty'), ('segment_count', 'empty'), ('put', b'a', b'b'), ('deserialize', 'empty'), ('get', b'a')])

Expected Output: [1, None]

Explanation: Edge case: an empty snapshot is still a valid serialized snapshot and fits in one segment.

Input: (13, [('put', b'k', b'abcdefghij'), ('serialize', 'big'), ('segment_count', 'big'), ('deserialize', 'big'), ('get', b'k')])

Expected Output: [5, b'abcdefghij']

Explanation: A single large value may need many segments; the restored value must be exact.

Hints

A clean approach is to serialize the whole snapshot into one binary stream first, then split that stream into fixed-size payload chunks.
Add a fixed-size header to each segment containing at least the segment index and the total number of segments.

Part 3: Add an Append-Only Mutation Log with Replay and Compaction

Constraints

`1 <= compact_threshold <= 10^4`
`0 <= len(operations) <= 10^4`
Keys and values are only of type `bytes` or `str`
Total size of snapshot plus log at any moment is at most `10^6` bytes

Examples

Input: (3, [('put', 'a', '1'), ('put', 'b', '2'), ('restart',), ('get', 'a'), ('get', 'b'), ('status',)])

Expected Output: ['1', '2', (2, 2, 0)]

Explanation: With threshold 3, two mutations stay only in the log; restart must replay them correctly.

Input: (2, [('put', 'x', '1'), ('status',), ('put', 'y', '2'), ('status',), ('restart',), ('get', 'x'), ('get', 'y'), ('status',)])

Expected Output: [(1, 1, 0), (2, 0, 1), '1', '2', (2, 0, 1)]

Explanation: The second mutation reaches the threshold, so the log is compacted into a snapshot and cleared.

Input: (10, [('put', 'a', '1'), ('put', 'b', '2'), ('delete', 'a'), ('restart',), ('get', 'a'), ('get', 'b'), ('status',)])

Expected Output: [None, '2', (1, 3, 0)]

Explanation: Replay must apply the delete after the earlier put, leaving only `'b'`.

Input: (1, [('restart',), ('get', 'missing'), ('put', b'k', b'v'), ('status',), ('delete', b'k'), ('restart',), ('get', b'k'), ('status',)])

Expected Output: [None, (1, 0, 1), None, (0, 0, 2)]

Explanation: Edge case: threshold 1 means every mutation compacts immediately.

Hints

Keep snapshot encoding and log encoding separate. The snapshot stores full state, while the log stores only mutations.
On `restart`, rebuild by loading the snapshot first and then replaying each log record in order. After each new mutation, check whether it is time to compact.