PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates competency in implementing robust serialization and in-memory data structures, covering binary format design, handling arbitrary bytes and Unicode, correctness across edge cases (empty state, large keys/values, overwrites, deletions), input validation and failure recovery, complexity analysis, and unit testing.

  • Medium
  • OpenAI
  • Coding & Algorithms
  • Software Engineer

Implement a serializable key-value store

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Implement in C++ an in-memory key-value store with: - put(key: string, value: string), get(key) -> optional<string>, erase(key). - serialize() -> vector<uint8_t> that encodes the entire store state. - deserialize(const vector<uint8_t>&) to restore an identical store state. Constraints: - Use only the provided helper byte I/O functions; do not use external libraries for serialization. - Choose a robust binary format (versioned, length-prefixed) that supports arbitrary bytes in keys/values and Unicode. - Ensure correctness for empty store, very large keys/values, duplicate overwrites, and deletions. - Validate input and handle malformed or truncated streams gracefully with clear error signaling. - Time/space complexity should be O(n) in the total serialized data size; support streaming-friendly encoding. - Provide unit tests and explain complexity, trade-offs (e.g., delimiter-escaped text vs. length-prefixed binary), and failure recovery.

Quick Answer: This question evaluates competency in implementing robust serialization and in-memory data structures, covering binary format design, handling arbitrary bytes and Unicode, correctness across edge cases (empty state, large keys/values, overwrites, deletions), input validation and failure recovery, complexity analysis, and unit testing.

Implement a single function that simulates an in-memory key-value store with operations equivalent to put(key, value), get(key), erase(key), serialize(), and deserialize(blob). Because the judge calls one function instead of a class, you are given a list of operations to execute on one store. Use this exact binary format for serialization: 1. 4-byte magic header: ASCII bytes KVS1 2. 4-byte unsigned big-endian entry count 3. For each key-value pair in current insertion order: - 4-byte unsigned big-endian key length - raw key bytes - 4-byte unsigned big-endian value length - raw value bytes Why binary and length-prefixed? Delimiter-based text formats are fragile when keys or values may contain arbitrary bytes, including zero bytes and non-ASCII Unicode data. Length prefixes make the format robust and streaming-friendly. Rules: - Keys and values are bytes objects. If a test passes a Python str, treat it as UTF-8 and convert it to bytes. - put overwrites an existing key's value without creating a duplicate entry. - erase on a missing key does nothing. - get returns the stored value as bytes, or None if missing. - serialize returns the encoded bytes of the entire store. - deserialize replaces the current store only if the blob is fully valid. On failure, return False and leave the store unchanged. - A blob is malformed if it has a wrong header, is truncated, contains duplicate keys, or has extra trailing bytes. Return a list containing the outputs of every get, serialize, and deserialize operation, in order.

Constraints

  • 1 <= len(operations) <= 200000
  • The total number of bytes appearing in all keys, values, and serialized blobs is at most 10^6
  • Each key and value length is between 0 and 2^32 - 1 bytes
  • Average-case O(1) map operations are acceptable; each serialize/deserialize must be linear in the blob size

Examples

Input: [('put', b'name', b'alice'), ('put', b'lang', b'py'), ('get', b'name'), ('serialize',), ('erase', b'name'), ('get', b'name'), ('deserialize', b'KVS1\x00\x00\x00\x02\x00\x00\x00\x04name\x00\x00\x00\x05alice\x00\x00\x00\x04lang\x00\x00\x00\x02py'), ('get', b'name'), ('get', b'lang')]

Expected Output: [b'alice', b'KVS1\x00\x00\x00\x02\x00\x00\x00\x04name\x00\x00\x00\x05alice\x00\x00\x00\x04lang\x00\x00\x00\x02py', None, True, b'alice', b'py']

Explanation: The store is serialized with two entries, then one key is deleted. Deserializing the earlier blob restores both pairs.

Input: [('serialize',), ('deserialize', b'KVS1\x00\x00\x00\x00'), ('get', b'missing')]

Expected Output: [b'KVS1\x00\x00\x00\x00', True, None]

Explanation: An empty store serializes to just the header and a zero entry count. Deserializing it succeeds and the store remains empty.

Input: [('put', b'a', b'1'), ('put', b'a', b'2'), ('serialize',), ('erase', b'a'), ('get', b'a'), ('deserialize', b'KVS1\x00\x00\x00\x01\x00\x00\x00\x01a\x00\x00\x00\x012'), ('get', b'a')]

Expected Output: [b'KVS1\x00\x00\x00\x01\x00\x00\x00\x01a\x00\x00\x00\x012', None, True, b'2']

Explanation: Overwriting the same key updates the value instead of creating a second copy. The serialized blob contains only one entry for key b'a'.

Input: [('put', b'\xcf\x80', b'\x00\xffA'), ('get', b'\xcf\x80'), ('serialize',)]

Expected Output: [b'\x00\xffA', b'KVS1\x00\x00\x00\x01\x00\x00\x00\x02\xcf\x80\x00\x00\x00\x03\x00\xffA']

Explanation: The format supports arbitrary bytes in both keys and values, including non-ASCII bytes and embedded zero bytes.

Input: [('put', b'x', b'1'), ('deserialize', b'KVS1\x00\x00\x00\x01\x00\x00\x00\x01x'), ('get', b'x')]

Expected Output: [False, b'1']

Explanation: The blob is truncated after the key, so deserialization fails. The original store must remain unchanged.

Solution

def solution(operations):
    MAGIC = b'KVS1'
    MAX_U32 = (1 << 32) - 1

    def to_bytes(x):
        if isinstance(x, bytes):
            return x
        if isinstance(x, bytearray):
            return bytes(x)
        if isinstance(x, str):
            return x.encode('utf-8')
        raise TypeError('Expected bytes, bytearray, or str')

    def write_u32(n):
        return bytes([
            (n >> 24) & 0xFF,
            (n >> 16) & 0xFF,
            (n >> 8) & 0xFF,
            n & 0xFF,
        ])

    def read_u32(data, pos):
        if pos + 4 > len(data):
            return None, pos
        value = (
            (data[pos] << 24)
            | (data[pos + 1] << 16)
            | (data[pos + 2] << 8)
            | data[pos + 3]
        )
        return value, pos + 4

    def serialize_store(store):
        if len(store) > MAX_U32:
            raise ValueError('Too many entries to serialize')
        parts = [MAGIC, write_u32(len(store))]
        for key, value in store.items():
            if len(key) > MAX_U32 or len(value) > MAX_U32:
                raise ValueError('Key or value too large to serialize')
            parts.append(write_u32(len(key)))
            parts.append(key)
            parts.append(write_u32(len(value)))
            parts.append(value)
        return b''.join(parts)

    def try_deserialize(blob):
        try:
            data = to_bytes(blob)
        except TypeError:
            return False, None

        if len(data) < 8:
            return False, None
        if data[:4] != MAGIC:
            return False, None

        pos = 4
        count, pos = read_u32(data, pos)
        if count is None:
            return False, None

        new_store = {}
        for _ in range(count):
            key_len, pos = read_u32(data, pos)
            if key_len is None or pos + key_len > len(data):
                return False, None
            key = data[pos:pos + key_len]
            pos += key_len

            value_len, pos = read_u32(data, pos)
            if value_len is None or pos + value_len > len(data):
                return False, None
            value = data[pos:pos + value_len]
            pos += value_len

            if key in new_store:
                return False, None
            new_store[key] = value

        if pos != len(data):
            return False, None

        return True, new_store

    store = {}
    result = []

    for op in operations:
        cmd = op[0]

        if cmd == 'put':
            key = to_bytes(op[1])
            value = to_bytes(op[2])
            if len(key) > MAX_U32 or len(value) > MAX_U32:
                raise ValueError('Key or value too large')
            store[key] = value

        elif cmd == 'get':
            key = to_bytes(op[1])
            result.append(store.get(key))

        elif cmd == 'erase':
            key = to_bytes(op[1])
            store.pop(key, None)

        elif cmd == 'serialize':
            result.append(serialize_store(store))

        elif cmd == 'deserialize':
            ok, parsed = try_deserialize(op[1])
            result.append(ok)
            if ok:
                store = parsed

        else:
            raise ValueError('Unknown operation: {}'.format(cmd))

    return result

Time complexity: O(Q + B) average, where Q is the number of map operations and B is the total number of bytes serialized or deserialized. Space complexity: O(M + S), where M is the current store contents and S is the size of the largest blob being parsed or produced.

Hints

  1. A delimiter like ':' is unsafe because keys and values may contain arbitrary bytes. Store a fixed-size length before each raw byte sequence.
  2. For failure recovery, do not modify the live store while parsing a blob. Decode into a temporary dictionary first, then replace the store only if parsing finishes exactly at the end.
Last updated: May 23, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement IP Address Arithmetic - OpenAI (hard)
  • Simulate Infection Spread on a Grid - OpenAI (hard)
  • Implement Social Follow Recommendations - OpenAI (medium)
  • Build a Compose Rating Card - OpenAI (medium)
  • Generate Data Labeling Schedules - OpenAI (medium)