PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates skills in binary serialization and deserialization, data encoding and layout, type tagging, endianness, checksumming, versioning for forward/backward compatibility, and algorithmic efficiency and space usage for an in-memory key-value store; it falls under the Coding & Algorithms domain.

  • Medium
  • OpenAI
  • Coding & Algorithms
  • Software Engineer

Implement KV store serialization

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Implement serialization and deserialization for an in‑memory key‑value store to and from a contiguous bytes buffer. Keys are UTF‑8 strings; values may be int64, float64, bool, UTF‑8 string, or nested maps. Specify and implement a binary format including a magic header and version, per‑entry type tags, length‑prefixing, endianness, checksums, and forward/backward compatibility. Provide encode(store)->bytes and decode(bytes)->store with O(n) time and minimal extra space. Discuss trade‑offs compared to text formats like JSON or MessagePack.

Quick Answer: This question evaluates skills in binary serialization and deserialization, data encoding and layout, type tagging, endianness, checksumming, versioning for forward/backward compatibility, and algorithmic efficiency and space usage for an in-memory key-value store; it falls under the Coding & Algorithms domain.

Implement a **serializer and deserializer** for an in-memory key-value store using a custom binary format. ## What to implement Write a single function `solution(operation, data)` that dispatches on the string `operation`: - `solution('encode', store)` → returns the serialized **`bytes`**. - `solution('decode', buffer)` → returns the reconstructed **store** (a `dict`). Any other `operation` value must raise an error. ## The store `store` is a nested Python `dict` whose **keys are UTF-8 strings** and whose values are one of: - **int64** — a Python `int` in signed 64-bit range - **float64** — a Python `float` - **bool** - **UTF-8 string** - another **nested dict** following the same rules Because `bool` is a subclass of `int` in Python, treat `True`/`False` as **bool**, not as integers. ## Binary layout Every encoded buffer has three parts: 1. **Header** (9 bytes) - 4 bytes magic: `b'KVSB'` - 1 byte version: `1` - 4 bytes: little-endian **unsigned** body length 2. **Body** - the root **map payload** (see below) 3. **Footer** (4 bytes) - 4 bytes: little-endian **unsigned** checksum **Checksum rule:** the checksum is the unsigned 32-bit sum of **every byte before the checksum field** (i.e. header + body), taken modulo `2^32`. ### Map payload format - 4 bytes: little-endian **unsigned** entry count - then each entry, emitted in the **dict's existing iteration order** (do **not** sort keys) ### Entry format - 4 bytes: little-endian unsigned **key length** - the key bytes (UTF-8) - 1 byte: **type tag** - 4 bytes: little-endian unsigned **value length** - the value bytes ### Type tags | Tag | Type | Value payload | |-----|------|---------------| | `1` | int64 | exactly 8 bytes, little-endian **signed** | | `2` | float64 | exactly 8 bytes, little-endian IEEE-754 | | `3` | bool | exactly 1 byte: `0` or `1` | | `4` | UTF-8 string | raw UTF-8 bytes (any length) | | `5` | nested map | another full map payload | > All multi-byte integers in this format (body length, entry count, key length, value length, checksum) are **little-endian unsigned**, except the int64 value payload (tag `1`), which is little-endian **signed**. ## Decoder validation The decoder must reject malformed input by raising an error. It must validate: - the magic bytes and the version (**reject unsupported versions**); - the total buffer length is exactly `9 + body_length + 4`; - the checksum matches the recomputed sum; - every length prefix stays within bounds (no truncated keys, tags, or values); - keys and string values are valid UTF-8; - per-type payload sizes (int64/float64 = 8 bytes, bool = 1 byte and value is `0` or `1`); - the type tag is known (**reject unknown tags**); - each nested map payload is consumed exactly. ## Constraints - All keys are strings, all strings are UTF-8, and all integers fit in signed 64-bit range. - Serialize entries in the dict's existing iteration order; do not sort keys. - The total serialized size is at most `10^6` bytes, and nesting depth is at most `100`. - Within a single map, keys are unique. ## Round-trip guarantee Encoding and decoding are inverses: `solution('decode', solution('encode', store))` reproduces the original store, with each field read back in the same order it was written.

Constraints

  • All keys are strings, all strings are UTF-8, and all integers fit in signed 64-bit range.
  • Serialize entries in the dict's existing iteration order; do not sort keys.
  • The total serialized size is at most 10^6 bytes, and nesting depth is at most 100.
  • Within a map, keys are unique.

Examples

Input: ('encode', {})

Expected Output: b'KVSB\x01\x04\x00\x00\x00\x00\x00\x00\x00;\x01\x00\x00'

Explanation: An empty map body is just a 4-byte entry count of 0. The checksum is the sum of the header and body bytes modulo 2^32.

Input: ('decode', b'KVSB\x01\x04\x00\x00\x00\x00\x00\x00\x00;\x01\x00\x00')

Expected Output: {}

Explanation: This is the valid encoding of an empty root map, so decoding returns an empty dict.

Input: ('encode', {'pi': 1.5})

Expected Output: b'KVSB\x01\x17\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00pi\x02\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf8?\x6b\x03\x00\x00'

Explanation: The map has one entry with key 'pi', type tag 2 for float64, and the IEEE-754 little-endian bytes for 1.5.

Input: ('decode', b'KVSB\x01\x24\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00m\x05\x16\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00a\x01\x08\x00\x00\x00\xfe\xff\xff\xff\xff\xff\xff\xff\x48\x0a\x00\x00')

Expected Output: {'m': {'a': -2}}

Explanation: The outer map contains one nested map under key 'm'. The nested map contains key 'a' with the int64 value -2.

Input: ('decode', b'KVSB\x01\x1b\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00t\x03\x01\x00\x00\x00\x01\x01\x00\x00\x00u\x04\x02\x00\x00\x00\xc3\xa9\xb6\x03\x00\x00')

Expected Output: {'t': True, 'u': 'é'}

Explanation: This buffer contains a bool entry and a UTF-8 string entry. The decoder must interpret tag 3 as bool and tag 4 as UTF-8 text.

Hints

  1. Give every value a type tag and a length prefix. That makes recursive decoding simpler and leaves room for future type tags.
  2. While decoding, keep one index into the buffer and advance it carefully. For nested maps, decode only within the subrange described by that value's length.
Last updated: Apr 23, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Infection Spread Simulation with Death Threshold - OpenAI (medium)
  • Spreading Contagion on a Grid - OpenAI (medium)
  • Streaming Entropy with Numerical Stability - OpenAI (hard)
  • Implement a Distributed Rate Limiter - OpenAI (medium)
  • Compute Plant Infection Time - OpenAI (medium)