PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCareers

Quick Overview

This question evaluates a candidate's ability to design persistent storage and in-memory data structures, covering serialization of arbitrary values, medium/interface design for byte-level persistence, crash-consistent flushing, error handling, and performance and complexity analysis.

  • Medium
  • OpenAI
  • Coding & Algorithms
  • Software Engineer

Design a persistent key-value store

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Design and implement an in-memory key-value store with a 'medium' layer that serializes the store to bytes for persistence. Provide four functions: set(key, value), get(key), shutdown(), and restore(). On shutdown(), flush all data to the medium; on restore(), rebuild state from the medium. Specify: (a) the medium interface for writing/reading bytes, (b) the serialization format and handling of arbitrary values, (c) error handling and data integrity (atomic flush, crash consistency), (d) complexity and performance considerations, and (e) minimal tests and example usage.

Quick Answer: This question evaluates a candidate's ability to design persistent storage and in-memory data structures, covering serialization of arbitrary values, medium/interface design for byte-level persistence, crash-consistent flushing, error handling, and performance and complexity analysis.

You are given a sequence of operations for an in-memory key-value store. The store must support persistence through a byte-based medium. Public store operations: - set(key, value): store or overwrite a value - get(key): return the current in-memory value, or None if missing - shutdown(): flush the entire current store to the medium - restore(): rebuild in-memory state from the medium For this coding problem, implement a single function that processes a list of operations and returns the result of each operation. Medium interface (conceptual): - append(bytes): add bytes to the end of the medium - read_all() -> bytes: return all bytes currently stored In this problem, the medium is internal to your solution and should behave like an append-only byte buffer. Serialization format: - Keys are strings. - Values are JSON-serializable Python literals: None, bool, int, float, str, list, and dict with string keys. - Each shutdown writes one complete snapshot frame to the medium. - A frame is: 1. MAGIC: 4 bytes, always b'KVS1' 2. LEN: 12 ASCII digits, zero-padded, payload byte length 3. CHECKSUM: 64 ASCII lowercase hex characters, SHA-256 of the payload 4. PAYLOAD: UTF-8 bytes of json.dumps(store, sort_keys=True, separators=(',', ':'), ensure_ascii=False) Data integrity and crash consistency: - shutdown() must behave atomically at the frame level: a fully written frame is valid, and a partial trailing frame must not corrupt previously persisted data. - restore() must scan the medium from left to right, validate frames, and load the last valid snapshot. - If the tail of the medium is truncated or has a bad checksum, ignore that invalid tail and restore the most recent earlier valid snapshot. - If no valid snapshot exists, restore() clears memory and returns False. Judge-only crash simulation: - The tests may include a special operation ('crash',). It simulates an interrupted shutdown by appending only the first half of the frame for the current in-memory state. It is not part of the public API, but your processor must support it. Return value for each operation: - set -> None - get -> stored value or None - shutdown -> True - restore -> True if a valid snapshot was loaded, otherwise False - crash -> None

Constraints

  • 0 <= len(operations) <= 10^4
  • All keys are strings with length between 1 and 100
  • All values are JSON-serializable Python literals
  • Total serialized bytes written across all shutdown/crash operations is at most 10^6

Examples

Input: [('set', 'a', 1), ('get', 'a'), ('shutdown',), ('set', 'a', 2), ('get', 'a'), ('restore',), ('get', 'a')]

Expected Output: [None, 1, True, None, 2, True, 1]

Explanation: The value 1 is persisted on shutdown. The later in-memory change to 2 is lost after restore because it was never flushed.

Input: [('set', 'user', {'name': 'Ada', 'tags': ['math', True], 'meta': {'age': 36}}), ('shutdown',), ('set', 'user', {'name': 'Grace'}), ('shutdown',), ('restore',), ('get', 'user')]

Expected Output: [None, True, None, True, True, {'name': 'Grace'}]

Explanation: Two valid snapshots are written. restore() loads the most recent valid one.

Input: [('set', 'x', 10), ('shutdown',), ('set', 'x', 99), ('crash',), ('set', 'y', 5), ('restore',), ('get', 'x'), ('get', 'y')]

Expected Output: [None, True, None, None, None, True, 10, None]

Explanation: The crash writes only half of the newer frame, so restore() ignores that invalid tail and falls back to the earlier valid snapshot where x = 10.

Input: [('restore',), ('get', 'missing')]

Expected Output: [False, None]

Explanation: With no valid snapshot in the medium, restore() returns False and the store remains empty.

Input: []

Expected Output: []

Explanation: No operations means no results.

Solution

def solution(operations):
    import json
    import hashlib

    MAGIC = b"KVS1"
    HEADER_LEN = 4 + 12 + 64

    store = {}
    medium = bytearray()
    results = []

    def build_frame(snapshot):
        payload = json.dumps(
            snapshot,
            sort_keys=True,
            separators=(",", ":"),
            ensure_ascii=False,
        ).encode("utf-8")
        length = f"{len(payload):012d}".encode("ascii")
        checksum = hashlib.sha256(payload).hexdigest().encode("ascii")
        return MAGIC + length + checksum + payload

    def read_last_valid_snapshot():
        data = bytes(medium)
        i = 0
        last_valid = None

        while i + HEADER_LEN <= len(data):
            if data[i:i + 4] != MAGIC:
                break

            length_bytes = data[i + 4:i + 16]
            if not length_bytes.isdigit():
                break

            payload_len = int(length_bytes.decode("ascii"))
            checksum = data[i + 16:i + 80]
            end = i + HEADER_LEN + payload_len

            if end > len(data):
                break

            payload = data[i + HEADER_LEN:end]
            actual_checksum = hashlib.sha256(payload).hexdigest().encode("ascii")
            if actual_checksum != checksum:
                break

            try:
                snapshot = json.loads(payload.decode("utf-8"))
            except Exception:
                break

            if not isinstance(snapshot, dict):
                break

            last_valid = snapshot
            i = end

        return last_valid

    for op in operations:
        cmd = op[0]

        if cmd == "set":
            _, key, value = op
            store[key] = value
            results.append(None)

        elif cmd == "get":
            _, key = op
            results.append(store.get(key))

        elif cmd == "shutdown":
            medium.extend(build_frame(store))
            results.append(True)

        elif cmd == "restore":
            snapshot = read_last_valid_snapshot()
            if snapshot is None:
                store = {}
                results.append(False)
            else:
                store = snapshot
                results.append(True)

        elif cmd == "crash":
            frame = build_frame(store)
            medium.extend(frame[: len(frame) // 2])
            results.append(None)

        else:
            raise ValueError(f"Unknown operation: {cmd}")

    return results

Time complexity: set/get: O(1) average, shutdown: O(S), restore: O(B), where S is the serialized size of the current store and B is the total number of bytes in the medium. Space complexity: O(K + B), where K is the in-memory store size and B is the medium size.

Hints

  1. Use a self-describing frame: fixed magic bytes, a fixed-width length field, and a checksum before the payload.
  2. During restore, keep track of the last valid snapshot you parsed so that a truncated final write does not destroy previously persisted state.
Last updated: Apr 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • Careers
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Simulate Infection Spread on a Grid - OpenAI (hard)
  • Implement Social Follow Recommendations - OpenAI (medium)
  • Build a Compose Rating Card - OpenAI (medium)
  • Convert IPv4 Ranges to CIDR Blocks - OpenAI (medium)
  • Implement Persistent KV Store Serialization - OpenAI (hard)