PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's understanding of persistent key-value storage, in-memory versus on-disk state management, serialization/deserialization, file I/O, and state reconstruction across multiple files, with additional considerations for concurrency and crash recovery.

  • medium
  • OpenAI
  • Coding & Algorithms
  • Software Engineer

Build a File-Backed Key-Value Store

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Implement a persistent key-value store that supports the following operations for string keys and string values: - `put(key, value)` - `get(key)` - `delete(key)` ### Part 1 Keep data in memory during execution, but support persistence to disk with: - `serialize()` to write the current state to a single file - `deserialize()` to rebuild the store from that file after restart You may choose any reasonable on-disk format, as long as the data can be written and loaded correctly. ### Part 2 Now assume each file on disk has a maximum size limit. Extend the store so persistence is spread across multiple files: - when the current file reaches the size limit, continue writing to a new file - on startup, reconstruct the latest key-value state from all files ### Follow-up - How would you handle concurrent readers and writers safely? - What changes would you make to avoid corruption during crashes? - Write a few basic test cases for normal operations and recovery from disk.

Quick Answer: This question evaluates a candidate's understanding of persistent key-value storage, in-memory versus on-disk state management, serialization/deserialization, file I/O, and state reconstruction across multiple files, with additional considerations for concurrency and crash recovery.

Part 1: Single-File Persistent Key-Value Store

Design a string-to-string key-value store that supports three in-memory operations: put(key, value), get(key), and delete(key). Your program will run in two phases: 1. Execute all operations in before_restart_ops on an initially empty store. 2. Serialize the entire store to a single file, clear memory to simulate a restart, then deserialize from that file. 3. Execute all operations in after_restart_ops on the restored store. Return the result of every get operation from both phases, in order, and the final store contents sorted by key. You may use any correct single-file serialization format internally. The checker only validates that data is preserved correctly across the restart.

Constraints

  • 0 <= total number of operations across both phases <= 10^4
  • Keys and values are strings with length from 0 to 100
  • Operations are valid and use one of: put, get, delete
  • Final output must reflect the state after serialize -> restart -> deserialize

Examples

Input: ([('put', 'a', '1'), ('put', 'b', '2'), ('get', 'a')], [('get', 'b'), ('delete', 'a'), ('get', 'a')])

Expected Output: (['1', '2', None], [('b', '2')])

Explanation: The values for 'a' and 'b' survive the restart. After deleting 'a', a later get returns None.

Input: ([('put', '', 'x'), ('put', 'k', 'old'), ('put', 'k', 'new')], [('get', ''), ('get', 'k')])

Expected Output: (['x', 'new'], [('', 'x'), ('k', 'new')])

Explanation: Tests empty-string keys and overwriting an existing key before serialization.

Input: ([], [('get', 'missing'), ('delete', 'missing')])

Expected Output: ([None], [])

Explanation: Edge case: no data is stored before restart, and missing keys should return None.

Input: ([('put', 'x', '1'), ('delete', 'x')], [('get', 'x')])

Expected Output: ([None], [])

Explanation: A key deleted before serialization should not reappear after deserialization.

Hints

  1. Use a dictionary for the in-memory store so put, get, and delete are efficient.
  2. For persistence, any reversible representation works; serializing the entire dictionary to JSON is enough.

Part 2: Multi-File Persistent Key-Value Store with File Size Limits

Extend the persistent key-value store so that data is written across multiple files, each with a maximum size. Use the following exact on-disk log format for every mutating operation: - put(key, value) is stored as the record: 'P|key|value\n' - delete(key) is stored as the record: 'D|key\n' - get(key) is not written to disk A file may contain multiple records. When appending the next record would make the current file exceed max_file_size characters, start a new file and write the record there. Your program runs in two phases: 1. Execute before_restart_ops on an empty store, writing each put/delete to the log files. 2. Simulate a restart by rebuilding the in-memory store only by replaying all files from oldest to newest. 3. Execute after_restart_ops on the restored store, continuing to append new records to the existing files with the same size rule. Return all get results in order, the final file contents, and the final store contents sorted by key.

Constraints

  • 1 <= max_file_size <= 10^5
  • 0 <= total number of operations across both phases <= 10^4
  • Keys and values do not contain '|' or newline characters
  • Every individual log record fits within max_file_size
  • Replay order is exactly from the first file to the last file, and from top to bottom within each file

Examples

Input: (10, [('put', 'a', '1'), ('put', 'bb', '2'), ('get', 'a')], [('get', 'bb'), ('delete', 'a'), ('get', 'a'), ('put', 'c', '3')])

Expected Output: (['1', '2', None], ['P|a|1\n', 'P|bb|2\n', 'D|a\nP|c|3\n'], [('bb', '2'), ('c', '3')])

Explanation: The second put does not fit in the first file, so a new file is created. After restart, recovery replays both files before new operations continue.

Input: (8, [('put', 'aa', 'b'), ('put', 'x', 'y')], [('get', 'aa'), ('put', 'aa', 'c'), ('get', 'aa')])

Expected Output: (['b', 'c'], ['P|aa|b\n', 'P|x|y\n', 'P|aa|c\n'], [('aa', 'c'), ('x', 'y')])

Explanation: Overwriting a key after restart adds a new log record in a later file, and replaying all files yields the newest value.

Input: (10, [('put', 'a', '1'), ('delete', 'a')], [('get', 'a')])

Expected Output: ([None], ['P|a|1\nD|a\n'], [])

Explanation: Edge case: the delete record fits exactly into the remaining space of the first file, and recovery correctly removes the key.

Input: (12, [('put', 'a', '1')], [('put', 'b', '2'), ('get', 'a'), ('get', 'b')])

Expected Output: (['1', '2'], ['P|a|1\nP|b|2\n'], [('a', '1'), ('b', '2')])

Explanation: After restart, new records should keep using the last file if there is still space.

Input: (5, [], [('get', 'z')])

Expected Output: ([None], [], [])

Explanation: Edge case: no persisted records at all.

Hints

  1. Think of the disk as an append-only log: only put and delete need to be persisted.
  2. During recovery, replay every record from oldest to newest; later operations override earlier ones.
Last updated: Apr 19, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Distributed Rate Limiter - OpenAI (medium)
  • Compute Plant Infection Time - OpenAI (medium)
  • Implement IP Address Arithmetic - OpenAI (hard)
  • Simulate Infection Spread on a Grid - OpenAI (hard)
  • Implement Social Follow Recommendations - OpenAI (medium)