PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates understanding of serialization, string encoding and parsing, data structure invariants (invertibility), and efficiency when representing maps with arbitrary string keys and values.

  • medium
  • OpenAI
  • Coding & Algorithms
  • Software Engineer

Implement map serialization and deserialization

Company: OpenAI

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

You are given an in-memory map (dictionary) from strings to strings. Implement two functions: - `string serialize(map<string, string> m)` - `map<string, string> deserialize(string data)` Requirements: - The serialization format must be **invertible**: deserializing the result of `serialize(m)` must always reconstruct the original map exactly. - Keys and values are arbitrary strings and may contain spaces, punctuation, and characters that you might otherwise want to use as delimiters (e.g., `:`, `,`, `;`, `|`, `=`). You may assume they do not contain the null character `"\0"`. - You **may not** use built-in serialization formats or libraries (e.g., no JSON, protobuf, or similar). You must design the encoding yourself. - Aim for an efficient solution in both time and space for maps with up to, say, tens of thousands of key–value pairs. Specify and implement: 1. The concrete string format you choose. 2. How `serialize` produces that format for a given map. 3. How `deserialize` parses that format back into the original map and handles edge cases (e.g., empty map, empty keys/values, special characters). 4. The time and space complexity of both functions.

Quick Answer: This question evaluates understanding of serialization, string encoding and parsing, data structure invariants (invertibility), and efficiency when representing maps with arbitrary string keys and values.

Design a custom, invertible encoding for a map from strings to strings. Because keys and values may contain any punctuation characters (including characters you might normally use as delimiters), a delimiter-only format is not safe. For this problem, use a length-prefixed format and implement both operations through one judge function: solution(operation, data). If operation == 'serialize', data is a dictionary {string: string} and you must return its serialized string. If operation == 'deserialize', data is a serialized string and you must return the original dictionary. Use this concrete format: <pair_count>#<key_length>#<key><value_length>#<value>... for every key-value pair, with pairs written in lexicographically sorted key order so the serialization is deterministic. Example: {'a': 'hi', 'b': ''} becomes '2#1#a2#hi1#b0#'. Empty maps, empty keys, empty values, spaces, and characters like ':', ',', ';', '|', '=', and '#' must all work correctly.

Constraints

  • 0 <= number of key-value pairs <= 50000
  • The total number of characters across all keys and values is at most 10^6
  • Keys and values are arbitrary strings and may contain delimiter-like characters, including '#'
  • Keys and values do not contain the null character '\0'
  • For deserialize, the input string is expected to follow the specified format; malformed input may raise ValueError

Examples

Input: ('serialize', {})

Expected Output: '0#'

Explanation: An empty map has 0 pairs, so the entire encoding is just the pair count followed by '#'.

Input: ('deserialize', '0#')

Expected Output: {}

Explanation: The pair count is 0, so the reconstructed dictionary is empty.

Input: ('serialize', {'': '', 'a:b|c': 'x,y=z', 'k#1': 'v#2'})

Expected Output: '3#0#0#5#a:b|c5#x,y=z3#k#13#v#2'

Explanation: The keys are serialized in sorted order: '', 'a:b|c', 'k#1'. Each key and value is written as length + '#' + content.

Input: ('deserialize', '3#0#0#5#a:b|c5#x,y=z3#k#13#v#2')

Expected Output: {'': '', 'a:b|c': 'x,y=z', 'k#1': 'v#2'}

Explanation: Length prefixes allow correct parsing even though the strings contain punctuation and '#'.

Input: ('serialize', {'b': 'two words', 'a': '1', '': ' '})

Expected Output: '3#0#1# 1#a1#11#b9#two words'

Explanation: Sorted key order is '', 'a', 'b'. The single-space value is encoded as length 1, and 'two words' has length 9.

Input: ('deserialize', '3#0#1# 1#a1#11#b9#two words')

Expected Output: {'': ' ', 'a': '1', 'b': 'two words'}

Explanation: Empty keys, space-only values, and multi-word values are reconstructed exactly.

Solution

def solution(operation, data):
    def serialize(m):
        parts = [str(len(m)), '#']
        for key in sorted(m.keys()):
            value = m[key]
            parts.append(str(len(key)))
            parts.append('#')
            parts.append(key)
            parts.append(str(len(value)))
            parts.append('#')
            parts.append(value)
        return ''.join(parts)

    def read_number(s, i):
        if i >= len(s):
            raise ValueError('Unexpected end of input while reading length')
        j = i
        while j < len(s) and s[j] != '#':
            if not s[j].isdigit():
                raise ValueError('Invalid length field')
            j += 1
        if j == len(s) or j == i:
            raise ValueError('Missing separator or empty length field')
        return int(s[i:j]), j + 1

    def deserialize(s):
        count, i = read_number(s, 0)
        result = {}
        for _ in range(count):
            key_len, i = read_number(s, i)
            if i + key_len > len(s):
                raise ValueError('Truncated key')
            key = s[i:i + key_len]
            i += key_len

            value_len, i = read_number(s, i)
            if i + value_len > len(s):
                raise ValueError('Truncated value')
            value = s[i:i + value_len]
            i += value_len

            result[key] = value

        if i != len(s):
            raise ValueError('Extra trailing data')
        return result

    if operation == 'serialize':
        return serialize(data)
    if operation == 'deserialize':
        return deserialize(data)
    raise ValueError("operation must be 'serialize' or 'deserialize'")

Time complexity: Serialize: O(n log n + L) because keys are sorted for deterministic output, where n is the number of pairs and L is the total number of characters across all keys and values. Deserialize: O(S), where S is the length of the serialized string.. Space complexity: Serialize: O(L + n) for the output plus the sorted key list. Deserialize: O(n + L) for the reconstructed map..

Hints

  1. If keys and values can contain any delimiter character, splitting on a separator is not enough.
  2. Prefix each key and value with its length so the parser always knows exactly how many characters to read next.
Last updated: May 1, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement IP Address Arithmetic - OpenAI (hard)
  • Simulate Infection Spread on a Grid - OpenAI (hard)
  • Implement Social Follow Recommendations - OpenAI (medium)
  • Build a Compose Rating Card - OpenAI (medium)
  • Generate Data Labeling Schedules - OpenAI (medium)