Implement map serialization and deserialization
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates understanding of serialization, string encoding and parsing, data structure invariants (invertibility), and efficiency when representing maps with arbitrary string keys and values.
Constraints
- 0 <= number of key-value pairs <= 50000
- The total number of characters across all keys and values is at most 10^6
- Keys and values are arbitrary strings and may contain delimiter-like characters, including '#'
- Keys and values do not contain the null character '\0'
- For deserialize, the input string is expected to follow the specified format; malformed input may raise ValueError
Examples
Input: ('serialize', {})
Expected Output: '0#'
Explanation: An empty map has 0 pairs, so the entire encoding is just the pair count followed by '#'.
Input: ('deserialize', '0#')
Expected Output: {}
Explanation: The pair count is 0, so the reconstructed dictionary is empty.
Input: ('serialize', {'': '', 'a:b|c': 'x,y=z', 'k#1': 'v#2'})
Expected Output: '3#0#0#5#a:b|c5#x,y=z3#k#13#v#2'
Explanation: The keys are serialized in sorted order: '', 'a:b|c', 'k#1'. Each key and value is written as length + '#' + content.
Input: ('deserialize', '3#0#0#5#a:b|c5#x,y=z3#k#13#v#2')
Expected Output: {'': '', 'a:b|c': 'x,y=z', 'k#1': 'v#2'}
Explanation: Length prefixes allow correct parsing even though the strings contain punctuation and '#'.
Input: ('serialize', {'b': 'two words', 'a': '1', '': ' '})
Expected Output: '3#0#1# 1#a1#11#b9#two words'
Explanation: Sorted key order is '', 'a', 'b'. The single-space value is encoded as length 1, and 'two words' has length 9.
Input: ('deserialize', '3#0#1# 1#a1#11#b9#two words')
Expected Output: {'': ' ', 'a': '1', 'b': 'two words'}
Explanation: Empty keys, space-only values, and multi-word values are reconstructed exactly.
Solution
def solution(operation, data):
def serialize(m):
parts = [str(len(m)), '#']
for key in sorted(m.keys()):
value = m[key]
parts.append(str(len(key)))
parts.append('#')
parts.append(key)
parts.append(str(len(value)))
parts.append('#')
parts.append(value)
return ''.join(parts)
def read_number(s, i):
if i >= len(s):
raise ValueError('Unexpected end of input while reading length')
j = i
while j < len(s) and s[j] != '#':
if not s[j].isdigit():
raise ValueError('Invalid length field')
j += 1
if j == len(s) or j == i:
raise ValueError('Missing separator or empty length field')
return int(s[i:j]), j + 1
def deserialize(s):
count, i = read_number(s, 0)
result = {}
for _ in range(count):
key_len, i = read_number(s, i)
if i + key_len > len(s):
raise ValueError('Truncated key')
key = s[i:i + key_len]
i += key_len
value_len, i = read_number(s, i)
if i + value_len > len(s):
raise ValueError('Truncated value')
value = s[i:i + value_len]
i += value_len
result[key] = value
if i != len(s):
raise ValueError('Extra trailing data')
return result
if operation == 'serialize':
return serialize(data)
if operation == 'deserialize':
return deserialize(data)
raise ValueError("operation must be 'serialize' or 'deserialize'")Time complexity: Serialize: O(n log n + L) because keys are sorted for deterministic output, where n is the number of pairs and L is the total number of characters across all keys and values. Deserialize: O(S), where S is the length of the serialized string.. Space complexity: Serialize: O(L + n) for the output plus the sorted key list. Deserialize: O(n + L) for the reconstructed map..
Hints
- If keys and values can contain any delimiter character, splitting on a separator is not enough.
- Prefix each key and value with its length so the parser always knows exactly how many characters to read next.