Implement KV store and plan type conversions
Company: OpenAI
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in designing scalable, time-versioned key-value data structures and in graph-based shortest-path planning for costed type conversions, assessing algorithmic thinking, complexity analysis, and robustness under large-scale constraints in the Coding & Algorithms domain.
Part 1: Versioned Key-Value Store
Constraints
- 0 <= len(ops) <= 10^6
- All four input lists have the same length
- There can be up to 10^5 distinct keys
- Timestamps are integers in the range [0, 10^9]
- Stored values are non-empty strings and will never be the literal string 'None'
Examples
Input: (['set','set','get','set','get','get'], ['a','a','a','a','a','a'], ['x','y','','z','',''], [5,10,7,6,6,100])
Expected Output: ['x','z','y']
Explanation: The query at time 7 sees version (5,'x'). After setting timestamp 6 to 'z', the query at time 6 returns 'z'. The final query returns the latest version at or before 100, which is 'y' at timestamp 10.
Input: (['get','set','get','get'], ['m','m','m','m'], ['','v','',''], [4,5,4,5])
Expected Output: ['None','None','v']
Explanation: Before any set, the key is missing. After setting at timestamp 5, a query for time 4 still has no valid version, while time 5 returns 'v'.
Input: (['set','set','get','set','get','get'], ['a','a','a','b','b','a'], ['old','new','','x','',''], [3,3,3,2,2,10])
Expected Output: ['new','x','new']
Explanation: The second set for key 'a' uses the same timestamp and overwrites the old value. Key 'b' is independent.
Input: ([], [], [], [])
Expected Output: []
Explanation: Edge case: no operations means no output.
Solution
def solution(ops, keys, values, timestamps):
class Node:
__slots__ = ('ts', 'val', 'prio', 'left', 'right')
def __init__(self, ts, val, prio):
self.ts = ts
self.val = val
self.prio = prio
self.left = None
self.right = None
seed = 2463534242
def next_rand():
nonlocal seed
seed ^= (seed << 13) & 0xFFFFFFFF
seed ^= (seed >> 17)
seed ^= (seed << 5) & 0xFFFFFFFF
return seed & 0xFFFFFFFF
def rotate_right(root):
new_root = root.left
root.left = new_root.right
new_root.right = root
return new_root
def rotate_left(root):
new_root = root.right
root.right = new_root.left
new_root.left = root
return new_root
def insert(root, ts, val):
if root is None:
return Node(ts, val, next_rand())
if ts == root.ts:
root.val = val
return root
if ts < root.ts:
root.left = insert(root.left, ts, val)
if root.left.prio < root.prio:
root = rotate_right(root)
else:
root.right = insert(root.right, ts, val)
if root.right.prio < root.prio:
root = rotate_left(root)
return root
def get_le(root, ts):
ans = 'None'
while root is not None:
if root.ts <= ts:
ans = root.val
root = root.right
else:
root = root.left
return ans
roots = {}
out = []
for i, op in enumerate(ops):
key = keys[i]
ts = timestamps[i]
if op == 'set':
roots[key] = insert(roots.get(key), ts, values[i])
else:
out.append(get_le(roots.get(key), ts))
return outTime complexity: Expected O(log m) per operation, where m is the number of distinct timestamps stored for that key; O(n log m_max) expected total for n operations.. Space complexity: O(v), where v is the number of distinct (key, timestamp) versions stored..
Hints
- A hash map alone is not enough because each key needs fast predecessor queries by timestamp.
- Think of keeping one ordered structure per key, where insert and 'greatest timestamp <= t' can both be done in logarithmic time.
Part 2: Type Conversion Planning
Constraints
- 1 <= n <= 10^4
- 0 <= len(rules) <= 10^5
- 0 <= len(queries) <= 10^5
- 0 <= u, v, source, target < n
- 0 <= cost <= 10^9
Examples
Input: (5, [(0,1,2),(1,2,3),(0,2,10),(2,3,1)], [(0,2),(0,3),(3,0),(4,4)])
Expected Output: [5,6,-1,0]
Explanation: 0->2 is cheaper through 1. 0->3 is 0->1->2->3. There is no path from 3 to 0. A type always converts to itself at cost 0.
Input: (3, [(0,1,5),(0,1,2),(1,2,4),(0,2,10)], [(0,1),(0,2),(2,2)])
Expected Output: [2,6,0]
Explanation: There are duplicate rules from 0 to 1; the cheapest one should be used. Then 0->1->2 costs 6, which beats the direct edge of cost 10.
Input: (2, [], [(0,1),(1,1)])
Expected Output: [-1,0]
Explanation: Edge case: with no rules, only source == target queries have cost 0.
Input: (6, [(0,5,7),(1,5,3),(2,1,1),(2,5,20),(3,2,2)], [(0,5),(2,5),(3,5),(4,5)])
Expected Output: [7,4,6,-1]
Explanation: For target 5, 2 reaches 5 more cheaply through 1, and 3 reaches 5 through 2 and 1.
Solution
def solution(n, rules, queries):
from collections import defaultdict
from heapq import heappush, heappop
forward = [dict() for _ in range(n)]
reverse = [dict() for _ in range(n)]
for u, v, w in rules:
prev = forward[u].get(v)
if prev is None or w < prev:
forward[u][v] = w
reverse[v][u] = w
graph = [list(nei.items()) for nei in forward]
rev_graph = [list(nei.items()) for nei in reverse]
by_source = defaultdict(list)
by_target = defaultdict(list)
for idx, (s, t) in enumerate(queries):
by_source[s].append((t, idx))
by_target[t].append((s, idx))
def dijkstra(start, adj, needed_nodes):
dist = {start: 0}
heap = [(0, start)]
remaining = set(needed_nodes)
found = {}
if start in remaining:
found[start] = 0
remaining.remove(start)
if not remaining:
return found
INF = 10**30
while heap and remaining:
d, u = heappop(heap)
if d != dist.get(u):
continue
if u in remaining:
found[u] = d
remaining.remove(u)
if not remaining:
break
for v, w in adj[u]:
nd = d + w
if nd < dist.get(v, INF):
dist[v] = nd
heappush(heap, (nd, v))
return found
answers = [-1] * len(queries)
if len(by_source) <= len(by_target):
for s, items in by_source.items():
needed = {t for t, _ in items}
found = dijkstra(s, graph, needed)
for t, idx in items:
answers[idx] = found.get(t, -1)
else:
for t, items in by_target.items():
needed = {s for s, _ in items}
found = dijkstra(t, rev_graph, needed)
for s, idx in items:
answers[idx] = found.get(s, -1)
return answersTime complexity: Worst-case O(min(Us, Ut) * (E log V)), where Us is the number of distinct sources in queries and Ut is the number of distinct targets; early stopping often makes it faster in practice.. Space complexity: O(E + V + Q) for the graphs, query grouping, and answer array..
Hints
- If many queries share the same source, one shortest-path run can answer all of them; the same idea works with shared targets on the reversed graph.
- Because all costs are non-negative, Dijkstra's algorithm is appropriate, and you can stop early once all nodes needed for the current group of queries are settled.