Design a temporal key-value store with historical reads
Company: Lyft
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's grasp of data structures and algorithms needed to implement a temporal key-value store supporting timestamped set/get operations and efficient historical reads, including performance targets like O(log n) per operation.
Constraints
- 0 <= len(operations) <= 10^6
- There may be up to 10^5 distinct keys
- Timestamps are integers and may arrive in any order
- If the same (key, timestamp) is set more than once, the newest value overwrites the previous one
Examples
Input: [('set', 'foo', 'bar', 1), ('get', 'foo', 1), ('get', 'foo', 3), ('set', 'foo', 'bar2', 4), ('get', 'foo', 4), ('get', 'foo', 5)]
Expected Output: ['bar', 'bar', 'bar2', 'bar2']
Explanation: At time 1 and 3, the latest value is 'bar'. After setting timestamp 4, queries at 4 and 5 return 'bar2'.
Input: []
Expected Output: []
Explanation: No operations means there are no get results to return.
Input: [('set', 'a', 'x', 5), ('set', 'a', 'y', 2), ('get', 'a', 1), ('get', 'a', 2), ('get', 'a', 4), ('get', 'a', 5), ('get', 'a', 100)]
Expected Output: ['', 'y', 'y', 'x', 'x']
Explanation: The timestamps were inserted out of order. Historical lookups must still return the nearest timestamp not greater than the query time.
Input: [('set', 'k1', 'v1', 10), ('set', 'k2', 'a', 3), ('set', 'k1', 'v2', 10), ('get', 'k1', 10), ('get', 'k2', 2), ('get', 'k2', 3), ('get', 'k1', 9)]
Expected Output: ['v2', '', 'a', '']
Explanation: Setting ('k1', 10) twice overwrites the old value. Queries before the first timestamp for a key return an empty string.
Input: [('set', 'user1', 'on', 7), ('set', 'user1', 'off', 12), ('set', 'user2', 'red', 5), ('set', 'user1', 'idle', 9), ('get', 'user1', 10), ('get', 'user1', 8), ('get', 'user2', 6), ('get', 'user3', 1)]
Expected Output: ['idle', 'on', 'red', '']
Explanation: For user1, the latest value at time 10 is from timestamp 9, and at time 8 it is from timestamp 7. Missing keys return an empty string.
Solution
def solution(operations):
class Node:
__slots__ = ('ts', 'value', 'prio', 'left', 'right')
def __init__(self, ts, value, prio):
self.ts = ts
self.value = value
self.prio = prio
self.left = None
self.right = None
seed = 2463534242
def next_rand():
nonlocal seed
seed ^= (seed << 13) & 0xFFFFFFFF
seed ^= (seed >> 17)
seed ^= (seed << 5) & 0xFFFFFFFF
return seed & 0xFFFFFFFF
def rotate_left(root):
new_root = root.right
root.right = new_root.left
new_root.left = root
return new_root
def rotate_right(root):
new_root = root.left
root.left = new_root.right
new_root.right = root
return new_root
def insert(root, ts, value):
if root is None:
return Node(ts, value, next_rand())
if ts == root.ts:
root.value = value
return root
if ts < root.ts:
root.left = insert(root.left, ts, value)
if root.left.prio > root.prio:
root = rotate_right(root)
else:
root.right = insert(root.right, ts, value)
if root.right.prio > root.prio:
root = rotate_left(root)
return root
def floor_value(root, ts):
ans = ''
while root is not None:
if ts < root.ts:
root = root.left
else:
ans = root.value
if ts == root.ts:
break
root = root.right
return ans
histories = {}
result = []
for op in operations:
if not op:
continue
if op[0] == 'set':
_, key, value, timestamp = op
histories[key] = insert(histories.get(key), timestamp, value)
elif op[0] == 'get':
_, key, timestamp = op
result.append(floor_value(histories.get(key), timestamp))
else:
raise ValueError('Unknown operation: %r' % (op[0],))
return resultTime complexity: Expected O(log m) per set/get operation, where m is the number of stored timestamps for that key. Space complexity: O(v), where v is the number of distinct (key, timestamp) pairs stored.
Hints
- Each get operation is a predecessor query: find the largest stored timestamp <= the requested timestamp for that key.
- A hash map gets you to the correct key quickly, but you still need an ordered structure per key. If updates are out of order, think about a balanced BST or treap rather than re-sorting on every insert.