Implement Python LRU cache with args and persistence
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's ability to design and implement an LRU cache in Python that supports variable-length positional and keyword arguments, using canonical, deterministic hashing so that semantically equivalent calls share a key, and achieving O(1) get/put via a hash map plus a doubly linked list. It falls under the Coding & Algorithms domain and primarily tests practical implementation skill, API robustness, hashing and equality subtleties, and algorithmic complexity guarantees. The persistence follow-up probes serialization and storage design — on-disk formats, versioning, and atomic write strategies — and the trade-offs between binary pickle and JSON-based serialization in security, compatibility, and performance, reflecting both conceptual design and practical system-level considerations.
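As context for why the canonical-key requirement matters: Python's built-in functools.lru_cache keys on the raw argument tuple, so it neither accepts unhashable arguments nor treats f(1, 2) and f(a=1, b=2) as the same call. A minimal sketch (illustrative; the function f here is a stand-in, not part of the problem):

```python
# Sketch of two limitations of functools.lru_cache that the
# canonical-argument cache in this question is designed to overcome.
from functools import lru_cache

@lru_cache(maxsize=2)
def f(a, b=0):
    return a + b

f(1, 2)      # miss: computed and cached
f(1, 2)      # hit: identical positional key
f(a=1, b=2)  # miss: the keyword spelling produces a different raw key
print(f.cache_info())  # hits=1, misses=2

try:
    f([1], 2)  # unhashable argument: lru_cache raises TypeError
except TypeError as exc:
    print('TypeError:', exc)
```

Binding arguments to a canonical form and freezing unhashable structures, as the solution below does, removes both limitations.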
Part 1: Canonical-Argument LRU Cache
Constraints
- 0 <= capacity <= 10^4
- 0 <= len(calls) <= 2 * 10^4
- Each call is valid for the signature f(a, b=0, *extra, **kw)
- Argument trees contain only hashable atoms or nested lists, tuples, dicts, and sets; total nested elements across all calls is at most 2 * 10^5
Examples
Input: {'capacity': 2, 'calls': [{'args': [1], 'kwargs': {'b': 2, 'x': [3, 4]}}, {'args': [], 'kwargs': {'x': (3, 4), 'a': 1, 'b': 2}}, {'args': [5], 'kwargs': {}}, {'args': [6], 'kwargs': {}}, {'args': [1], 'kwargs': {'x': (3, 4), 'b': 2}}]}
Expected Output: {'results': [10, 10, 5, 6, 10], 'hit_miss': [False, True, False, False, False], 'exec_count': 4}
Explanation: The first two calls are semantically identical, so the second is a hit. Later, capacity 2 forces eviction, so the final repeated call is a miss.
Input: {'capacity': 0, 'calls': [{'args': [1], 'kwargs': {}}, {'args': [], 'kwargs': {'a': 1, 'b': 0}}]}
Expected Output: {'results': [1, 1], 'hit_miss': [False, False], 'exec_count': 2}
Explanation: Edge case: with capacity 0, nothing is stored, so every call misses even if the keys are equivalent.
Input: {'capacity': 1, 'calls': [{'args': [1], 'kwargs': {'meta': {'y': 2, 'x': 3}, 'tags': {5, 4}}}, {'args': [], 'kwargs': {'a': 1, 'tags': {4, 5}, 'meta': {'x': 3, 'y': 2}}}]}
Expected Output: {'results': [15, 15], 'hit_miss': [False, True], 'exec_count': 1}
Explanation: Nested dict order and set order should not change the cache key.
Input: {'capacity': 3, 'calls': []}
Expected Output: {'results': [], 'hit_miss': [], 'exec_count': 0}
Explanation: Edge case: no calls.
Solution
def solution(data):
    capacity = data['capacity']
    calls = data['calls']

    class Node:
        __slots__ = ('key', 'value', 'prev', 'next')

        def __init__(self, key=None, value=None):
            self.key = key
            self.value = value
            self.prev = None
            self.next = None

    def freeze(x):
        # Canonicalize nested structures into hashable tagged tuples so
        # that dict key order and set order do not affect the cache key.
        if isinstance(x, dict):
            items = [(freeze(k), freeze(v)) for k, v in x.items()]
            items.sort(key=lambda pair: repr(pair[0]))
            return ('dict', tuple(items))
        if isinstance(x, set):
            items = [freeze(v) for v in x]
            items.sort(key=repr)
            return ('set', tuple(items))
        if isinstance(x, (list, tuple)):
            return ('seq', tuple(freeze(v) for v in x))
        return ('atom', x)

    def sum_ints(x):
        # The cached computation: sum every int in the argument tree.
        if type(x) is int:
            return x
        if isinstance(x, dict):
            return sum(sum_ints(v) for v in x.values())
        if isinstance(x, (list, tuple, set)):
            return sum(sum_ints(v) for v in x)
        return 0

    def bind(call):
        # Bind a raw call to the canonical form (a, b, extra, kw)
        # for the signature f(a, b=0, *extra, **kw), applying b's default.
        args = list(call.get('args', []))
        kwargs = dict(call.get('kwargs', {}))
        if args:
            a = args[0]
            kwargs.pop('a', None)
        else:
            a = kwargs.pop('a')
        if len(args) >= 2:
            b = args[1]
            kwargs.pop('b', None)
        else:
            b = kwargs.pop('b', 0)
        extra = tuple(args[2:]) if len(args) > 2 else ()
        return a, b, extra, kwargs

    # Doubly linked list with sentinel head/tail; head.next is the LRU entry.
    head = Node()
    tail = Node()
    head.next = tail
    tail.prev = head
    cache = {}

    def remove(node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def append_mru(node):
        prev = tail.prev
        prev.next = node
        node.prev = prev
        node.next = tail
        tail.prev = node

    results = []
    hit_miss = []
    exec_count = 0
    for call in calls:
        a, b, extra, kw = bind(call)
        key = (freeze(a), freeze(b), freeze(extra), freeze(kw))
        node = cache.get(key)
        if node is not None:
            # Hit: refresh recency and reuse the cached value.
            remove(node)
            append_mru(node)
            hit_miss.append(True)
            results.append(node.value)
            continue
        value = sum_ints(a) + sum_ints(b) + sum_ints(extra) + sum_ints(kw)
        exec_count += 1
        hit_miss.append(False)
        results.append(value)
        if capacity == 0:
            continue
        node = Node(key, value)
        cache[key] = node
        append_mru(node)
        if len(cache) > capacity:
            lru = head.next
            remove(lru)
            del cache[lru.key]
    return {'results': results, 'hit_miss': hit_miss, 'exec_count': exec_count}
Time complexity: O(k) per call for binding, canonicalization, and scoring, where k is the number of nested atomic elements in the call, plus O(1) average time for the LRU update. Space complexity: O(capacity * average_key_size) for the cache, excluding the input.
Hints
- First bind every call to the canonical form (a, b, extra, kw) and apply the default b=0 before building a cache key.
- Recursively convert unhashable objects into immutable tagged tuples so that dict key order and set order stop mattering.
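The second hint can be demonstrated with a standalone copy of the freeze helper (extracted here for illustration; in the solution it is nested inside solution):

```python
# Standalone sketch of the canonicalization step: unhashable containers
# become immutable tagged tuples with a deterministic internal order.
def freeze(x):
    if isinstance(x, dict):
        items = sorted(((freeze(k), freeze(v)) for k, v in x.items()),
                       key=lambda pair: repr(pair[0]))
        return ('dict', tuple(items))
    if isinstance(x, set):
        return ('set', tuple(sorted((freeze(v) for v in x), key=repr)))
    if isinstance(x, (list, tuple)):
        # Lists and tuples collapse to the same 'seq' tag, so [3, 4]
        # and (3, 4) produce identical keys, as in the first example.
        return ('seq', tuple(freeze(v) for v in x))
    return ('atom', x)

k1 = freeze({'meta': {'y': 2, 'x': 3}, 'tags': {5, 4}})
k2 = freeze({'tags': {4, 5}, 'meta': {'x': 3, 'y': 2}})
print(k1 == k2)              # True: ordering differences vanish
print(hash(k1) == hash(k2))  # True: the frozen key is hashable
```

Sorting by repr is a pragmatic choice: it gives a total order over already-frozen subtrees without requiring the atoms themselves to be mutually comparable.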
Part 2: Persistent LRU Cache with Versioned JSON Snapshots
Constraints
- 0 <= capacity <= 10^4
- 0 <= len(ops) <= 2 * 10^4
- Keys used by put/get are strings
- Values stored by put are JSON-compatible (null/boolean/number/string/list/object)
- If restore receives an invalid snapshot, the cache state must remain unchanged
Examples
Input: {'capacity': 2, 'ops': [['put', 'a', 1], ['put', 'b', 2], ['get', 'a'], ['snapshot'], ['put', 'c', 3], ['restore', '{"capacity":2,"items":[{"key":"b","value":2},{"key":"a","value":1}],"version":1}'], ['get', 'b']]}
Expected Output: {'outputs': [None, None, {'found': True, 'value': 1}, '{"capacity":2,"items":[{"key":"b","value":2},{"key":"a","value":1}],"version":1}', None, True, {'found': True, 'value': 2}]}
Explanation: After get('a'), the LRU order becomes b then a, so the snapshot stores items in that order. Restoring brings b back after it was evicted by putting c.
Input: {'capacity': 1, 'ops': [['put', 'x', {'n': 1}], ['snapshot'], ['restore', '{"version":2,"capacity":1,"items":[]}'], ['get', 'x'], ['restore', 'not json'], ['get', 'y']]}
Expected Output: {'outputs': [None, '{"capacity":1,"items":[{"key":"x","value":{"n":1}}],"version":1}', False, {'found': True, 'value': {'n': 1}}, False, {'found': False}]}
Explanation: Unsupported version and malformed JSON must both fail without changing the existing cache.
Input: {'capacity': 2, 'ops': [['put', 'k', 1], ['snapshot'], ['put', 'm', 2], ['put', 'k', 9], ['snapshot'], ['restore', '{"capacity":2,"items":[{"key":"k","value":1}],"version":1}'], ['get', 'm'], ['get', 'k']]}
Expected Output: {'outputs': [None, '{"capacity":2,"items":[{"key":"k","value":1}],"version":1}', None, None, '{"capacity":2,"items":[{"key":"m","value":2},{"key":"k","value":9}],"version":1}', True, {'found': False}, {'found': True, 'value': 1}]}
Explanation: Updating k changes both its value and recency. Restoring the older snapshot brings back the older single-item state.
Input: {'capacity': 0, 'ops': [['put', 'a', 1], ['snapshot'], ['restore', '{"capacity":0,"items":[],"version":1}'], ['get', 'a']]}
Expected Output: {'outputs': [None, '{"capacity":0,"items":[],"version":1}', True, {'found': False}]}
Explanation: Edge case: capacity 0 means no entries are retained, but empty snapshots are still valid.
Solution
def solution(data):
    import json

    class Node:
        __slots__ = ('key', 'value', 'prev', 'next')

        def __init__(self, key=None, value=None):
            self.key = key
            self.value = value
            self.prev = None
            self.next = None

    class LRU:
        def __init__(self, capacity):
            self.capacity = capacity
            self._reset()

        def _reset(self):
            # Fresh map plus sentinel-bounded list; head.next is the LRU entry.
            self.map = {}
            self.head = Node()
            self.tail = Node()
            self.head.next = self.tail
            self.tail.prev = self.head

        def _remove(self, node):
            node.prev.next = node.next
            node.next.prev = node.prev

        def _append(self, node):
            prev = self.tail.prev
            prev.next = node
            node.prev = prev
            node.next = self.tail
            self.tail.prev = node

        def get(self, key):
            node = self.map.get(key)
            if node is None:
                return {'found': False}
            self._remove(node)
            self._append(node)
            return {'found': True, 'value': node.value}

        def put(self, key, value):
            node = self.map.get(key)
            if node is not None:
                node.value = value
                self._remove(node)
                self._append(node)
                return
            if self.capacity == 0:
                return
            node = Node(key, value)
            self.map[key] = node
            self._append(node)
            if len(self.map) > self.capacity:
                lru = self.head.next
                self._remove(lru)
                del self.map[lru.key]

        def snapshot(self):
            # Emit items from least to most recently used as deterministic JSON.
            items = []
            cur = self.head.next
            while cur is not self.tail:
                items.append({'key': cur.key, 'value': cur.value})
                cur = cur.next
            payload = {'version': 1, 'capacity': self.capacity, 'items': items}
            return json.dumps(payload, sort_keys=True, separators=(',', ':'))

        def restore(self, text):
            # Validate the entire snapshot before mutating any state, so a
            # bad snapshot leaves the cache untouched.
            try:
                payload = json.loads(text)
            except Exception:
                return False
            if not isinstance(payload, dict):
                return False
            if set(payload.keys()) != {'version', 'capacity', 'items'}:
                return False
            version = payload.get('version')
            cap = payload.get('capacity')
            items = payload.get('items')
            if type(version) is not int or version != 1:
                return False
            if type(cap) is not int or cap < 0 or not isinstance(items, list):
                return False
            if len(items) > cap:
                return False
            parsed = []
            seen = set()
            for item in items:
                if not isinstance(item, dict) or set(item.keys()) != {'key', 'value'}:
                    return False
                key = item['key']
                if not isinstance(key, str) or key in seen:
                    return False
                seen.add(key)
                parsed.append((key, item['value']))
            # All checks passed: replace state in snapshot (LRU-to-MRU) order.
            self.capacity = cap
            self._reset()
            for key, value in parsed:
                node = Node(key, value)
                self.map[key] = node
                self._append(node)
            return True

    cache = LRU(data['capacity'])
    outputs = []
    for op in data['ops']:
        kind = op[0]
        if kind == 'put':
            cache.put(op[1], op[2])
            outputs.append(None)
        elif kind == 'get':
            outputs.append(cache.get(op[1]))
        elif kind == 'snapshot':
            outputs.append(cache.snapshot())
        elif kind == 'restore':
            outputs.append(cache.restore(op[1]))
        else:
            raise ValueError('Unknown operation')
    return {'outputs': outputs}
Time complexity: get/put are O(1) average; snapshot and restore are O(n + s), where n is the number of cached items and s is the snapshot string size. Space complexity: O(capacity + s), where s is the size of the largest snapshot string being processed.
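On the pickle-versus-JSON trade-off raised in the quick answer: JSON snapshots are safe to load from untrusted files and interoperable across languages, but they collapse some Python types, while pickle preserves exact types at the cost of being unsafe on untrusted input. A small illustration:

```python
# Illustrative only: JSON flattens tuples to lists on a round trip, while
# pickle preserves exact Python types. Pickle, however, must never be
# loaded from untrusted snapshot files, since unpickling can run code.
import json
import pickle

data = {'point': (1, 2)}
via_json = json.loads(json.dumps(data))
via_pickle = pickle.loads(pickle.dumps(data))
print(via_json)    # {'point': [1, 2]} -- the tuple became a list
print(via_pickle)  # {'point': (1, 2)} -- types round-trip exactly
```

This is why the problem restricts values to JSON-compatible types: within that subset, the JSON round trip is lossless and the safety and compatibility benefits come for free.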
Hints
- Store cache items in LRU order so snapshot can be emitted directly from least recent to most recent.
- Validate the entire snapshot before mutating the cache. In production, pair this with temp-file plus rename for atomic persistence.
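The temp-file-plus-rename pattern from the last hint can be sketched as follows (the helper name and file path are illustrative, not part of the required API):

```python
# Sketch of atomic snapshot persistence: write to a temporary file in the
# same directory, fsync it, then rename over the target, so readers never
# observe a partially written snapshot.
import json
import os
import tempfile

def save_snapshot_atomically(snapshot_text, path):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(snapshot_text)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes reach disk before the rename
        os.replace(tmp, path)     # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)            # leave no stray temp file on failure
        raise

snap = json.dumps({'version': 1, 'capacity': 2, 'items': []},
                  sort_keys=True, separators=(',', ':'))
save_snapshot_atomically(snap, 'cache_snapshot.json')
print(open('cache_snapshot.json').read())
```

The temp file must live in the same directory as the target, because os.replace is only atomic within a single filesystem.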