Implement Python LRU cache with args and persistence
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's ability to design and implement an LRU cache in Python that supports variable-length positional and keyword arguments with canonical, deterministic hashing for semantically equivalent calls and O(1) get/put via a hash map plus a doubly linked list; it falls under the Coding & Algorithms domain and primarily tests practical implementation skills, API robustness, hashing/equality subtleties, and algorithmic complexity guarantees. The persistence follow-up evaluates understanding of serialization and storage design — on-disk formats, versioning, and atomic write strategies — and compares trade-offs between binary pickle and JSON-based serialization in terms of security, compatibility, and performance, reflecting both conceptual design and practical system-level considerations.
Part 1: Canonical-Argument LRU Cache
Constraints
- 0 <= capacity <= 10^4
- 0 <= len(calls) <= 2 * 10^4
- Each call is valid for the signature f(a, b=0, *extra, **kw)
- Argument trees contain only hashable atoms or nested lists, tuples, dicts, and sets; total nested elements across all calls is at most 2 * 10^5
Examples
Input: {'capacity': 2, 'calls': [{'args': [1], 'kwargs': {'b': 2, 'x': [3, 4]}}, {'args': [], 'kwargs': {'x': (3, 4), 'a': 1, 'b': 2}}, {'args': [5], 'kwargs': {}}, {'args': [6], 'kwargs': {}}, {'args': [1], 'kwargs': {'x': (3, 4), 'b': 2}}]}
Expected Output: {'results': [10, 10, 5, 6, 10], 'hit_miss': [False, True, False, False, False], 'exec_count': 4}
Explanation: The first two calls are semantically identical, so the second is a hit. Later, capacity 2 forces eviction, so the final repeated call is a miss.
Input: {'capacity': 0, 'calls': [{'args': [1], 'kwargs': {}}, {'args': [], 'kwargs': {'a': 1, 'b': 0}}]}
Expected Output: {'results': [1, 1], 'hit_miss': [False, False], 'exec_count': 2}
Explanation: Edge case: with capacity 0, nothing is stored, so every call misses even if the keys are equivalent.
Input: {'capacity': 1, 'calls': [{'args': [1], 'kwargs': {'meta': {'y': 2, 'x': 3}, 'tags': {5, 4}}}, {'args': [], 'kwargs': {'a': 1, 'tags': {4, 5}, 'meta': {'x': 3, 'y': 2}}}]}
Expected Output: {'results': [15, 15], 'hit_miss': [False, True], 'exec_count': 1}
Explanation: Nested dict order and set order should not change the cache key.
Input: {'capacity': 3, 'calls': []}
Expected Output: {'results': [], 'hit_miss': [], 'exec_count': 0}
Explanation: Edge case: no calls.
Hints
- First bind every call to the canonical form (a, b, extra, kw) and apply the default b=0 before building a cache key.
- Recursively convert unhashable objects into immutable tagged tuples so that dict key order and set order stop mattering.
Part 2: Persistent LRU Cache with Versioned JSON Snapshots
Constraints
- 0 <= capacity <= 10^4
- 0 <= len(ops) <= 2 * 10^4
- Keys used by put/get are strings
- Values stored by put are JSON-compatible (null/boolean/number/string/list/object)
- If restore receives an invalid snapshot, the cache state must remain unchanged
Examples
Input: {'capacity': 2, 'ops': [['put', 'a', 1], ['put', 'b', 2], ['get', 'a'], ['snapshot'], ['put', 'c', 3], ['restore', '{"capacity":2,"items":[{"key":"b","value":2},{"key":"a","value":1}],"version":1}'], ['get', 'b']]}
Expected Output: {'outputs': [None, None, {'found': True, 'value': 1}, '{"capacity":2,"items":[{"key":"b","value":2},{"key":"a","value":1}],"version":1}', None, True, {'found': True, 'value': 2}]}
Explanation: After get('a'), the LRU order becomes b then a, so the snapshot stores items in that order. Restoring brings b back after it was evicted by putting c.
Input: {'capacity': 1, 'ops': [['put', 'x', {'n': 1}], ['snapshot'], ['restore', '{"version":2,"capacity":1,"items":[]}'], ['get', 'x'], ['restore', 'not json'], ['get', 'y']]}
Expected Output: {'outputs': [None, '{"capacity":1,"items":[{"key":"x","value":{"n":1}}],"version":1}', False, {'found': True, 'value': {'n': 1}}, False, {'found': False}]}
Explanation: Unsupported version and malformed JSON must both fail without changing the existing cache.
Input: {'capacity': 2, 'ops': [['put', 'k', 1], ['snapshot'], ['put', 'm', 2], ['put', 'k', 9], ['snapshot'], ['restore', '{"capacity":2,"items":[{"key":"k","value":1}],"version":1}'], ['get', 'm'], ['get', 'k']]}
Expected Output: {'outputs': [None, '{"capacity":2,"items":[{"key":"k","value":1}],"version":1}', None, None, '{"capacity":2,"items":[{"key":"m","value":2},{"key":"k","value":9}],"version":1}', True, {'found': False}, {'found': True, 'value': 1}]}
Explanation: Updating k changes both its value and recency. Restoring the older snapshot brings back the older single-item state.
Input: {'capacity': 0, 'ops': [['put', 'a', 1], ['snapshot'], ['restore', '{"capacity":0,"items":[],"version":1}'], ['get', 'a']]}
Expected Output: {'outputs': [None, '{"capacity":0,"items":[],"version":1}', True, {'found': False}]}
Explanation: Edge case: capacity 0 means no entries are retained, but empty snapshots are still valid.
Hints
- Store cache items in LRU order so snapshot can be emitted directly from least recent to most recent.
- Validate the entire snapshot before mutating the cache. In production, pair this with temp-file plus rename for atomic persistence.