How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Amazon during technical interviews.

Design LFU cache with distributed extension

Quick Overview

This question evaluates understanding of cache design and frequency-based eviction policies, efficient data structures that support O(1) operations, and techniques for approximate frequency estimation and summarization in distributed streaming environments.

Design LFU cache with distributed extension

Company: Amazon

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

### Problem You are asked to design and implement a data structure that behaves like an in-memory cache with a **Least Frequently Used (LFU)** eviction policy. Then, you will discuss how to extend the idea to a distributed, streaming setting. #### Part A: Single-node LFU cache Design a class `LFUCache` with the following API: - Constructor: `LFUCache(int capacity)` - `capacity` is a positive integer indicating the maximum number of key–value pairs the cache can hold. - Method: `int get(int key)` - If the key exists in the cache, return its value and update its usage frequency. - If the key does not exist, return `-1`. - Method: `void put(int key, int value)` - Insert or update the value of the key. - If inserting a new key would exceed `capacity`, you must evict **one** key according to the LFU policy. Eviction rules: 1. Evict the key with the **lowest access frequency**. 2. If multiple keys share the same (lowest) frequency, evict the one that is **least recently used** among them. **Requirements:** - Aim for **O(1)** amortized time for both `get` and `put` operations. - Clearly describe the data structures you choose and how they support: - Updating a key’s frequency on `get` and `put`. - Finding and evicting the correct key in O(1). You may assume: - `capacity >= 0`. - Keys and values are integers that fit in standard 32-bit types. #### Part B: Distributed / streaming extension Now assume that instead of a single process, you are dealing with a **high-volume data stream** of keys (e.g., user IDs, item IDs, or search queries) spread across multiple machines. The input rate is too high to store all keys and their counts exactly on a single node. You want to extend the LFU idea so that you can: - Continuously ingest a (potentially unbounded) stream of keys across **many machines**. - Approximate the set of **most frequently used keys** (LFU-like behavior) over a recent time window or over all time. - Support queries such as: "What are the current top *K* most frequent keys?" with reasonable accuracy. **Questions for the distributed extension:** 1. What data structures or algorithms would you use on each machine to track local frequencies under memory constraints (e.g., sketches, approximate counting, bounded-size summaries)? 2. How would you periodically **merge** local summaries to obtain a global view of the most frequent keys? 3. How would you handle: - Late or out-of-order events in the stream? - Sliding windows or time-based decay (so that recent events matter more)? 4. What trade-offs do you make between **accuracy**, **memory usage**, and **latency**, and why? You do **not** need to draw full system architecture diagrams, but your answer should make it clear how the LFU concept is adapted to a distributed, streaming environment.

Quick Answer: This question evaluates understanding of cache design and frequency-based eviction policies, efficient data structures that support O(1) operations, and techniques for approximate frequency estimation and summarization in distributed streaming environments.

Design and implement a data structure for a **Least Frequently Used (LFU)** cache. Implement the `LFUCache` class: - `LFUCache(int capacity)` — initialize the object with the maximum capacity of the cache. - `int get(int key)` — return the value of `key` if it exists in the cache, otherwise return `-1`. Accessing a key increases its usage frequency. - `void put(int key, int value)` — update the value of `key` if present, otherwise insert it. When the cache reaches capacity and a new key is inserted, evict the **least frequently used** key. If multiple keys share the lowest frequency, evict the **least recently used** among them. Both `get` and `put` must run in **O(1)** average time. **Harness encoding.** To make this runnable, your function is called as `lfuCache(capacity, operations)`: - `capacity` is the cache capacity (an integer, `>= 0`). - `operations` is a list of operations. Each is either `["put", key, value]` or `["get", key]`. - Return a list with one entry per operation, in order: `None` for each `put`, and the returned value (the value, or `-1`) for each `get`. For example, with `capacity = 2` and ops `[["put",1,1],["put",2,2],["get",1],["put",3,3],["get",2]]`, after inserting keys 1 and 2 and accessing key 1 (raising its frequency), inserting key 3 evicts key 2 (lowest frequency), so `get(2)` returns `-1`. The result is `[None, None, 1, None, -1]`. --- **Follow-up (system design discussion — not graded by the console):** Extend the LFU idea to a high-volume, distributed data stream where exact counts cannot fit on one machine. Consider per-node approximate-counting summaries (e.g. Count-Min Sketch + a bounded top-K heap, or the Space-Saving / Misra–Gries algorithm), periodic mergeable summaries to compute a global top-K, handling late/out-of-order events, sliding windows with time-decay, and the accuracy/memory/latency trade-offs involved.

Constraints

0 <= capacity
Keys and values fit in 32-bit signed integers.
Each operation is either ["put", key, value] or ["get", key].
On eviction, remove the least frequently used key; break ties by least recently used.
If capacity == 0, every put is a no-op and every get returns -1.

Examples

Input: (2, [["put", 1, 1], ["put", 2, 2], ["get", 1], ["put", 3, 3], ["get", 2], ["get", 3], ["put", 4, 4], ["get", 1], ["get", 3], ["get", 4]])

Expected Output: [None, None, 1, None, -1, 3, None, -1, 3, 4]

Explanation: Classic LeetCode trace. After get(1), key 1 has freq 2 and key 2 has freq 1, so put(3) evicts key 2. Later both keys 1 and 3 have freq 2; key 1 was accessed less recently, so put(4) evicts key 1.

Input: (0, [["put", 0, 0], ["get", 0]])

Expected Output: [None, -1]

Explanation: Zero capacity: the put stores nothing and the get returns -1.

Input: (1, [["put", 1, 1], ["get", 1], ["put", 2, 2], ["get", 1], ["get", 2]])

Expected Output: [None, 1, None, -1, 2]

Explanation: Capacity 1: inserting key 2 evicts key 1 (the only candidate) regardless of key 1's higher frequency, since the cache can hold only one entry.

Input: (2, [["put", 1, 1], ["put", 2, 2], ["get", 1], ["get", 1], ["get", 2], ["put", 3, 3], ["get", 1], ["get", 2], ["get", 3]])

Expected Output: [None, None, 1, 1, 2, None, 1, -1, 3]

Explanation: Key 1 reaches freq 3, key 2 reaches freq 2. put(3) evicts the lowest-frequency key, which is key 2.

Input: (3, [["put", 1, 10], ["put", 1, 20], ["get", 1], ["put", 2, 2], ["put", 3, 3], ["put", 4, 4], ["get", 2], ["get", 3], ["get", 4], ["get", 1]])

Expected Output: [None, None, 20, None, None, None, -1, 3, 4, 20]

Explanation: put on an existing key updates its value (10 -> 20) and bumps its frequency. Key 1 ends at freq 3; keys 2,3 at freq 1. The full-cache put(4) evicts the least-recently-used among the freq-1 keys, which is key 2.

Solution

def lfuCache(capacity, operations):
    from collections import defaultdict, OrderedDict

    class LFUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.key_to_val = {}
            self.key_to_freq = {}
            self.freq_to_keys = defaultdict(OrderedDict)
            self.min_freq = 0

        def _touch(self, key):
            freq = self.key_to_freq[key]
            del self.freq_to_keys[freq][key]
            if not self.freq_to_keys[freq]:
                del self.freq_to_keys[freq]
                if self.min_freq == freq:
                    self.min_freq += 1
            self.key_to_freq[key] = freq + 1
            self.freq_to_keys[freq + 1][key] = None

        def get(self, key):
            if key not in self.key_to_val:
                return -1
            self._touch(key)
            return self.key_to_val[key]

        def put(self, key, value):
            if self.capacity <= 0:
                return
            if key in self.key_to_val:
                self.key_to_val[key] = value
                self._touch(key)
                return
            if len(self.key_to_val) >= self.capacity:
                evict_key, _ = self.freq_to_keys[self.min_freq].popitem(last=False)
                del self.key_to_val[evict_key]
                del self.key_to_freq[evict_key]
            self.key_to_val[key] = value
            self.key_to_freq[key] = 1
            self.freq_to_keys[1][key] = None
            self.min_freq = 1

    cache = LFUCache(capacity)
    result = []
    for op in operations:
        name = op[0]
        if name == "get":
            result.append(cache.get(op[1]))
        elif name == "put":
            cache.put(op[1], op[2])
            result.append(None)
    return result

Time complexity: O(1) average per get/put operation; O(M) total for M operations.. Space complexity: O(capacity) for the stored keys, values, frequency map, and per-frequency ordered key lists..

Hints

Maintain three maps: key -> value, key -> frequency, and frequency -> ordered collection of keys at that frequency (insertion-ordered so the front is the least recently used).
Track a running min_freq. When you bump a key's frequency, remove it from its old frequency bucket; if that bucket was the min_freq bucket and becomes empty, increment min_freq.
On insertion into a full cache, evict the front (oldest) key of the min_freq bucket, then set min_freq back to 1 for the newly inserted key.
An OrderedDict (or a doubly-linked list per frequency) gives O(1) insertion, removal, and oldest-key eviction, which is what keeps both operations O(1).

Quick Overview

Design LFU cache with distributed extension

Company: Amazon

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Constraints

0 <= capacity
Keys and values fit in 32-bit signed integers.
Each operation is either ["put", key, value] or ["get", key].
On eviction, remove the least frequently used key; break ties by least recently used.
If capacity == 0, every put is a no-op and every get returns -1.

Examples

Input: (2, [["put", 1, 1], ["put", 2, 2], ["get", 1], ["put", 3, 3], ["get", 2], ["get", 3], ["put", 4, 4], ["get", 1], ["get", 3], ["get", 4]])

Expected Output: [None, None, 1, None, -1, 3, None, -1, 3, 4]

Input: (0, [["put", 0, 0], ["get", 0]])

Expected Output: [None, -1]

Explanation: Zero capacity: the put stores nothing and the get returns -1.

Input: (1, [["put", 1, 1], ["get", 1], ["put", 2, 2], ["get", 1], ["get", 2]])

Expected Output: [None, 1, None, -1, 2]

Explanation: Capacity 1: inserting key 2 evicts key 1 (the only candidate) regardless of key 1's higher frequency, since the cache can hold only one entry.

Input: (2, [["put", 1, 1], ["put", 2, 2], ["get", 1], ["get", 1], ["get", 2], ["put", 3, 3], ["get", 1], ["get", 2], ["get", 3]])

Expected Output: [None, None, 1, 1, 2, None, 1, -1, 3]

Explanation: Key 1 reaches freq 3, key 2 reaches freq 2. put(3) evicts the lowest-frequency key, which is key 2.

Input: (3, [["put", 1, 10], ["put", 1, 20], ["get", 1], ["put", 2, 2], ["put", 3, 3], ["put", 4, 4], ["get", 2], ["get", 3], ["get", 4], ["get", 1]])

Expected Output: [None, None, 20, None, None, None, -1, 3, 4, 20]

Solution

def lfuCache(capacity, operations):
    from collections import defaultdict, OrderedDict

    class LFUCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.key_to_val = {}
            self.key_to_freq = {}
            self.freq_to_keys = defaultdict(OrderedDict)
            self.min_freq = 0

        def _touch(self, key):
            freq = self.key_to_freq[key]
            del self.freq_to_keys[freq][key]
            if not self.freq_to_keys[freq]:
                del self.freq_to_keys[freq]
                if self.min_freq == freq:
                    self.min_freq += 1
            self.key_to_freq[key] = freq + 1
            self.freq_to_keys[freq + 1][key] = None

        def get(self, key):
            if key not in self.key_to_val:
                return -1
            self._touch(key)
            return self.key_to_val[key]

        def put(self, key, value):
            if self.capacity <= 0:
                return
            if key in self.key_to_val:
                self.key_to_val[key] = value
                self._touch(key)
                return
            if len(self.key_to_val) >= self.capacity:
                evict_key, _ = self.freq_to_keys[self.min_freq].popitem(last=False)
                del self.key_to_val[evict_key]
                del self.key_to_freq[evict_key]
            self.key_to_val[key] = value
            self.key_to_freq[key] = 1
            self.freq_to_keys[1][key] = None
            self.min_freq = 1

    cache = LFUCache(capacity)
    result = []
    for op in operations:
        name = op[0]
        if name == "get":
            result.append(cache.get(op[1]))
        elif name == "put":
            cache.put(op[1], op[2])
            result.append(None)
    return result

Time complexity: O(1) average per get/put operation; O(M) total for M operations.. Space complexity: O(capacity) for the stored keys, values, frequency map, and per-frequency ordered key lists..

Hints

Maintain three maps: key -> value, key -> frequency, and frequency -> ordered collection of keys at that frequency (insertion-ordered so the front is the least recently used).
Track a running min_freq. When you bump a key's frequency, remove it from its old frequency bucket; if that bucket was the min_freq bucket and becomes empty, increment min_freq.
On insertion into a full cache, evict the front (oldest) key of the min_freq bucket, then set min_freq back to 1 for the newly inserted key.
An OrderedDict (or a doubly-linked list per frequency) gives O(1) insertion, removal, and oldest-key eviction, which is what keeps both operations O(1).