This question evaluates understanding of cache design and frequency-based eviction policies, efficient data structures that support O(1) operations, and techniques for approximate frequency estimation and summarization in distributed streaming environments.
You are asked to design and implement a data structure that behaves like an in-memory cache with a Least Frequently Used (LFU) eviction policy. Then, you will discuss how to extend the idea to a distributed, streaming setting.
Design a class LFUCache with the following API:
LFUCache(int capacity)
capacity
is a positive integer indicating the maximum number of key–value pairs the cache can hold.
int get(int key)
-1
.
void put(int key, int value)
capacity
, you must evict
one
key according to the LFU policy.
Eviction rules:
Requirements:
get
and
put
operations.
get
and
put
.
You may assume:
capacity >= 0
.
Now assume that instead of a single process, you are dealing with a high-volume data stream of keys (e.g., user IDs, item IDs, or search queries) spread across multiple machines. The input rate is too high to store all keys and their counts exactly on a single node.
You want to extend the LFU idea so that you can:
Questions for the distributed extension:
You do not need to draw full system architecture diagrams, but your answer should make it clear how the LFU concept is adapted to a distributed, streaming environment.