Top-K in static and streaming settings (3 variants)
You are given integers and asked to answer Top-K queries under three different scenarios.
Variant 1: Static Top-K
Given an array A of n integers and an integer k, return the k most frequent values.
-
Input:
A
(length
n
),
k
-
Output: list of
k
values (any order is acceptable unless you specify tie-breaking)
-
Clarify tie-breaking for equal frequencies (e.g., smaller value first, or any order).
Variant 2: Real-time (unbounded) streaming Top-K
You receive a stream of integers supporting operations:
-
add(x)
: add one occurrence of value
x
-
topk()
: return the current
k
most frequent values
Design an efficient data structure and implement these operations.
Variant 3: Real-time Top-K within the last 1 hour (sliding window)
Events arrive as (timestamp, value) where timestamp is in seconds (monotonic non-decreasing). Support:
-
add(t, x)
: record event
x
at time
t
-
topk(t)
: return the
k
most frequent values considering only events with timestamps in
[t-3600, t]
Constraints to assume (unless you choose others)
-
n
up to 2e5 for the static problem.
-
Streaming operations up to 2e5.
-
Values may repeat heavily.
What to discuss
-
Time/space complexity tradeoffs for each variant.
-
How you maintain counts and retrieve Top-K efficiently, especially for the sliding window case.
-
Edge cases: fewer than
k
distinct values, ties, many unique values, bursts of events.