How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Take-home Project rounds at Coinbase.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Coinbase during technical interviews.

Implement top-K over a stream | Coinbase Coding Question

Quick Overview

This question evaluates a candidate's proficiency in stream-processing algorithms and scalable data structures for frequency estimation and top-K queries, emphasizing analysis of time/space trade-offs, sliding-window semantics, and distributed aggregation.

Implement top-K over a stream

Company: Coinbase

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Take-home Project

Given a high-volume stream of events (e.g., account IDs from new account openings), design and implement a data structure that supports: ( 1) inserting an item, ( 2) returning the current top-K most frequent items, and ( 3) optionally returning the top-K over the last T minutes (sliding window). Discuss algorithm choices (heaps, bucket counting, count–min sketch with heap), time/space complexities, handling ties, changing K at query time, and window expiration. Outline a scalable distributed approach, including partitioning, partial aggregation, and merge semantics.

Quick Answer: This question evaluates a candidate's proficiency in stream-processing algorithms and scalable data structures for frequency estimation and top-K queries, emphasizing analysis of time/space trade-offs, sliding-window semantics, and distributed aggregation.

Design a data structure for a high-volume event stream (e.g. account IDs from new account openings) that supports two operations: 1. **insert(item)** — record one occurrence of `item`. 2. **topk(k)** — return the `k` most frequent items seen so far, ordered from most frequent to least frequent. Ties (items with equal frequency) are broken by item value in ascending (lexicographic) order so the result is deterministic. If fewer than `k` distinct items exist, return all of them. Implement the function `topKFrequentStream(operations)`. `operations` is a list of operations applied in order. Each operation is one of: - `["insert", item]` — insert `item` (a string). - `["topk", k]` — query for the current top `k` items. Return a list containing the result (a list of items) of every `topk` query, in the order the queries occurred. **Example** Input: `[["insert","x"],["insert","y"],["insert","x"],["insert","z"],["insert","y"],["insert","x"],["topk",2]]` x appears 3 times, y twice, z once, so the top 2 are `["x","y"]`. Output: `[["x","y"]]`. **Discussion (for the interview, not graded here):** A `Counter` (hash map) plus a partial sort gives O(N) inserts and O(D log D) per query (D = distinct items); a min-heap of size k gives O(D log k) per query and naturally supports changing k. For the sliding-window variant, bucket counts by time and expire old buckets, or maintain per-window counters. At scale, partition the stream by item hash, keep per-partition Count-Min sketch + heap of local top-K, then merge partial top-K results (merging is approximate unless you ship full counts).

Constraints

1 <= number of operations <= 10^5
Each insert item is a non-empty string.
0 <= k for a topk query; if k exceeds the number of distinct items, return all of them.
Frequency ties are broken by item value in ascending (lexicographic) order.

Examples

Input: ([['insert', 'a'], ['insert', 'b'], ['insert', 'a'], ['topk', 1]],)

Expected Output: [['a']]

Explanation: a has count 2, b has count 1; the single most frequent item is a.

Input: ([['insert', 'x'], ['insert', 'y'], ['insert', 'x'], ['insert', 'z'], ['insert', 'y'], ['insert', 'x'], ['topk', 2]],)

Expected Output: [['x', 'y']]

Explanation: Counts: x=3, y=2, z=1. Top 2 are x then y.

Input: ([['insert', 'a'], ['insert', 'b'], ['topk', 5]],)

Expected Output: [['a', 'b']]

Explanation: Only 2 distinct items exist, fewer than k=5, so both are returned; a and b tie at count 1 and are ordered ascending.

Input: ([['topk', 3]],)

Expected Output: [[]]

Explanation: No items have been inserted, so the top-K of an empty stream is an empty list.

Input: ([['insert', 'a'], ['insert', 'a'], ['topk', 1], ['insert', 'b'], ['insert', 'b'], ['insert', 'b'], ['topk', 1]],)

Expected Output: [['a'], ['b']]

Explanation: After the first two inserts a leads (count 2); after three b inserts, b (count 3) overtakes a, so the second query returns b.

Input: ([['insert', 'c'], ['insert', 'a'], ['insert', 'b'], ['topk', 2]],)

Expected Output: [['a', 'b']]

Explanation: All three items tie at count 1; the deterministic ascending tie-break orders them a, b, c, so the top 2 are a and b.

Hints

Maintain a hash map from item to its running count; an insert is a single increment.
For a topk query you do not need a full global ordering — a min-heap of size k over (count, item) pairs gives the answer in O(D log k).
Decide a deterministic tie-break (here: lower item value wins) so equal-frequency items have a stable order.
For the sliding-window follow-up, bucket counts by time slice and subtract expired buckets; for the distributed follow-up, partition by item hash and merge per-partition top-K results.

Quick Overview