PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's proficiency in stream-processing algorithms and scalable data structures for frequency estimation and top-K queries, emphasizing analysis of time/space trade-offs, sliding-window semantics, and distributed aggregation.

  • Medium
  • Coinbase
  • Coding & Algorithms
  • Software Engineer

Implement top-K over a stream

Company: Coinbase

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Take-home Project

Given a high-volume stream of events (e.g., account IDs from new account openings), design and implement a data structure that supports: ( 1) inserting an item, ( 2) returning the current top-K most frequent items, and ( 3) optionally returning the top-K over the last T minutes (sliding window). Discuss algorithm choices (heaps, bucket counting, count–min sketch with heap), time/space complexities, handling ties, changing K at query time, and window expiration. Outline a scalable distributed approach, including partitioning, partial aggregation, and merge semantics.

Quick Answer: This question evaluates a candidate's proficiency in stream-processing algorithms and scalable data structures for frequency estimation and top-K queries, emphasizing analysis of time/space trade-offs, sliding-window semantics, and distributed aggregation.

Design a data structure for a high-volume event stream (e.g. account IDs from new account openings) that supports two operations: 1. **insert(item)** — record one occurrence of `item`. 2. **topk(k)** — return the `k` most frequent items seen so far, ordered from most frequent to least frequent. Ties (items with equal frequency) are broken by item value in ascending (lexicographic) order so the result is deterministic. If fewer than `k` distinct items exist, return all of them. Implement the function `topKFrequentStream(operations)`. `operations` is a list of operations applied in order. Each operation is one of: - `["insert", item]` — insert `item` (a string). - `["topk", k]` — query for the current top `k` items. Return a list containing the result (a list of items) of every `topk` query, in the order the queries occurred. **Example** Input: `[["insert","x"],["insert","y"],["insert","x"],["insert","z"],["insert","y"],["insert","x"],["topk",2]]` x appears 3 times, y twice, z once, so the top 2 are `["x","y"]`. Output: `[["x","y"]]`. **Discussion (for the interview, not graded here):** A `Counter` (hash map) plus a partial sort gives O(N) inserts and O(D log D) per query (D = distinct items); a min-heap of size k gives O(D log k) per query and naturally supports changing k. For the sliding-window variant, bucket counts by time and expire old buckets, or maintain per-window counters. At scale, partition the stream by item hash, keep per-partition Count-Min sketch + heap of local top-K, then merge partial top-K results (merging is approximate unless you ship full counts).

Constraints

  • 1 <= number of operations <= 10^5
  • Each insert item is a non-empty string.
  • 0 <= k for a topk query; if k exceeds the number of distinct items, return all of them.
  • Frequency ties are broken by item value in ascending (lexicographic) order.

Examples

Input: ([['insert', 'a'], ['insert', 'b'], ['insert', 'a'], ['topk', 1]],)

Expected Output: [['a']]

Explanation: a has count 2, b has count 1; the single most frequent item is a.

Input: ([['insert', 'x'], ['insert', 'y'], ['insert', 'x'], ['insert', 'z'], ['insert', 'y'], ['insert', 'x'], ['topk', 2]],)

Expected Output: [['x', 'y']]

Explanation: Counts: x=3, y=2, z=1. Top 2 are x then y.

Input: ([['insert', 'a'], ['insert', 'b'], ['topk', 5]],)

Expected Output: [['a', 'b']]

Explanation: Only 2 distinct items exist, fewer than k=5, so both are returned; a and b tie at count 1 and are ordered ascending.

Input: ([['topk', 3]],)

Expected Output: [[]]

Explanation: No items have been inserted, so the top-K of an empty stream is an empty list.

Input: ([['insert', 'a'], ['insert', 'a'], ['topk', 1], ['insert', 'b'], ['insert', 'b'], ['insert', 'b'], ['topk', 1]],)

Expected Output: [['a'], ['b']]

Explanation: After the first two inserts a leads (count 2); after three b inserts, b (count 3) overtakes a, so the second query returns b.

Input: ([['insert', 'c'], ['insert', 'a'], ['insert', 'b'], ['topk', 2]],)

Expected Output: [['a', 'b']]

Explanation: All three items tie at count 1; the deterministic ascending tie-break orders them a, b, c, so the top 2 are a and b.

Hints

  1. Maintain a hash map from item to its running count; an insert is a single increment.
  2. For a topk query you do not need a full global ordering — a min-heap of size k over (count, item) pairs gives the answer in O(D log k).
  3. Decide a deterministic tie-break (here: lower item value wins) so equal-frequency items have a stable order.
  4. For the sliding-window follow-up, bucket counts by time slice and subtract expired buckets; for the distributed follow-up, partition by item hash and merge per-partition top-K results.
Last updated: Jun 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Coin-Constrained Jump Strategy - Coinbase (hard)
  • Implement an In-Memory Database - Coinbase (hard)
  • Implement Game Physics and Block Mining - Coinbase (hard)
  • Compute Total Manual Distance - Coinbase (medium)
  • Implement a Flappy Bird Jump Agent - Coinbase