Design real-time top-k items

Q: Design real-time top-k items

This question evaluates understanding of real-time streaming system architecture and stateful analytics, covering event-time processing and late/out-of-order handling, ingestion and partitioning strategies, state management and checkpointing, fault tolerance, and algorithms for top-k aggregation.

Question

Design: Real-Time Top-k Most Purchased Items

You are given an unbounded stream of purchase events with schema:

purchase_event: (customer_id, item_id, timestamp)

Design a system to compute and serve the top-k most purchased items in real time, supporting:

A sliding time window of length T (e.g., the last 1 hour).
Daily rolling totals (the current day so far, plus finalized totals for previous days).
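
To make the sliding-window requirement concrete, here is a minimal single-process sketch, not a full design. It assumes timestamps are integer seconds, keeps per-item counts in fixed-size time buckets, and drops any event whose bucket has already expired. The class name SlidingWindowTopK and the parameters window_s and bucket_s are hypothetical names chosen for illustration. Because expiry happens a whole bucket at a time, the effective window can exceed T by up to one bucket width.

```python
from collections import Counter


class SlidingWindowTopK:
    """Approximate sliding-window counts using fixed-size time buckets."""

    def __init__(self, window_s=3600, bucket_s=60):
        self.window_s = window_s  # window length T, in seconds (assumed unit)
        self.bucket_s = bucket_s  # bucket width; smaller means tighter window edges
        self.buckets = {}         # bucket start time -> Counter of item_id
        self.totals = Counter()   # sum of all live buckets
        self.max_ts = 0           # highest event time seen so far

    def add(self, item_id, ts):
        """Record one purchase. Returns False if the event is too late to count."""
        self.max_ts = max(self.max_ts, ts)
        cutoff = self.max_ts - self.window_s
        start = ts - ts % self.bucket_s
        if start + self.bucket_s <= cutoff:
            return False  # the bucket for this event has already been expired
        self.buckets.setdefault(start, Counter())[item_id] += 1
        self.totals[item_id] += 1
        self._expire(cutoff)
        return True

    def _expire(self, cutoff):
        """Subtract and discard every bucket that ends at or before the cutoff."""
        expired = [s for s in self.buckets if s + self.bucket_s <= cutoff]
        for start in expired:
            for item, n in self.buckets.pop(start).items():
                self.totals[item] -= n
                if self.totals[item] == 0:
                    del self.totals[item]

    def top_k(self, k):
        """Return the k most purchased items in the current window."""
        return self.totals.most_common(k)
```

The same bucket idea extends to daily totals by using one bucket per calendar day and never expiring finalized days. In a distributed design the buckets would live in partitioned, checkpointed operator state rather than in a local dictionary.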

Discuss and justify choices for:

Ingestion and partitioning strategy.
Processing semantics (exactly-once vs at-least-once).
Handling late and out-of-order events (watermarks, allowed lateness, retractions).
Data structures/algorithms for efficient updates (e.g., count map + min-heap, heavy-hitter sketches).
Storage (raw events, state, serving layer) and schemas.
API/query layer and caching.
Scalability and fault tolerance (state checkpointing, recovery, hot-key mitigation).
Latency vs accuracy trade-offs and tunables.
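
As one possible illustration of the heavy-hitter option in the list above, the sketch below uses the Space-Saving algorithm, a counter-based approach that bounds memory to a fixed number of tracked items. This is an illustrative choice, not something the question prescribes. The class name SpaceSaving and the capacity parameter are hypothetical. Reported counts can overestimate the true count, and the per-item error bound is kept alongside each estimate.

```python
import heapq


class SpaceSaving:
    """Bounded-memory heavy-hitter tracker (Space-Saving algorithm)."""

    def __init__(self, capacity):
        self.capacity = capacity  # maximum number of items tracked at once
        self.counts = {}          # item -> estimated count (never an underestimate)
        self.errors = {}          # item -> maximum possible overestimation

    def add(self, item):
        if item in self.counts:
            # Already tracked: plain increment.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free slot: start tracking with an exact count.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Full: evict the smallest counter and let the new item inherit it.
            # The linear scan keeps the sketch short; a real implementation
            # would use a structure with cheaper minimum lookup.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k):
        """Return up to k (item, estimated_count) pairs, largest first."""
        return heapq.nlargest(k, self.counts.items(), key=lambda kv: kv[1])
```

When the number of distinct items is small enough to fit in memory, an exact count map combined with heapq.nlargest (which keeps a heap of size k internally) gives exact results. A sketch like this trades some accuracy for a fixed memory footprint, which is one of the tunables the last bullet asks about.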
