System Design: Real-Time Top‑K Purchased Items Over Rolling Windows
Design a real-time service that ingests purchase events and continuously outputs the top‑K most purchased items over configurable rolling windows (for example: last 5 minutes and last 24 hours).
Input Event
Each purchase event contains:
-
customer_id
-
item_id
-
timestamp (event time)
Assume K and window sizes are configurable at runtime.
Requirements
Specify and justify:
-
Data model for events and results.
-
Ingestion pipeline and partitioning.
-
Aggregation/windowing strategy (e.g., event-time keyed windows, panes).
-
Algorithms/data structures to maintain top‑K at scale (e.g., heaps, Space‑Saving, Count–Min Sketch + heap), including complexity and memory trade‑offs.
-
Handling late, duplicate, and out‑of‑order events (watermarks, allowed lateness, deduplication).
-
State management, TTL/retention, and backpressure control.
-
Fault tolerance and exactly‑once processing semantics end‑to‑end.
-
Horizontal scalability and hot‑key mitigation.
-
APIs for querying the current top‑K for given windows (and update cadence/latency).
Make minimal, explicit assumptions as needed (e.g., target throughput, latency SLOs), and propose a design that works for both a short window (5m) and a long window (24h).