This question evaluates a candidate's ability to design scalable, fault-tolerant real-time aggregation systems that produce per-minute status counts, including considerations for event-time versus processing-time semantics, out-of-order and late arrivals, deduplication, and missing heartbeats.
You have 1,000,000 devices. Every minute, each device sends a message containing:
device_id
timestamp
status (one of 10 possible statuses)
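To make the scale concrete, here is a minimal sketch of the message shape and the average load it implies. The field types, the `Heartbeat` name, and the 100-byte message size are assumptions for illustration; the question itself fixes only the three fields, the device count, and the one-minute reporting interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    """One device message. The field types below are assumptions."""
    device_id: str
    timestamp: int  # assumed: event time in epoch seconds, set by the device
    status: int     # assumed: an integer code in the range 0..9


NUM_DEVICES = 1_000_000
REPORT_INTERVAL_S = 60
ASSUMED_MESSAGE_BYTES = 100  # assumption, not given in the question

# Average ingest rate if devices are spread evenly across the minute.
avg_msgs_per_sec = NUM_DEVICES / REPORT_INTERVAL_S
avg_bytes_per_sec = avg_msgs_per_sec * ASSUMED_MESSAGE_BYTES
msgs_per_day = NUM_DEVICES * (86_400 // REPORT_INTERVAL_S)

print(f"{avg_msgs_per_sec:,.0f} msgs/s, "
      f"{avg_bytes_per_sec / 1e6:.2f} MB/s, "
      f"{msgs_per_day:,} msgs/day")
```

These are averages. If many devices report at the top of the minute, the peak rate could be far higher, so a design would likely size for bursts rather than for the mean.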
Design a system that can compute in real time the number of devices in each status for the previous minute (i.e., for each completed minute window, output 10 counts).
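One possible shape for the core computation is an event-time tumbling window that keeps the latest status per device and emits counts once the window is safely closed. The sketch below is a single-process illustration under stated assumptions: `MinuteAggregator`, the 30-second allowed lateness, and the "watermark equals the highest event time seen" rule are all hypothetical choices, not part of the question. A real deployment would likely partition by `device_id` across workers and merge the partial counts.

```python
from collections import Counter, defaultdict

WINDOW_S = 60
ALLOWED_LATENESS_S = 30  # assumption: how long a window waits for stragglers


class MinuteAggregator:
    def __init__(self, window_s=WINDOW_S, allowed_lateness_s=ALLOWED_LATENESS_S):
        self.window_s = window_s
        self.allowed_lateness_s = allowed_lateness_s
        # window_start -> {device_id: (event_ts, status)}
        self.open_windows = defaultdict(dict)
        self.watermark = 0      # assumed: highest event time seen so far
        self.dropped_late = 0   # messages that arrived after their window closed

    def add(self, device_id, event_ts, status):
        """Ingest one message; return a list of (window_start, counts) that closed."""
        window_start = event_ts - event_ts % self.window_s
        window_end = window_start + self.window_s
        if window_end + self.allowed_lateness_s <= self.watermark:
            self.dropped_late += 1  # too late: its window was already emitted
            return []
        devices = self.open_windows[window_start]
        latest = devices.get(device_id)
        # Deduplicate: one entry per device per window, newest event time wins.
        if latest is None or event_ts >= latest[0]:
            devices[device_id] = (event_ts, status)
        self.watermark = max(self.watermark, event_ts)
        return self._close_ready()

    def _close_ready(self):
        closed = []
        for start in sorted(self.open_windows):
            if start + self.window_s + self.allowed_lateness_s > self.watermark:
                break  # this window and all later ones are still open
            devices = self.open_windows.pop(start)
            closed.append((start, Counter(s for _, s in devices.values())))
        return closed


agg = MinuteAggregator()
agg.add("a", 5, 1)
agg.add("b", 10, 2)
agg.add("b", 50, 3)            # same device, same window: newer status replaces older
agg.add("a", 40, 2)            # out-of-order arrival is still accepted
closed = agg.add("c", 130, 1)  # advances the watermark past 0 + 60 + 30
print(closed)
```

In this sketch a device that sends nothing in a minute is simply absent from that window's counts. Whether to report it under an extra "missing" bucket or to carry its last known status forward is a design decision the candidate would need to state. Note also that a single device with a clock far in the future would advance this naive watermark, so a production design would probably bound or sanity-check device timestamps.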
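In your design, state your assumptions and handle event-time versus processing-time semantics, out-of-order and late arrivals, duplicate messages, and missing heartbeats.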
Assume a dashboard or API needs to read the counts for each minute (and, optionally, recent history such as the last 1–24 hours).
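The read side can be very small, because each completed minute is just 10 integers and 24 hours of history is only 1,440 such rows. The following is a minimal in-memory sketch of a serving store with bounded retention. `MinuteCountStore` and its method names are hypothetical, and a real system would more likely back this with a key-value or time-series store than with a Python dict.

```python
class MinuteCountStore:
    """Holds finalized per-minute counts for a bounded history (assumed 24 h)."""

    def __init__(self, retention_minutes=24 * 60, window_s=60):
        self.retention_minutes = retention_minutes
        self.window_s = window_s
        self._counts = {}  # window_start (epoch seconds) -> {status: count}

    def put(self, window_start, counts):
        """Store one completed window, then evict anything past retention."""
        self._counts[window_start] = dict(counts)
        cutoff = max(self._counts) - self.retention_minutes * self.window_s
        for start in [s for s in self._counts if s <= cutoff]:
            del self._counts[start]

    def latest(self):
        """Return (window_start, counts) for the newest completed minute."""
        if not self._counts:
            return None
        start = max(self._counts)
        return start, self._counts[start]

    def history(self, minutes):
        """Return up to `minutes` most recent windows, oldest first."""
        if minutes <= 0:
            return []
        starts = sorted(self._counts)[-minutes:]
        return [(s, self._counts[s]) for s in starts]


store = MinuteCountStore(retention_minutes=3)
for i in range(4):
    store.put(i * 60, {1: i})
print(store.latest())
print(store.history(2))
```

Because `put` overwrites the entry for a given `window_start`, re-emitting a window after a retry or a replay is idempotent in this sketch, which is one simple way to keep dashboard reads consistent under at-least-once delivery.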