Design a core component in a streaming system:
Input:
- Multiple upstream services continuously emit log events.
- Each event includes at least: service_id, timestamp, log_level, message.
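As a concrete reference for the event shape, here is a minimal sketch of the record and a parser. Only the four field names come from the prompt. The `LogEvent` class, the `parse_event` helper, the dict-shaped raw input, and the choice of Unix epoch seconds for `timestamp` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """One log event; field names follow the prompt, types are assumed."""
    service_id: str
    timestamp: float  # assumed: event time in Unix epoch seconds
    log_level: str    # normalized to upper case, e.g. "ERROR"
    message: str


def parse_event(raw: dict) -> LogEvent:
    """Build a LogEvent from a decoded record (hypothetical helper).

    Raises KeyError if a required field is missing, so malformed
    records can be rejected before they reach any stateful operator.
    """
    return LogEvent(
        service_id=str(raw["service_id"]),
        timestamp=float(raw["timestamp"]),
        log_level=str(raw["log_level"]).upper(),
        message=str(raw["message"]),
    )
```

Normalizing `log_level` once at the edge keeps the downstream filter a simple equality check.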
Tasks:
- Filter and output only error logs.
- Maintain real-time per-service error count.
- Maintain a moving average of error count per service over a sliding time window.
- Trigger an alarm when a service’s error rate/moving average crosses a threshold.
Describe the architecture, state management, windowing approach, and how you handle late events, scale, and fault tolerance.
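
One possible starting point for the state and windowing part is sketched below, as a single-process illustration rather than a full design. It assumes event-time processing with fixed-size buckets, a per-service watermark equal to the largest timestamp seen minus an allowed lateness, and a moving average defined as errors per bucket over the window. The class name `ErrorWindowTracker` and all parameter names and defaults are hypothetical. Partitioning by service_id, checkpointing the state, and alarm delivery are left out.

```python
from collections import defaultdict


class ErrorWindowTracker:
    """Per-service error counts over a bucketed sliding window (sketch)."""

    def __init__(self, window_s=60, bucket_s=10, allowed_lateness_s=5, threshold=3.0):
        if window_s % bucket_s != 0:
            raise ValueError("window_s must be a multiple of bucket_s")
        self.window_s = window_s
        self.bucket_s = bucket_s
        self.allowed_lateness_s = allowed_lateness_s
        self.threshold = threshold
        self.total = defaultdict(int)     # service_id -> running error count
        self.buckets = defaultdict(dict)  # service_id -> {bucket_start: count}
        self.max_ts = {}                  # service_id -> largest event time seen

    def process(self, service_id, timestamp, log_level):
        """Return (moving_avg, alarm) for an accepted error event.

        Returns None for non-error events and for errors that arrive
        behind the watermark (those still count toward the running total).
        """
        if log_level.upper() != "ERROR":
            return None  # filter: only error logs pass
        self.total[service_id] += 1

        max_ts = max(self.max_ts.get(service_id, timestamp), timestamp)
        self.max_ts[service_id] = max_ts
        if timestamp < max_ts - self.allowed_lateness_s:
            return None  # too late for the window; could go to a side output

        buckets = self.buckets[service_id]
        start = int(timestamp // self.bucket_s) * self.bucket_s
        buckets[start] = buckets.get(start, 0) + 1

        # Evict buckets that have slid out of the window.
        newest = int(max_ts // self.bucket_s) * self.bucket_s
        oldest = newest - self.window_s + self.bucket_s
        for expired in [s for s in buckets if s < oldest]:
            del buckets[expired]

        moving_avg = sum(buckets.values()) / (self.window_s // self.bucket_s)
        return moving_avg, moving_avg > self.threshold
```

In a distributed deployment, the same logic would typically run once per key after partitioning the stream by service_id, with `total`, `buckets`, and `max_ts` held in checkpointed keyed state so that a restarted worker can resume from the last snapshot.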