Design an exception monitoring system with top‑K

This is a System Design interview question from LinkedIn for Software Engineer roles.

Question

System Design: Exception Monitoring with Top-K

Design an exception monitoring system for a microservices environment.

Core requirements

Services emit exception events (message, stack trace, service name, environment, version, timestamp, severity, request context).
The system should enable on-call engineers to:
- View the Top K exceptions over a time window (e.g., last 5/15/60 minutes), with occurrences of the “same exception” grouped and deduplicated into one entry.
- Filter by service, environment (prod/staging), deployment version, region.
- Drill down into a group to see recent samples and aggregated stats.
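
One way to picture the Top K requirement is a set of per-minute counters that are merged at query time. The sketch below is a minimal in-memory illustration only, not a production design: the 60-second bucket size, the `Dims` and `TopKWindow` names, and the string fingerprints are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

BUCKET_SECONDS = 60  # assumed aggregation granularity


@dataclass(frozen=True)
class Dims:
    """Filter dimensions named in the requirements (hypothetical container)."""
    service: str
    environment: str
    version: str
    region: str


class TopKWindow:
    """Keeps per-minute counts keyed by (dimensions, fingerprint)."""

    def __init__(self):
        # bucket start (epoch seconds) -> Counter[(Dims, fingerprint)]
        self.buckets = defaultdict(Counter)

    def record(self, ts, dims, fingerprint):
        # Align the event timestamp to the start of its minute bucket.
        bucket = int(ts) // BUCKET_SECONDS * BUCKET_SECONDS
        self.buckets[bucket][(dims, fingerprint)] += 1

    def top_k(self, now, window_minutes, k, **filters):
        """Merge buckets whose start lies in the window, then rank by count.

        `filters` maps a Dims field name to a required value,
        e.g. environment="prod".
        """
        start = int(now) - window_minutes * 60
        totals = Counter()
        for bucket, counts in self.buckets.items():
            if not (start <= bucket <= now):
                continue
            for (dims, fp), n in counts.items():
                if all(getattr(dims, f) == v for f, v in filters.items()):
                    totals[fp] += n
        return totals.most_common(k)
```

Because buckets are merged on read, the same counters can serve the 5, 15, and 60 minute windows; the trade-off is that window edges are only accurate to one bucket.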

Non-functional requirements

High write throughput, low-latency queries for Top K.
Handle duplicates, retries, and bursts (incident storms).
Retain raw data for debugging (e.g., 7–30 days) and keep aggregated metrics for longer.
Protect sensitive data in payloads.
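
Two of these requirements, dropping retried events and protecting sensitive payload data, can be illustrated at the ingestion edge. The following is a rough sketch under stated assumptions: the question does not define an `event_id` field, so a client-generated ID is assumed here, and the redaction patterns and key names are placeholders rather than a complete list.

```python
import re

# Assumed patterns and key names; a real deployment would maintain its own.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BEARER = re.compile(r"\bbearer\s+[a-z0-9._~+/=-]+", re.IGNORECASE)
SENSITIVE_KEYS = {"password", "authorization", "cookie", "token"}


def scrub_text(text):
    """Mask e-mail addresses and bearer tokens inside free text."""
    text = EMAIL.sub("[email]", text)
    return BEARER.sub("Bearer [redacted]", text)


def scrub_context(context):
    """Redact sensitive keys outright; pattern-scrub other string values."""
    clean = {}
    for key, value in context.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[redacted]"
        elif isinstance(value, str):
            clean[key] = scrub_text(value)
        else:
            clean[key] = value
    return clean


class Deduplicator:
    """Rejects events whose (assumed) event_id was seen within a TTL."""

    def __init__(self, ttl_seconds=600):
        self.ttl = ttl_seconds
        self.seen = {}  # event_id -> first-seen timestamp

    def is_new(self, event_id, now):
        # Linear eviction keeps the sketch short; not suitable for high volume.
        for expired in [e for e, t in self.seen.items() if now - t > self.ttl]:
            del self.seen[expired]
        if event_id in self.seen:
            return False
        self.seen[event_id] = now
        return True
```

Scrubbing before storage means raw samples kept for debugging never contain the original secrets, at the cost of losing them for investigation.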

Clarifications to address

How exceptions are collected from services.
How events are grouped (fingerprinting) and how you store/query efficiently.
What the database schema / key columns look like for both raw events and aggregates.
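
For the grouping question, one commonly used approach is to hash a normalized message together with the top stack frames. The sketch below assumes frames arrive as function names without line numbers and that masking numbers, hex addresses, and UUIDs is enough normalization; both are illustrative assumptions, and `fingerprint` and `normalize_message` are hypothetical helper names.

```python
import hashlib
import re

# Assumed normalization rules: mask values that vary between occurrences.
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_HEX = re.compile(r"0x[0-9a-fA-F]+")
_NUM = re.compile(r"\d+")


def normalize_message(message):
    """Replace variable parts so similar messages compare equal."""
    # UUIDs and hex addresses are masked before plain numbers, otherwise
    # the number rule would break them into pieces first.
    message = _UUID.sub("<uuid>", message)
    message = _HEX.sub("<hex>", message)
    return _NUM.sub("<n>", message)


def fingerprint(message, frames, top_n=5):
    """Hash the normalized message plus the top_n stack frames."""
    parts = [normalize_message(message), *frames[:top_n]]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]  # shortened for readability in this sketch
```

Limiting the hash to the top frames keeps the group stable when deep, framework-level frames change between versions; how many frames to include is a tuning choice.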

Deliverables: high-level architecture, data flow, storage choices, and APIs used by UI/on-call tooling.
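
As a starting point for the storage and API deliverables, the sketch below shows one possible pair of tables (raw events and per-minute aggregates) and a Top K query over the aggregates. SQLite is used only so the example runs anywhere; the table names, column names, and the `bump` and `top_k` helpers are assumptions, not a prescribed schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE raw_events (
    event_id     TEXT PRIMARY KEY,   -- assumed client-generated ID
    fingerprint  TEXT NOT NULL,
    ts           INTEGER NOT NULL,   -- epoch seconds
    service      TEXT NOT NULL,
    environment  TEXT NOT NULL,
    version      TEXT NOT NULL,
    region       TEXT NOT NULL,
    severity     TEXT NOT NULL,
    message      TEXT NOT NULL,
    stack_trace  TEXT NOT NULL,
    context_json TEXT
);
CREATE INDEX raw_by_group ON raw_events (fingerprint, ts);

CREATE TABLE minute_counts (
    bucket_ts    INTEGER NOT NULL,   -- start of the minute
    service      TEXT NOT NULL,
    environment  TEXT NOT NULL,
    version      TEXT NOT NULL,
    region       TEXT NOT NULL,
    fingerprint  TEXT NOT NULL,
    event_count  INTEGER NOT NULL,
    PRIMARY KEY (bucket_ts, service, environment, version, region, fingerprint)
);
"""


def bump(conn, bucket_ts, service, environment, version, region, fp):
    """Increment the aggregate row for one event, creating it if needed."""
    key = (bucket_ts, service, environment, version, region, fp)
    conn.execute(
        "INSERT OR IGNORE INTO minute_counts VALUES (?, ?, ?, ?, ?, ?, 0)", key
    )
    conn.execute(
        "UPDATE minute_counts SET event_count = event_count + 1 "
        "WHERE bucket_ts = ? AND service = ? AND environment = ? "
        "AND version = ? AND region = ? AND fingerprint = ?",
        key,
    )


def top_k(conn, since_ts, k, environment=None, service=None):
    """Return [(fingerprint, total)] for buckets at or after since_ts."""
    sql = ("SELECT fingerprint, SUM(event_count) AS total "
           "FROM minute_counts WHERE bucket_ts >= ?")
    params = [since_ts]
    # Column names come from this fixed tuple, never from user input.
    for column, value in (("environment", environment), ("service", service)):
        if value is not None:
            sql += f" AND {column} = ?"
            params.append(value)
    sql += " GROUP BY fingerprint ORDER BY total DESC, fingerprint LIMIT ?"
    params.append(k)
    return conn.execute(sql, params).fetchall()
```

A UI endpoint for Top K would then be a thin wrapper around `top_k`, while the drill-down view would read recent rows from `raw_events` by `fingerprint` using the `raw_by_group` index.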
