What difficulty level is this interview question?

This is a medium-difficulty System Design question, commonly asked during onsite rounds at LinkedIn.

What role is this question designed for?

This question is commonly asked of Software Engineer candidates at LinkedIn during technical interviews.

Design an exception monitoring system with top‑K

This question evaluates the ability to design a scalable, low-latency exception monitoring system, with a focus on streaming ingestion, event grouping (fingerprinting), data modeling for raw and aggregated stores, retention policies, and payload privacy.

System Design: Exception Monitoring with Top-K

Design an exception monitoring system for a microservices environment.

Core requirements

Services emit exception events (message, stack trace, service name, environment, version, timestamp, severity, request context).
The system should enable on-call engineers to:
- View Top K exceptions over a time window (e.g., last 5/15/60 minutes), grouped/deduplicated by “same exception.”
- Filter by service, environment (prod/staging), deployment version, region.
- Drill down into a group to see recent samples and aggregated stats.
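
To make the Top K requirement concrete, here is a minimal in-memory sketch that assumes events are counted in fixed one-minute buckets and that a window query sums the buckets it overlaps. The class name `TopKWindow`, the 60-second bucket size, and the `(service, env, fingerprint)` key are illustrative assumptions, not part of the question; a real system would keep these counters in a stream processor or an aggregate store.

```python
from collections import Counter, defaultdict

BUCKET_SECONDS = 60  # assumed bucket granularity


class TopKWindow:
    """Per-minute counters keyed by (service, env, fingerprint)."""

    def __init__(self):
        # bucket start (epoch seconds) -> Counter of key -> event count
        self.buckets = defaultdict(Counter)

    def record(self, ts, service, env, fingerprint):
        bucket = int(ts) // BUCKET_SECONDS * BUCKET_SECONDS
        self.buckets[bucket][(service, env, fingerprint)] += 1

    def top_k(self, now, window_minutes, k, service=None, env=None):
        cutoff = int(now) - window_minutes * 60
        totals = Counter()
        for bucket, counts in self.buckets.items():
            # Skip buckets that end before the window starts or begin after now.
            # A bucket straddling the cutoff is counted whole, so the oldest
            # edge of the window is approximate to within one bucket.
            if bucket + BUCKET_SECONDS <= cutoff or bucket > now:
                continue
            for (svc, e, fp), n in counts.items():
                if service is not None and svc != service:
                    continue
                if env is not None and e != env:
                    continue
                totals[fp] += n
        return totals.most_common(k)
```

Pre-aggregating into small buckets keeps the query cost proportional to the number of buckets and distinct groups in the window, not to the number of raw events, which is what makes 5-, 15-, and 60-minute views cheap to serve from the same counters.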

Non-functional requirements

High write throughput, low-latency queries for Top K.
Handle duplicates, retries, bursts (incident storms).
Retain raw data for debugging (e.g., 7–30 days) and aggregated metrics longer.
Protect sensitive data in payloads.
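
Two of these requirements, duplicate handling and payload privacy, are often addressed at the ingestion edge. The sketch below is one possible shape, under stated assumptions: each event carries a client-generated `event_id` (not listed in the question), retries arrive within a short TTL, and the set of sensitive keys and the email pattern are placeholders that a real deployment would replace with its own policy.

```python
import re
import time

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SENSITIVE_KEYS = {"password", "authorization", "cookie", "token"}  # assumed list


def scrub(event):
    """Return a copy with emails masked in the message and sensitive context keys redacted."""
    clean = dict(event)
    clean["message"] = EMAIL_RE.sub("[email]", event.get("message", ""))
    context = event.get("request_context", {})
    clean["request_context"] = {
        key: "[redacted]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in context.items()
    }
    return clean


class Deduplicator:
    """Flags events whose event_id was already seen within ttl_seconds."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.seen = {}  # event_id -> time first seen

    def is_duplicate(self, event_id, now=None):
        now = time.time() if now is None else now
        # Linear-scan eviction keeps the sketch short; a production version
        # would use an expiring cache or a time-ordered structure instead.
        expired = [eid for eid, seen_at in self.seen.items() if now - seen_at > self.ttl]
        for eid in expired:
            del self.seen[eid]
        if event_id in self.seen:
            return True
        self.seen[event_id] = now
        return False
```

Scrubbing before the event is written anywhere means neither the raw store nor the aggregates ever hold the sensitive values, and a TTL-bounded seen-set keeps memory bounded during incident storms at the cost of missing duplicates that arrive after the TTL.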

Clarifications to address

How exceptions are collected from services.
How events are grouped (fingerprinting) and how you store/query efficiently.
What the database schema / key columns look like for both raw events and aggregates.
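
As an illustration of the grouping question, here is a hedged sketch of one common fingerprinting approach: normalize volatile tokens out of the message, keep only the top few stack frames without line numbers, and hash the result. The normalization rules, the `exception_type` input (assumed to be parsed from the stack trace), and the choice of five frames are all assumptions made for the example.

```python
import hashlib
import re

# Assumed normalizations: replace volatile tokens so repeats of the same bug collapse.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")
NUM_RE = re.compile(r"\d+")


def normalize_message(message):
    # Order matters: UUIDs and hex literals contain digits, so replace them first.
    message = UUID_RE.sub("<uuid>", message)
    message = HEX_RE.sub("<hex>", message)
    return NUM_RE.sub("<n>", message)


def fingerprint(service, exception_type, message, frames, top_frames=5):
    """Hash a normalized view of the exception.

    frames is a list of (module, function) tuples, innermost first. Line
    numbers are left out on purpose so that unrelated edits to a file do not
    split one group into many.
    """
    parts = [service, exception_type, normalize_message(message)]
    parts += [f"{module}:{function}" for module, function in frames[:top_frames]]
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]  # truncated for compact keys; length is an assumption
```

For example, "Order 123 not found" and "Order 456 not found" raised from the same frames map to one group, while the same message raised from a different call path maps to another. How aggressively to normalize is a trade-off: too little splits one bug into many groups, too much merges distinct bugs.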

Deliverables: high-level architecture, data flow, storage choices, and APIs used by UI/on-call tooling.
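
For the storage and API deliverables, the following sketch shows one possible set of key columns and a request shape, written as Python dataclasses. All names are hypothetical: `event_id` and `fingerprint` are assumed additions to the emitted fields, `region` is assumed to be attached at ingestion because it appears only in the filter requirements, and `TopKRequest` stands in for whatever query endpoint the UI would call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RawEvent:
    """One row in the raw store (short retention, used for drill-down samples)."""
    event_id: str      # assumed: client-generated id for deduplication
    fingerprint: str   # assumed: computed at ingestion
    service: str
    environment: str
    version: str
    region: str
    timestamp: int     # epoch seconds
    severity: str
    message: str
    stack_trace: str


@dataclass(frozen=True)
class AggregateKey:
    """Key of one row in the aggregate store; the value would be a count."""
    bucket_start: int
    service: str
    environment: str
    version: str
    region: str
    fingerprint: str


def aggregate_key(event, bucket_seconds=60):
    """Map a raw event to the aggregate row it increments."""
    bucket = event.timestamp // bucket_seconds * bucket_seconds
    return AggregateKey(
        bucket, event.service, event.environment,
        event.version, event.region, event.fingerprint,
    )


@dataclass(frozen=True)
class TopKRequest:
    """Hypothetical query parameters for a Top K endpoint."""
    window_minutes: int
    k: int
    service: Optional[str] = None
    environment: Optional[str] = None
    version: Optional[str] = None
    region: Optional[str] = None
```

Keeping every filter dimension in the aggregate key lets a Top K query be answered by summing matching rows over the window, while the raw store is consulted only when a user drills into a single fingerprint.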
