Implement an Alert Execution Engine
Company: Palo
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates the ability to implement a stateful, concurrent alert execution engine, focusing on skills such as periodic task scheduling, concurrency control, external API interaction, and state-transition detection for threshold-based notifications.
Constraints
- 0 <= n <= 10^4
- names, queries, critical_thresholds, interval_seconds, and threshold_messages all have length n
- Alert names are unique
- 1 <= interval_seconds[i] <= 10^9
- -10^9 <= critical_thresholds[i] <= 10^9
- 0 <= total_seconds <= 10^9
- Every query used by an alert appears in query_values and maps to a non-empty list
- All metric values are integers between -10^9 and 10^9
- The total number of alert evaluations is at most 200000
- Alert names and threshold messages do not contain ':'
Examples
Input: (['cpu_high'], ['cpu'], [80], [10], ['CPU high'], {'cpu': [50, 85, 90, 70, 95]}, 40)
Expected Output: ['10:NOTIFY:cpu_high:CPU high', '20:NOTIFY:cpu_high:CPU high', '30:RESOLVE:cpu_high', '40:NOTIFY:cpu_high:CPU high']
Explanation: The alert is evaluated at 0, 10, 20, 30, and 40 seconds. Values 85, 90, and 95 are above 80, so notifications are sent. At time 30 the value drops to 70 after being critical, so the alert resolves.
Input: (['errors', 'latency'], ['err', 'lat'], [5, 200], [15, 10], ['Too many errors', 'Latency high'], {'err': [1, 7, 3], 'lat': [250, 180, 220, 210]}, 30)
Expected Output: ['0:NOTIFY:latency:Latency high', '10:RESOLVE:latency', '15:NOTIFY:errors:Too many errors', '20:NOTIFY:latency:Latency high', '30:RESOLVE:errors', '30:NOTIFY:latency:Latency high']
Explanation: The two alerts run on different intervals. At time 30 both are due, so errors is evaluated first because it has the smaller input index.
Input: ([], [], [], [], [], {}, 100)
Expected Output: []
Explanation: There are no alerts to schedule or evaluate.
Input: (['disk'], ['disk_used'], [10], [5], ['Disk above 10'], {'disk_used': [10, 11, 10]}, 10)
Expected Output: ['5:NOTIFY:disk:Disk above 10', '10:RESOLVE:disk']
Explanation: A value equal to the threshold is PASS. The alert fires at time 5 when the value is 11, then resolves at time 10 when the value returns to 10.
Input: (['a', 'b'], ['shared', 'shared'], [5, 5], [10, 10], ['A high', 'B high'], {'shared': [6, 4, 7, 3]}, 10)
Expected Output: ['0:NOTIFY:a:A high', '10:NOTIFY:a:A high']
Explanation: Both alerts use the same query, so they share one round-robin cursor. Alert a consumes 6 and 7, both critical; alert b consumes 4 and 3, both passing.
Hints
- Treat each alert as an independent scheduled task. A priority queue keyed by next evaluation time can avoid scanning every second.
- Keep two kinds of state: the previous PASS/CRITICAL state per alert, and the round-robin cursor per query string.