Implement a Streaming VMStat Alert
Company: Meta
Role: Site Reliability Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates streaming data processing skills, sliding-window counting algorithms, and efficient online state management for metric monitoring.
Constraints
- 1 <= len(lines) <= 200000; lines[0] is always the header row, and every later line is one sample row
- 1 <= window_seconds <= 200000
- 0 <= max_exceed_count <= window_seconds
- The target metric always exists in the header
- Each sample row has the same number of whitespace-delimited fields as the header
- Metric values are numeric
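Given these constraints, a solution only needs a plain whitespace split and a column lookup to read the target metric from each row. The following is a minimal sketch that assumes Python and uses hypothetical helper names (`metric_column`, `metric_value`) that are not part of the question:

```python
def metric_column(header: str, metric: str) -> int:
    """Return the index of `metric` among the header's whitespace-delimited fields."""
    # str.split() with no argument splits on runs of any whitespace and ignores
    # leading/trailing blanks, so padded, column-aligned rows parse correctly.
    # list.index would raise ValueError if the metric were missing, but the
    # constraints guarantee it exists.
    return header.split().index(metric)


def metric_value(row: str, column: int) -> float:
    """Parse the metric in `column` of a sample row as a float."""
    return float(row.split()[column])


# Hypothetical usage on padded rows.
col = metric_column(" us  sy  id", "us")
print(col)                                 # 0
print(metric_value("  110   5   5", col))  # 110.0
```

The header is split once to find the column, and each sample row is then split the same way. Because every row has the same field count as the header, the same index works for every sample.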
Examples
Input: (['us sy id', '90 5 5', '110 5 5', '120 6 4', '80 10 10'], 'us', 100, 1, 3)
Expected Output: [['110 5 5', '120 6 4']]
Explanation: The second and third samples are abnormal. When the third sample arrives, the 3-sample window contains 2 abnormal rows, which is greater than 1, so an alert is emitted. The fourth sample keeps the count at 2, so the alert stays active and no second alert is emitted.
Input: (['us sy', '101 1', '102 1', '50 1', '130 1', '140 1'], 'us', 100, 1, 2)
Expected Output: [['101 1', '102 1'], ['130 1', '140 1']]
Explanation: With a 2-sample window, sample rows 1 and 2 (101 and 102) trigger the first alert. The alert clears when row 3 (50) arrives and the count drops back to 1. Rows 4 and 5 (130 and 140) then trigger a second alert.
Input: (['us sy id'], 'us', 10, 0, 5)
Expected Output: []
Explanation: There are no sample rows after the header, so no alert can be triggered.
Input: (['us sy', '100 1', '101 1', '100 1'], 'us', 100, 0, 2)
Expected Output: [['101 1']]
Explanation: A value equal to the threshold is not abnormal, because a sample counts as abnormal only when its value is strictly greater than the threshold. Only the middle row is abnormal. Its count of 1 exceeds max_exceed_count = 0, so exactly one alert is emitted.
Input: (['us sy id', '150 0 0', '20 0 80', '20 0 80', '160 0 0'], 'us', 100, 1, 2)
Expected Output: []
Explanation: The first abnormal sample falls out of the 2-sample window before the last abnormal sample arrives, so the abnormal count never becomes greater than 1.
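The examples define the alert semantics more precisely than the constraints alone. A brute-force reference that recomputes the window for every sample makes those semantics explicit. This sketch assumes the following, all inferred from the examples rather than stated in the question: the window is the last window_seconds samples including the current one; an alert fires only when the abnormal count rises from at most max_exceed_count to above it; the alert carries the abnormal rows in the window at that moment; and the alert clears once the count falls back. The name `vmstat_alerts_bruteforce` and the parameter name `threshold` are hypothetical. The function runs in O(n * window_seconds) time, so it is intended only for checking a faster solution.

```python
from typing import List


def vmstat_alerts_bruteforce(lines: List[str], metric: str, threshold: float,
                             max_exceed_count: int, window_seconds: int) -> List[List[str]]:
    """O(n * window_seconds) reference: rescan the whole window for every sample."""
    col = lines[0].split().index(metric)
    samples = lines[1:]
    # abnormal[i] is True when sample i is strictly above the threshold.
    abnormal = [float(row.split()[col]) > threshold for row in samples]
    alerts: List[List[str]] = []
    active = False  # assumption: alerts are edge-triggered
    for i in range(len(samples)):
        start = max(0, i - window_seconds + 1)  # window = last window_seconds samples
        in_window = [samples[j] for j in range(start, i + 1) if abnormal[j]]
        over = len(in_window) > max_exceed_count
        if over and not active:
            alerts.append(in_window)  # fire on the rising edge only
        active = over  # clears once the count is back at or below max_exceed_count
    return alerts


examples = [
    (['us sy id', '90 5 5', '110 5 5', '120 6 4', '80 10 10'], 'us', 100, 1, 3),
    (['us sy', '101 1', '102 1', '50 1', '130 1', '140 1'], 'us', 100, 1, 2),
    (['us sy id'], 'us', 10, 0, 5),
    (['us sy', '100 1', '101 1', '100 1'], 'us', 100, 0, 2),
    (['us sy id', '150 0 0', '20 0 80', '20 0 80', '160 0 0'], 'us', 100, 1, 2),
]
for args in examples:
    print(vmstat_alerts_bruteforce(*args))
```

When run, the script prints the five expected outputs listed above, in order. The edge-triggered `active` flag explains why Example 1 produces only one alert even though the window stays over the limit at the fourth sample.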
Hints
- Because samples arrive once per second, you can treat the row number as the timestamp and use it to expire old samples.
- You do not need to store every sample. Keeping only abnormal samples that are still inside the current window is enough to know when an alert should fire.
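Following both hints, a streaming solution uses the row number as the timestamp and keeps a deque that holds only the abnormal samples still inside the window. This is one possible sketch, not a reference solution. It relies on the same inferred assumptions as the examples (edge-triggered alerts whose payload is the abnormal rows in the window), and the names `vmstat_alerts` and `threshold` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def vmstat_alerts(lines: List[str], metric: str, threshold: float,
                  max_exceed_count: int, window_seconds: int) -> List[List[str]]:
    """Sliding-window alerting with O(1) amortized work per sample."""
    col = lines[0].split().index(metric)  # the metric is guaranteed to exist
    # (timestamp, raw row) for abnormal samples still inside the window.
    window: Deque[Tuple[int, str]] = deque()
    alerts: List[List[str]] = []
    active = False
    for t, row in enumerate(lines[1:], start=1):  # row number doubles as the timestamp
        # Expire abnormal samples that fell out of the last window_seconds samples.
        while window and window[0][0] <= t - window_seconds:
            window.popleft()
        if float(row.split()[col]) > threshold:  # strictly greater than
            window.append((t, row))
        over = len(window) > max_exceed_count
        if over and not active:
            alerts.append([r for _, r in window])  # snapshot at the rising edge
        active = over
    return alerts


print(vmstat_alerts(['us sy', '101 1', '102 1', '50 1', '130 1', '140 1'], 'us', 100, 1, 2))
# [['101 1', '102 1'], ['130 1', '140 1']]
```

Each abnormal sample is appended and popped at most once, so the total work is O(n), and the deque never holds more than window_seconds entries. The loop body uses only the current row and the deque, so the same logic would also work on an unbounded iterator of lines instead of a list.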