Find Top Errors in Time Window
Company: Notion
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates the ability to parse JSON logs, apply time-window filtering with inclusive/exclusive boundaries, aggregate error frequencies, and implement sorting with lexicographic tie-breaking.
Constraints
- 0 <= number of records <= 100000
- 1 <= k <= 100000
- All timestamps are valid UTC strings in normalized format YYYY-MM-DDTHH:MM:SSZ
- A record with status < 400 must be ignored, even if it has an error value
- The `error` field may be missing or null; such records must be ignored
Examples
Input: ('[{"timestamp":"2024-06-01T10:00:00Z","status":500,"error":"TIMEOUT"},{"timestamp":"2024-06-01T10:01:00Z","status":401,"error":"AUTH_FAILED"},{"timestamp":"2024-06-01T10:02:00Z","status":503,"error":"TIMEOUT"},{"timestamp":"2024-06-01T10:05:00Z","status":200,"error":"IGNORED"},{"timestamp":"2024-06-01T10:10:00Z","status":500,"error":"DB_ERROR"}]', '2024-06-01T10:00:00Z', '2024-06-01T10:10:00Z', 2)
Expected Output: [['TIMEOUT', 2], ['AUTH_FAILED', 1]]
Explanation: The record at 10:10 is excluded because endTime is exclusive. The 200-status record is ignored. TIMEOUT appears twice and AUTH_FAILED once.
Input: ('[{"timestamp":"2024-06-01T09:00:00Z","status":500,"error":"CACHE_MISS"},{"timestamp":"2024-06-01T09:01:00Z","status":500,"error":"DB_ERROR"},{"timestamp":"2024-06-01T09:02:00Z","status":503,"error":"DB_ERROR"},{"timestamp":"2024-06-01T09:03:00Z","status":404,"error":"CACHE_MISS"},{"timestamp":"2024-06-01T09:04:00Z","status":500,"error":null},{"timestamp":"2024-06-01T09:05:00Z","status":200,"error":"SHOULD_NOT_COUNT"}]', '2024-06-01T09:00:00Z', '2024-06-01T09:06:00Z', 5)
Expected Output: [['CACHE_MISS', 2], ['DB_ERROR', 2]]
Explanation: Both errors appear twice. The tie is broken lexicographically, so CACHE_MISS comes before DB_ERROR. Null errors and successful responses are ignored.
Input: ('[]', '2024-06-01T00:00:00Z', '2024-06-02T00:00:00Z', 3)
Expected Output: []
Explanation: There are no records, so there are no errors to return.
Input: ('[{"timestamp":"2024-06-01T00:00:00Z","status":500,"error":"A"},{"timestamp":"2024-06-01T00:30:00Z","status":500,"error":"B"},{"timestamp":"2024-06-01T01:00:00Z","status":500,"error":"C"}]', '2024-06-01T00:30:00Z', '2024-06-01T01:00:00Z', 10)
Expected Output: [['B', 1]]
Explanation: A is before the window, B is inside the window, and C is excluded because endTime is exclusive.
Input: ('[{"timestamp":"2024-06-01T12:00:00Z","status":500},{"timestamp":"2024-06-01T12:01:00Z","status":500,"error":"X"},{"timestamp":"2024-06-01T12:01:30Z","status":399,"error":"Y"}]', '2024-06-01T12:00:00Z', '2024-06-01T12:02:00Z', 2)
Expected Output: [['X', 1]]
Explanation: The first record has no error field, so it is ignored. The third record has status 399, so it is not counted as an error.
Hints
- Use a hash map / dictionary to count how many times each error appears after filtering records.
- Because every timestamp is in the same normalized UTC format, you can check the time window using simple string comparison.