Design a logger that returns top-K messages
Company: Google
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates skills in designing in-memory data structures and algorithms for frequency counting and interval querying, including extracting top-K elements with a heap, within the Coding & Algorithms domain.
Constraints
- 0 <= len(logs) <= 20000
- 0 <= len(queries) <= 10000
- Logs are given in non-decreasing order of timestamp
- There are at most 2000 distinct messages across all logs
- For each query, 1 <= k, and if the interval contains fewer than k distinct messages, return all of them
Examples
Input: ([(1, 'A'), (2, 'B'), (3, 'A'), (10, 'C')], [(1, 3, 2), (1, 10, 3)])
Expected Output: [['A', 'B'], ['A', 'B', 'C']]
Explanation: In [1, 3], A appears 2 times and B appears 1 time. In [1, 10], A appears 2 times, while B and C appear once each, so B comes before C by lexicographic order.
Input: ([(1, 'dog'), (2, 'cat'), (2, 'dog'), (4, 'cat'), (5, 'ant')], [(1, 4, 2)])
Expected Output: [['cat', 'dog']]
Explanation: Between timestamps 1 and 4 inclusive, cat appears 2 times and dog appears 2 times. The tie is broken lexicographically, so cat comes first.
Input: ([(1, 'x'), (5, 'y')], [(2, 4, 3), (1, 5, 5)])
Expected Output: [[], ['x', 'y']]
Explanation: The first query range contains no logs. In the second query, x and y both appear once, and k is larger than the number of distinct messages.
Input: ([], [(1, 10, 2)])
Expected Output: [[]]
Explanation: There are no log events, so every query returns an empty list.
Input: ([(1, 'm'), (1, 'm'), (2, 'n'), (3, 'm'), (3, 'n')], [(1, 1, 2), (3, 3, 1), (1, 3, 2)])
Expected Output: [['m'], ['m'], ['m', 'n']]
Explanation: This checks duplicate timestamps and inclusive boundaries. At time 1, only m appears. At time 3, m and n both appear once, so m wins the tie. Across [1, 3], m appears 3 times and n appears 2 times.
Input: ([(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'c'), (6, 'a')], [(1, 6, 2), (2, 5, 3)])
Expected Output: [['a', 'b'], ['a', 'b', 'c']]
Explanation: Across all logs, a appears 3 times, b 2 times, and c 1 time. In [2, 5], a appears 2 times while b and c appear once each, so b comes before c lexicographically.
Hints
- Store all timestamps for each message in a hash map. Then use binary search to count how many times that message appears inside a query range.
- After computing the counts for a query, put `(-count, message)` into a heap so the most frequent message comes out first, and lexicographically smaller messages win ties.