Design log queries and a buffered writer
Company: Datadog
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of time-indexed data structures and timestamp granularity handling for range queries, along with buffered I/O mechanics including partial-write semantics, error handling, ordering guarantees, and concurrency control.
Part 1: Log Store with Time-Range Queries
Constraints
- 1 <= len(operations) <= 10000
- Each timestamp is a valid zero-padded string in format YYYY-MM-DD HH:MM:SS
- Log IDs are unique positive integers represented as strings in add operations
- start <= end when compared at full timestamp precision
- granularity is one of: Year, Month, Day, Hour, Minute, Second
Examples
Input: ([['add', '1', '2017-01-01 23:59:59'], ['add', '2', '2017-01-02 00:00:00'], ['add', '3', '2018-03-01 12:00:00'], ['query', '2017-01-01 00:00:00', '2017-12-31 23:59:59', 'Year'], ['query', '2017-01-01 23:00:00', '2017-01-01 23:59:00', 'Hour']],)
Expected Output: [[1, 2], [1]]
Explanation: The Year query includes both 2017 logs. The Hour query includes only the log in hour 2017-01-01 23.
Input: ([['add', '10', '2020-05-20 10:20:30'], ['add', '11', '2020-05-01 00:00:00'], ['add', '12', '2020-05-20 10:20:30'], ['query', '2020-05-01 00:00:00', '2020-05-31 23:59:59', 'Month'], ['query', '2020-05-20 00:00:00', '2020-05-20 23:59:59', 'Day']],)
Expected Output: [[11, 10, 12], [10, 12]]
Explanation: Results are chronological. Logs 10 and 12 have the same timestamp, so their add order is preserved.
Input: ([['query', '2019-01-01 00:00:00', '2019-12-31 23:59:59', 'Year']],)
Expected Output: [[]]
Explanation: Edge case: querying an empty log store returns an empty result.
Input: ([['add', '5', '2021-12-31 23:59:59'], ['query', '2021-12-31 23:59:59', '2021-12-31 23:59:59', 'Second'], ['query', '2022-01-01 00:00:00', '2022-01-01 00:00:00', 'Minute']],)
Expected Output: [[5], []]
Explanation: The exact-second query finds log 5. The later minute query does not.
Input: ([['query', '2019-01-01 00:00:00', '2019-12-31 23:59:59', 'Year'], ['add', '7', '2019-06-15 08:30:00'], ['query', '2019-06-01 00:00:00', '2019-06-30 23:59:59', 'Month'], ['add', '8', '2019-06-01 00:00:00'], ['query', '2019-06-01 00:00:00', '2019-06-30 23:59:59', 'Month']],)
Expected Output: [[], [7], [8, 7]]
Explanation: Queries reflect only logs added before that query. The final query returns June logs in chronological order.
Hints
- Because the timestamp format is fixed-width and zero-padded, lexicographic ordering is the same as chronological ordering.
- For each granularity, convert the query into a full lower-bound timestamp and a full upper-bound timestamp, then binary search in a sorted list.
Part 2: Buffered Writer with Partial Sink Writes
Constraints
- 1 <= buffer_size <= 100000
- 1 <= max_accept <= 100000
- 1 <= len(operations) <= 10000
- The total length of all write data strings is <= 200000
- Write data contains only lowercase English letters, so it cannot conflict with the special 'ERROR' event
Examples
Input: (4, 10, [['write', 'ab'], ['write', 'cd'], ['write', 'e'], ['flush']])
Expected Output: ['abcd', 'e']
Explanation: The first two writes fill the 4-character buffer, causing one sink call. The final partial buffer is written by flush.
Input: (5, 2, [['write', 'abcdef'], ['close']])
Expected Output: ['ab', 'cd', 'e', 'f']
Explanation: The full buffer chunk 'abcde' is drained through partial sink writes of size at most 2. Then close drains the remaining 'f'.
Input: (3, 2, [['write', 'a'], ['write', 'bcdef'], ['flush']])
Expected Output: ['ab', 'c', 'de', 'f']
Explanation: The bytes are chunked into full internal buffers 'abc' and 'def'. Each is partially accepted by the sink.
Input: (4, 2, [['flush'], ['write', ''], ['close']])
Expected Output: []
Explanation: Edge case: flushing or closing an empty buffer causes no sink calls, and an empty write has no effect.
Input: (3, 2, [['write', 'ab'], ['close'], ['write', 'c'], ['flush'], ['close']])
Expected Output: ['ab', 'ERROR', 'ERROR', 'ERROR']
Explanation: close drains the pending bytes and closes the writer. All later operations are invalid.
Hints
- Separate two ideas: filling the internal buffer, and draining a buffer chunk through a sink that may accept only a prefix.
- When draining, keep calling the sink on the remaining suffix until every byte from that buffer chunk has been accepted.