Implement a Searchable Logger Pipeline
Company: Rippling
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in software design, data structures, and algorithmic text search by requiring a transformation pipeline, ordered in-memory log storage with unique increasing identifiers, and efficient keyword lookup.
Part 1: In-Memory Logger Pipeline with Scan-Based Search
Constraints
- 0 <= len(operations) <= 10000
- Each operation is valid and follows one of the formats described above.
- 0 <= len(message), len(text), len(keyword) <= 1000
- Total length of all added raw log messages is at most 1000000.
- Search is case-sensitive and matches exact whitespace-separated words after all handlers have transformed the log.
- Log ids start at 1 and increase by 1 for each added log.
Examples
Input: ([['add_handler', 'upper'], ['add_handler', 'suffix', ' DONE'], ['add_log', 'hello world'], ['add_log', 'mixed Case'], ['get_logs'], ['search', 'HELLO'], ['search', 'hello'], ['search', 'DONE']],)
Expected Output: [['1:HELLO WORLD DONE', '2:MIXED CASE DONE'], ['1:HELLO WORLD DONE'], [], ['1:HELLO WORLD DONE', '2:MIXED CASE DONE']]
Explanation: Messages are uppercased and then suffixed. Search is case-sensitive, so 'hello' does not match 'HELLO'.
Input: ([['add_handler', 'prefix', 'pre '], ['add_handler', 'suffix', ' post'], ['add_log', 'core'], ['search', 'pre'], ['search', 'post'], ['get_logs']],)
Expected Output: [['1:pre core post'], ['1:pre core post'], ['1:pre core post']]
Explanation: The prefix handler runs before the suffix handler, producing 'pre core post'.
Input: ([],)
Expected Output: []
Explanation: There are no operations and therefore no query outputs.
Input: ([['add_log', ''], ['add_log', 'repeat repeat'], ['search', 'repeat'], ['search', ''], ['get_logs']],)
Expected Output: [['2:repeat repeat'], [], ['1:', '2:repeat repeat']]
Explanation: The empty log is stored with id 1 but has no searchable words. Empty keyword searches return an empty list.
Input: ([['search', 'x'], ['add_log', 'alpha beta'], ['search', 'gamma'], ['search', 'alpha']],)
Expected Output: [[], [], ['1:alpha beta']]
Explanation: Searching before any logs or for a missing keyword returns an empty list.
Hints
- Store handlers as small descriptors, then replay them in order whenever a new log is added.
- For the scan-based search, it is acceptable to inspect every stored log and split its message into words.
Part 2: In-Memory Logger Pipeline with Inverted Index Search
Constraints
- 0 <= len(operations) <= 10000
- Each operation is valid and follows one of the formats described above.
- 0 <= len(message), len(text), len(keyword) <= 1000
- Total length of all added raw log messages is at most 1000000.
- Search is case-sensitive and matches exact whitespace-separated words after all handlers have transformed the log.
- Repeated words in the same transformed log must not create duplicate search results.
Examples
Input: ([['add_handler', 'upper'], ['add_log', 'hello hello world'], ['add_log', 'world'], ['search', 'HELLO'], ['search', 'WORLD'], ['search', 'hello'], ['get_logs']],)
Expected Output: [['1:HELLO HELLO WORLD'], ['1:HELLO HELLO WORLD', '2:WORLD'], [], ['1:HELLO HELLO WORLD', '2:WORLD']]
Explanation: The first log contains HELLO twice, but it appears once in the search result. Search is case-sensitive.
Input: ([['add_log', ''], ['search', 'anything'], ['search', ''], ['get_logs']],)
Expected Output: [[], [], ['1:']]
Explanation: An empty log is stored but contributes no words to the index. Empty keyword searches return an empty list.
Input: ([['add_handler', 'prefix', 'INFO '], ['add_handler', 'suffix', ' OK'], ['add_log', 'server started'], ['add_log', 'server stopped'], ['search', 'INFO'], ['search', 'server'], ['search', 'OK']],)
Expected Output: [['1:INFO server started OK', '2:INFO server stopped OK'], ['1:INFO server started OK', '2:INFO server stopped OK'], ['1:INFO server started OK', '2:INFO server stopped OK']]
Explanation: Prefix and suffix handlers add searchable words to every future log.
Input: ([['add_log', 'alpha beta'], ['search', 'beta'], ['add_log', 'beta gamma'], ['search', 'beta'], ['search', 'gamma']],)
Expected Output: [['1:alpha beta'], ['1:alpha beta', '2:beta gamma'], ['2:beta gamma']]
Explanation: The index is updated incrementally as new logs are added.
Input: ([],)
Expected Output: []
Explanation: No operations produce no query outputs.
Hints
- When adding a log, split the transformed message into words and update a dictionary from word to log ids.
- Use a set of words from a single log before updating the index to avoid indexing the same log id multiple times for repeated words.