Implement tag matcher and filesystem
Company: Harvey
Role: Backend Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This combined prompt evaluates string-processing and parsing skills for tag matching and design/implementation skills for an in-memory file system, measuring competencies in exact whole-word matching, case normalization, punctuation and duplicate handling, path parsing, hierarchical data structures, and file API behaviors.
Part 1: Whole-Word Tag Matching in Text
Constraints
- 0 <= len(sentence) <= 100000
- 0 <= len(tags) <= 10000
- Each tag is non-empty and contains only letters and digits
- Matching is case-insensitive
- Any non-alphanumeric character in the sentence is treated as a separator
Examples
Input: ('The blue car is parked near the red-house.', ['blue', 'red', 'green'])
Expected Output: ['blue', 'red']
Explanation: Hyphen is treated as a separator, so both 'blue' and 'red' are whole-word matches.
Input: ('The blueprint was approved.', ['blue', 'print', 'blueprint'])
Expected Output: ['blueprint']
Explanation: Neither 'blue' nor 'print' appears as a whole word; only 'blueprint' does.
Input: ('Blue, blue; RED!', ['red', 'Blue', 'blue', 'green', 'RED'])
Expected Output: ['red', 'Blue']
Explanation: Matching is case-insensitive, and duplicate tags are returned only once based on their first appearance in the tag list.
Input: ('', ['a', 'b'])
Expected Output: []
Explanation: Edge case: an empty sentence contains no words.
Hints
- Convert the sentence into a set of normalized whole words first, so each lookup is fast.
- Be careful to remove duplicate tags by their lowercase form while still preserving the first original spelling from the input list.
Part 2: Simulate an In-Memory File System
Constraints
- 1 <= len(operations) <= 20000
- len(operations) == len(arguments)
- All paths are absolute and valid
- Path components do not contain '/'
- There are no conflicts where the same path must be both a file and a directory
Examples
Input: (['mkdir', 'addContentToFile', 'ls', 'readContentFromFile'], [('/a/b',), ('/a/b/file.txt', 'hello'), ('/a/b',), ('/a/b/file.txt',)])
Expected Output: [None, None, ['file.txt'], 'hello']
Explanation: Both directory creation and file writing succeed, and the missing parent directories for the file already exist.
Input: (['addContentToFile', 'addContentToFile', 'readContentFromFile', 'ls'], [('/notes.txt', 'hi'), ('/notes.txt', ' there'), ('/notes.txt',), ('/',)])
Expected Output: [None, None, 'hi there', ['notes.txt']]
Explanation: Writing to an existing file appends content instead of overwriting it.
Input: (['mkdir', 'addContentToFile', 'addContentToFile', 'ls', 'ls'], [('/z/y',), ('/z/a.txt', '1'), ('/z/b.txt', '2'), ('/z',), ('/z/a.txt',)])
Expected Output: [None, None, None, ['a.txt', 'b.txt', 'y'], ['a.txt']]
Explanation: Directory listings are sorted, and listing a file path returns a single-element list with the file name.
Input: (['ls'], [('/',)])
Expected Output: [[]]
Explanation: Edge case: listing the empty root directory returns an empty list.
Hints
- A tree structure works well: each directory stores its children, and file nodes store content.
- Split each path by '/' and ignore empty parts so that '/' is handled cleanly.