Implement phrase search and JSON path
Company: Netflix
Role: Data Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Onsite
The coding rounds included two concrete implementation tasks:
1. **In-memory document phrase search**
   - You are given a list of documents, each represented as a string.
   - Preprocess the corpus into an inverted index that maps each normalized word to the document IDs and word positions where it appears.
   - Implement `search(query)`:
     - If `query` contains one word, return all documents containing that word.
     - If `query` contains multiple words, return only documents that contain the exact phrase in consecutive positions.
   - Assume case-insensitive matching and whitespace tokenization.
2. **JSON path query with wildcards**
   - You are given a nested JSON-like object represented as `Map<String, Object>`, where values may be scalars or nested maps.
   - Implement `getValue(json, path)` for paths such as `.contacts.cell`, `contacts.*`, or `.*.cell`.
   - The token `*` matches any key at the current level.
   - If the path does not exist, return `null`.
   - If a wildcard matches multiple branches, return all matched results in a list.
   - Handle edge cases such as leading dots, missing keys, and wildcards at the end of the path.
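
For the phrase search task, a minimal Java sketch of a positional inverted index might look like the following. The class name `PhraseSearchIndex`, the helper names, the use of list indices as document IDs, and the choice to return sorted IDs are all assumptions made for illustration; the prompt only fixes `search(query)`, case-insensitive matching, and whitespace tokenization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name; document IDs are assumed to be list indices.
class PhraseSearchIndex {
    // normalized word -> (document ID -> ascending word positions in that document)
    private final Map<String, Map<Integer, List<Integer>>> index = new HashMap<>();

    PhraseSearchIndex(List<String> documents) {
        for (int docId = 0; docId < documents.size(); docId++) {
            String[] tokens = tokenize(documents.get(docId));
            for (int pos = 0; pos < tokens.length; pos++) {
                index.computeIfAbsent(tokens[pos], w -> new HashMap<>())
                     .computeIfAbsent(docId, d -> new ArrayList<>())
                     .add(pos); // positions are appended in increasing order
            }
        }
    }

    // Lowercase and split on runs of whitespace; blank input yields no tokens.
    private static String[] tokenize(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        return normalized.isEmpty() ? new String[0] : normalized.split("\\s+");
    }

    // Returns sorted IDs of documents containing the word or the exact phrase.
    List<Integer> search(String query) {
        String[] words = tokenize(query);
        if (words.length == 0) {
            return new ArrayList<>(); // assumption: an empty query matches nothing
        }
        Map<Integer, List<Integer>> firstPostings = index.get(words[0]);
        if (firstPostings == null) {
            return new ArrayList<>();
        }
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, List<Integer>> entry : firstPostings.entrySet()) {
            int docId = entry.getKey();
            for (int start : entry.getValue()) {
                if (phraseAt(docId, start, words)) {
                    result.add(docId);
                    break; // one occurrence per document is enough
                }
            }
        }
        return new ArrayList<>(result);
    }

    // True if words[1..] appear at start+1, start+2, ... in the given document.
    // For a single-word query the loop body never runs, so this is trivially true.
    private boolean phraseAt(int docId, int start, String[] words) {
        for (int i = 1; i < words.length; i++) {
            Map<Integer, List<Integer>> postings = index.get(words[i]);
            if (postings == null) {
                return false;
            }
            List<Integer> positions = postings.get(docId);
            if (positions == null || Collections.binarySearch(positions, start + i) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With the documents `"The quick brown fox"`, `"quick the brown"`, and `"A QUICK   brown dog"`, this sketch returns `[0, 1, 2]` for `search("quick")` and `[0, 2]` for `search("quick brown")`. Anchoring on the first word's postings and binary-searching the remaining words is only one option; intersecting position lists with a merge, or starting from the rarest word, are common alternatives worth discussing in an interview.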
Quick Answer: This question evaluates the ability to build an inverted index for in-memory exact phrase search and to implement JSON path traversal with wildcard matching. It tests competency in data structures, string normalization and tokenization, and recursive or iterative traversal of nested maps.
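
For the JSON path task, one possible recursive sketch of `getValue` is shown below. The class name `JsonPath` and the helper `collect` are hypothetical. The sketch also makes choices the prompt leaves open, and these are assumptions rather than requirements: any path containing `*` returns a list even when only one branch matches, an empty path returns the root object, and a wildcard applied to a scalar matches nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical class name; only getValue(json, path) comes from the prompt.
class JsonPath {
    static Object getValue(Map<String, Object> json, String path) {
        if (json == null || path == null) {
            return null;
        }
        // Split on dots and drop empty tokens, which absorbs leading dots.
        List<String> tokens = new ArrayList<>();
        for (String token : path.split("\\.")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        List<Object> matches = new ArrayList<>();
        collect(json, tokens, 0, matches);
        if (matches.isEmpty()) {
            return null; // the path does not exist
        }
        // Assumption: wildcard paths always yield a list, even for one match.
        return tokens.contains("*") ? matches : matches.get(0);
    }

    // Depth-first walk: consume one token per level and gather every match.
    private static void collect(Object node, List<String> tokens, int i, List<Object> out) {
        if (i == tokens.size()) {
            out.add(node); // all tokens consumed, so this node is a result
            return;
        }
        if (!(node instanceof Map)) {
            return; // a scalar cannot be descended into
        }
        Map<?, ?> map = (Map<?, ?>) node;
        String token = tokens.get(i);
        if (token.equals("*")) {
            for (Object child : map.values()) {
                collect(child, tokens, i + 1, out);
            }
        } else if (map.containsKey(token)) {
            collect(map.get(token), tokens, i + 1, out);
        }
    }
}
```

The order of wildcard results follows the map's iteration order, so a `LinkedHashMap` gives insertion order while a plain `HashMap` gives no guarantee. A trailing wildcard such as `contacts.*` returns the child values as they are, including nested maps, without flattening them.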