How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Netflix.

What role is this question designed for?

This question is commonly asked for Data Engineer candidates at Netflix during technical interviews.

Implement phrase search and JSON path | Netflix Coding Question

Quick Overview

This question evaluates the ability to build an inverted index for in-memory exact phrase search and to implement JSON path traversal with wildcard matching, testing competency in data structures, string normalization and tokenization, and recursive or iterative traversal of nested maps.

Implement phrase search and JSON path

Company: Netflix

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Onsite

The coding rounds included two concrete implementation tasks: 1. **In-memory document phrase search** - You are given a list of documents, each represented as a string. - Preprocess the corpus into an inverted index that maps each normalized word to the document IDs and word positions where it appears. - Implement `search(query)`: - If `query` contains one word, return all documents containing that word. - If `query` contains multiple words, return only documents that contain the exact phrase in consecutive positions. - Assume case-insensitive matching and whitespace tokenization. 2. **JSON path query with wildcards** - You are given a nested JSON-like object represented as `Map<String, Object>`, where values may be scalars or nested maps. - Implement `getValue(json, path)` for paths such as `.contacts.cell`, `contacts.*`, or `.*.cell`. - The token `*` matches any key at the current level. - If the path does not exist, return `null`. - If a wildcard matches multiple branches, return all matched results in a list. - Handle edge cases such as leading dots, missing keys, and wildcards at the end of the path.

Quick Answer: This question evaluates the ability to build an inverted index for in-memory exact phrase search and to implement JSON path traversal with wildcard matching, testing competency in data structures, string normalization and tokenization, and recursive or iterative traversal of nested maps.

Part 1: In-Memory Document Phrase Search

You are given a list of documents, where each document is a string, and a search query string. Implement phrase search over the documents. Treat matching as case-insensitive and tokenize text using whitespace splitting only. A one-word query matches every document containing that word. A multi-word query matches only documents where the entire query appears as an exact phrase in consecutive word positions. Return the matching document IDs using 0-based indexing in ascending order. Although the function only answers one query, an efficient solution should build an inverted index that stores, for each normalized word, the document IDs and positions where it appears.

Constraints

0 <= len(documents) <= 10^4
The total number of whitespace-separated tokens across all documents is at most 2 * 10^5
Matching is case-insensitive
Tokenization is done with standard whitespace splitting only
An empty or whitespace-only query should return []

Examples

Input: (['The quick brown fox', 'quick brown', 'brown quick'], 'quick brown')

Expected Output: [0, 1]

Explanation: Documents 0 and 1 contain the exact phrase 'quick brown'. Document 2 has the words in reverse order.

Input: (['Alpha beta', 'gamma ALPHA gamma', 'beta'], 'alpha')

Expected Output: [0, 1]

Explanation: Single-word search is case-insensitive, so both 'Alpha' and 'ALPHA' match.

Input: (['a a a', 'a b a', 'a a'], 'a a')

Expected Output: [0, 2]

Explanation: Document 0 contains the phrase more than once, and document 2 contains it once. Document 1 does not have consecutive 'a a'.

Input: ([], 'anything')

Expected Output: []

Explanation: Edge case: an empty corpus has no matches.

Input: (['one two three'], ' ')

Expected Output: []

Explanation: Edge case: a whitespace-only query is treated as empty.

Hints

For phrase queries, storing only document IDs is not enough; you also need word positions.
First narrow down candidate documents by intersecting the document sets for all query words, then verify consecutive positions.

Part 2: JSON Path Query with Wildcards

You are given a nested JSON-like object represented with Python dictionaries, where each value is either a scalar or another dictionary. Implement path lookup with wildcard support. A path is a dot-separated string such as '.contacts.cell', 'contacts.*', or '.*.cell'. Leading dots should be ignored. The token '*' matches any key at the current dictionary level. Rules: - If the path resolves to exactly one value, return that value directly. - If wildcard expansion finds multiple values, return them in a list. - If the path does not exist, return None. - If the path is empty or just dots, return the original object. - When returning multiple values, preserve dictionary iteration order.

Constraints

The structure contains only dictionaries and scalar values; arrays are not used
Keys are non-empty strings and do not contain '.'
0 <= total number of keys in the structure <= 10^5
Leading dots in the path should be ignored
If wildcard expansion yields multiple matches, return them in dictionary iteration order

Examples

Input: ({'contacts': {'cell': '111', 'home': '222'}}, '.contacts.cell')

Expected Output: '111'

Explanation: Leading dots are ignored, so the path resolves to the scalar value at contacts -> cell.

Input: ({'contacts': {'cell': '111', 'home': '222'}}, 'contacts.*')

Expected Output: ['111', '222']

Explanation: The wildcard at the end matches both keys inside 'contacts'.

Input: ({'contacts': {'cell': '111', 'home': '222'}, 'work': {'cell': '333'}, 'meta': 5}, '.*.cell')

Expected Output: ['111', '333']

Explanation: The top-level wildcard visits each top-level value. Only dictionary branches with a 'cell' key produce matches.

Input: ({'a': {'b': 1}}, 'a.c')

Expected Output: None

Explanation: The path does not exist, so the function should return None.

Input: ({'a': 1}, '.')

Expected Output: {'a': 1}

Explanation: Edge case: an empty path after removing dots returns the original object.

Hints

Split the path by '.' and ignore empty tokens created by leading dots.
A recursive or DFS-style traversal works well: when you see '*', try every child at the current dictionary level.

Quick Overview