PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates the ability to build an inverted index for in-memory exact phrase search and to implement JSON path traversal with wildcard matching, testing competency in data structures, string normalization and tokenization, and recursive or iterative traversal of nested maps.

  • hard
  • Netflix
  • Coding & Algorithms
  • Data Engineer

Implement phrase search and JSON path

Company: Netflix

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Onsite

The coding rounds included two concrete implementation tasks: 1. **In-memory document phrase search** - You are given a list of documents, each represented as a string. - Preprocess the corpus into an inverted index that maps each normalized word to the document IDs and word positions where it appears. - Implement `search(query)`: - If `query` contains one word, return all documents containing that word. - If `query` contains multiple words, return only documents that contain the exact phrase in consecutive positions. - Assume case-insensitive matching and whitespace tokenization. 2. **JSON path query with wildcards** - You are given a nested JSON-like object represented as `Map<String, Object>`, where values may be scalars or nested maps. - Implement `getValue(json, path)` for paths such as `.contacts.cell`, `contacts.*`, or `.*.cell`. - The token `*` matches any key at the current level. - If the path does not exist, return `null`. - If a wildcard matches multiple branches, return all matched results in a list. - Handle edge cases such as leading dots, missing keys, and wildcards at the end of the path.

Quick Answer: This question evaluates the ability to build an inverted index for in-memory exact phrase search and to implement JSON path traversal with wildcard matching, testing competency in data structures, string normalization and tokenization, and recursive or iterative traversal of nested maps.

Part 1: In-Memory Document Phrase Search

You are given a list of documents, where each document is a string, and a search query string. Implement phrase search over the documents. Treat matching as case-insensitive and tokenize text using whitespace splitting only. A one-word query matches every document containing that word. A multi-word query matches only documents where the entire query appears as an exact phrase in consecutive word positions. Return the matching document IDs using 0-based indexing in ascending order. Although the function only answers one query, an efficient solution should build an inverted index that stores, for each normalized word, the document IDs and positions where it appears.

Constraints

  • 0 <= len(documents) <= 10^4
  • The total number of whitespace-separated tokens across all documents is at most 2 * 10^5
  • Matching is case-insensitive
  • Tokenization is done with standard whitespace splitting only
  • An empty or whitespace-only query should return []

Examples

Input: (['The quick brown fox', 'quick brown', 'brown quick'], 'quick brown')

Expected Output: [0, 1]

Explanation: Documents 0 and 1 contain the exact phrase 'quick brown'. Document 2 has the words in reverse order.

Input: (['Alpha beta', 'gamma ALPHA gamma', 'beta'], 'alpha')

Expected Output: [0, 1]

Explanation: Single-word search is case-insensitive, so both 'Alpha' and 'ALPHA' match.

Input: (['a a a', 'a b a', 'a a'], 'a a')

Expected Output: [0, 2]

Explanation: Document 0 contains the phrase more than once, and document 2 contains it once. Document 1 does not have consecutive 'a a'.

Input: ([], 'anything')

Expected Output: []

Explanation: Edge case: an empty corpus has no matches.

Input: (['one two three'], ' ')

Expected Output: []

Explanation: Edge case: a whitespace-only query is treated as empty.

Hints

  1. For phrase queries, storing only document IDs is not enough; you also need word positions.
  2. First narrow down candidate documents by intersecting the document sets for all query words, then verify consecutive positions.

Part 2: JSON Path Query with Wildcards

You are given a nested JSON-like object represented with Python dictionaries, where each value is either a scalar or another dictionary. Implement path lookup with wildcard support. A path is a dot-separated string such as '.contacts.cell', 'contacts.*', or '.*.cell'. Leading dots should be ignored. The token '*' matches any key at the current dictionary level. Rules: - If the path resolves to exactly one value, return that value directly. - If wildcard expansion finds multiple values, return them in a list. - If the path does not exist, return None. - If the path is empty or just dots, return the original object. - When returning multiple values, preserve dictionary iteration order.

Constraints

  • The structure contains only dictionaries and scalar values; arrays are not used
  • Keys are non-empty strings and do not contain '.'
  • 0 <= total number of keys in the structure <= 10^5
  • Leading dots in the path should be ignored
  • If wildcard expansion yields multiple matches, return them in dictionary iteration order

Examples

Input: ({'contacts': {'cell': '111', 'home': '222'}}, '.contacts.cell')

Expected Output: '111'

Explanation: Leading dots are ignored, so the path resolves to the scalar value at contacts -> cell.

Input: ({'contacts': {'cell': '111', 'home': '222'}}, 'contacts.*')

Expected Output: ['111', '222']

Explanation: The wildcard at the end matches both keys inside 'contacts'.

Input: ({'contacts': {'cell': '111', 'home': '222'}, 'work': {'cell': '333'}, 'meta': 5}, '.*.cell')

Expected Output: ['111', '333']

Explanation: The top-level wildcard visits each top-level value. Only dictionary branches with a 'cell' key produce matches.

Input: ({'a': {'b': 1}}, 'a.c')

Expected Output: None

Explanation: The path does not exist, so the function should return None.

Input: ({'a': 1}, '.')

Expected Output: {'a': 1}

Explanation: Edge case: an empty path after removing dots returns the original object.

Hints

  1. Split the path by '.' and ignore empty tokens created by leading dots.
  2. A recursive or DFS-style traversal works well: when you see '*', try every child at the current dictionary level.
Last updated: May 4, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement Cache, Undo, and DFS - Netflix
  • Implement Streaming Word Counter - Netflix (medium)
  • Implement TTL Cache and Tree Balance Reporting - Netflix (medium)
  • Implement ordering and undo executor - Netflix (medium)
  • Implement Caches, Undo, and Traversal - Netflix