Implement Document Predicate APIs
Company: Snowflake
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates the ability to design a simple document storage and boolean keyword predicate evaluation, testing parsing of logical expressions with operator precedence and the selection of data structures for efficient term lookup.
Part 1: Implement InsertDoc and CheckContains for Boolean Document Predicates
Constraints
- 1 <= len(operations) <= 10^4
- Filenames are non-empty strings.
- Document content may be empty; otherwise it is whitespace-separated words.
- Predicates are valid and contain words, '&&', '||', parentheses, and optional spaces.
- Word matching is exact on whitespace-separated tokens.
Examples
Input: ([('insert', 'doc1', 'a b'), ('check', 'doc1', 'a && b || c'), ('check', 'doc1', 'a && (b || c)'), ('check', 'doc1', 'c || d')],)
Expected Output: [True, True, False]
Explanation: '&&' binds tighter than '||', and parentheses must also be respected.
Input: ([('insert', 'doc1', 'a'), ('check', 'doc1', 'a && b'), ('insert', 'doc1', 'b c'), ('check', 'doc1', 'a || b'), ('check', 'missing', 'a')],)
Expected Output: [False, True, False]
Explanation: Insert overwrites the previous content. A missing file behaves like an empty document.
Input: ([('insert', 'doc2', ''), ('check', 'doc2', 'a||b'), ('insert', 'doc2', 'error'), ('check', 'doc2', '(a&&b)||error')],)
Expected Output: [False, True]
Explanation: This includes an empty-content edge case and a predicate without spaces.
Input: ([('insert', 'doc3', 'alpha beta gamma'), ('check', 'doc3', 'beta'), ('check', 'doc3', 'delta || gamma && alpha')],)
Expected Output: [True, True]
Explanation: A single term should work, and precedence makes 'gamma && alpha' evaluate before the final '||'.
Hints
- Store each document as a set of words so checking a single term is O(1).
- A recursive-descent parser is a clean way to enforce '&&' precedence over '||' and handle parentheses.
Part 2: Implement GetAllFiles(predicate) to Return Matching Filenames
Constraints
- 1 <= len(operations) <= 10^4
- Filenames are strings and inserts overwrite older content for the same filename.
- Document content may be empty; otherwise it is whitespace-separated words.
- Predicates are valid and contain words, '&&', '||', parentheses, and optional spaces.
- Returned filenames for each query must be in ascending lexicographic order.
Examples
Input: ([('insert', 'a.txt', 'red blue'), ('insert', 'b.txt', 'blue green'), ('insert', 'c.txt', 'green'), ('get', 'red || green'), ('get', 'red && blue')],)
Expected Output: [['a.txt', 'b.txt', 'c.txt'], ['a.txt']]
Explanation: The first query matches any file containing red or green. The second matches only files containing both red and blue.
Input: ([('insert', 'x', 'a'), ('insert', 'y', 'b'), ('get', 'a || b'), ('insert', 'x', 'c'), ('get', 'a || b'), ('get', 'c && a')],)
Expected Output: [['x', 'y'], ['y'], []]
Explanation: Overwriting x removes it from the postings list for 'a' and adds it to the postings list for 'c'.
Input: ([('get', 'a'), ('insert', 'z', ''), ('get', 'a || b')],)
Expected Output: [[], []]
Explanation: An empty store and an empty document are both valid edge cases.
Input: ([('insert', 'd1', 'error warning'), ('insert', 'd2', 'warning'), ('insert', 'd3', 'error'), ('get', '(error&&warning)||missing'), ('get', 'error||warning')],)
Expected Output: [['d1'], ['d1', 'd2', 'd3']]
Explanation: Parentheses and predicates without spaces should still parse correctly.
Hints
- An inverted index from word -> set of filenames lets a single term evaluate directly to matching documents.
- Once terms become sets, '&&' is set intersection and '||' is set union.
Part 3: Simulate a Sharding and Replication Strategy for the Document Service
Constraints
- 1 <= len(nodes) <= 10^4
- 1 <= num_shards <= 10^9
- 1 <= replication_factor
- Filenames and node names are strings.
- If replication_factor > len(nodes), use each node at most once.
Examples
Input: (['n1', 'n2', 'n3'], 4, 2, ['a', 'b', 'c'], ['n2'], ['a', 'b', 'c'])
Expected Output: {'placements': {'a': {'shard': 1, 'replicas': ['n2', 'n3']}, 'b': {'shard': 2, 'replicas': ['n3', 'n1']}, 'c': {'shard': 3, 'replicas': ['n1', 'n2']}}, 'routes': {'a': 'n3', 'b': 'n3', 'c': 'n1'}}
Explanation: File 'a' falls on shard 1 whose primary is n2, but n2 is failed so reads fall back to n3.
Input: (['x', 'y'], 3, 5, ['doc'], ['x', 'y'], ['doc', 'missing'])
Expected Output: {'placements': {'doc': {'shard': 1, 'replicas': ['y', 'x']}}, 'routes': {'doc': None, 'missing': None}}
Explanation: Replication factor exceeds node count, so only two distinct replicas are used. Both are down.
Input: (['solo'], 10, 3, ['aa', 'bb'], [], ['aa', 'bb'])
Expected Output: {'placements': {'aa': {'shard': 4, 'replicas': ['solo']}, 'bb': {'shard': 6, 'replicas': ['solo']}}, 'routes': {'aa': 'solo', 'bb': 'solo'}}
Explanation: With one node, every file maps to that node regardless of requested replication factor.
Input: (['nA', 'nB', 'nC', 'nD'], 5, 3, ['d'], ['nA'], ['d'])
Expected Output: {'placements': {'d': {'shard': 0, 'replicas': ['nA', 'nB', 'nC']}}, 'routes': {'d': 'nB'}}
Explanation: The primary replica is down, so the read is routed to the next healthy replica.
Hints
- First compute the logical shard for a file, then map that shard to a primary node and walk forward circularly for replicas.
- Cap the replica count at the number of nodes so you never place duplicate replicas on the same node.