PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCareers

Quick Overview

This question evaluates the ability to design a simple document storage and boolean keyword predicate evaluation, testing parsing of logical expressions with operator precedence and the selection of data structures for efficient term lookup.

  • medium
  • Snowflake
  • Coding & Algorithms
  • Software Engineer

Implement Document Predicate APIs

Company: Snowflake

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Design a simple document service that stores text documents identified by filename. Implement these APIs: - `InsertDoc(filename, content)`: store or overwrite a document. - `CheckContains(filename, predicate) -> bool`: return whether the document satisfies a boolean keyword expression. Rules for `predicate`: - A term is a word such as `a`, `b`, or `error`. - `term` is true if that word appears anywhere in the document. - Operators supported: `&&` and `||`. - `&&` has higher precedence than `||`. - You may also support parentheses for grouping. Examples: - `a || b || c` is true if the document contains at least one of `a`, `b`, or `c`. - `a && b || c` is true if the document contains both `a` and `b`, or contains `c`. Follow-up 1: - Implement `GetAllFiles(predicate) -> List[filename]`, which returns all documents matching the predicate. Follow-up 2: - Explain how you would scale this service in a distributed system, including a reasonable sharding or partitioning strategy and a replication policy. Assume you are starting from a blank editor or whiteboard and should define any necessary data structures and test cases.

Quick Answer: This question evaluates the ability to design a simple document storage and boolean keyword predicate evaluation, testing parsing of logical expressions with operator precedence and the selection of data structures for efficient term lookup.

Part 1: Implement InsertDoc and CheckContains for Boolean Document Predicates

Design a tiny in-memory document service. You are given a sequence of operations. An ('insert', filename, content) operation stores or overwrites a document. A ('check', filename, predicate) operation returns whether the named document satisfies a boolean predicate. Content is a space-separated list of words and a term is true iff that exact word appears in the document. Predicates contain terms, '&&', '||', and optional parentheses. '&&' has higher precedence than '||'. Return the results of all check operations in order. If a file does not exist, treat it as an empty document.

Constraints

  • 1 <= len(operations) <= 10^4
  • Filenames are non-empty strings.
  • Document content may be empty; otherwise it is whitespace-separated words.
  • Predicates are valid and contain words, '&&', '||', parentheses, and optional spaces.
  • Word matching is exact on whitespace-separated tokens.

Examples

Input: ([('insert', 'doc1', 'a b'), ('check', 'doc1', 'a && b || c'), ('check', 'doc1', 'a && (b || c)'), ('check', 'doc1', 'c || d')],)

Expected Output: [True, True, False]

Explanation: '&&' binds tighter than '||', and parentheses must also be respected.

Input: ([('insert', 'doc1', 'a'), ('check', 'doc1', 'a && b'), ('insert', 'doc1', 'b c'), ('check', 'doc1', 'a || b'), ('check', 'missing', 'a')],)

Expected Output: [False, True, False]

Explanation: Insert overwrites the previous content. A missing file behaves like an empty document.

Input: ([('insert', 'doc2', ''), ('check', 'doc2', 'a||b'), ('insert', 'doc2', 'error'), ('check', 'doc2', '(a&&b)||error')],)

Expected Output: [False, True]

Explanation: This includes an empty-content edge case and a predicate without spaces.

Input: ([('insert', 'doc3', 'alpha beta gamma'), ('check', 'doc3', 'beta'), ('check', 'doc3', 'delta || gamma && alpha')],)

Expected Output: [True, True]

Explanation: A single term should work, and precedence makes 'gamma && alpha' evaluate before the final '||'.

Hints

  1. Store each document as a set of words so checking a single term is O(1).
  2. A recursive-descent parser is a clean way to enforce '&&' precedence over '||' and handle parentheses.

Part 2: Implement GetAllFiles(predicate) to Return Matching Filenames

Extend the document service with a query that returns every filename whose current content matches a boolean predicate. You are given operations of two kinds: ('insert', filename, content) to store or overwrite a document, and ('get', predicate) to return all matching filenames. Content is a space-separated list of words and a term is true iff that exact word appears in a document. Predicates contain terms, '&&', '||', and optional parentheses, with '&&' having higher precedence than '||'. Return the results of all get operations. Each result must be sorted lexicographically.

Constraints

  • 1 <= len(operations) <= 10^4
  • Filenames are strings and inserts overwrite older content for the same filename.
  • Document content may be empty; otherwise it is whitespace-separated words.
  • Predicates are valid and contain words, '&&', '||', parentheses, and optional spaces.
  • Returned filenames for each query must be in ascending lexicographic order.

Examples

Input: ([('insert', 'a.txt', 'red blue'), ('insert', 'b.txt', 'blue green'), ('insert', 'c.txt', 'green'), ('get', 'red || green'), ('get', 'red && blue')],)

Expected Output: [['a.txt', 'b.txt', 'c.txt'], ['a.txt']]

Explanation: The first query matches any file containing red or green. The second matches only files containing both red and blue.

Input: ([('insert', 'x', 'a'), ('insert', 'y', 'b'), ('get', 'a || b'), ('insert', 'x', 'c'), ('get', 'a || b'), ('get', 'c && a')],)

Expected Output: [['x', 'y'], ['y'], []]

Explanation: Overwriting x removes it from the postings list for 'a' and adds it to the postings list for 'c'.

Input: ([('get', 'a'), ('insert', 'z', ''), ('get', 'a || b')],)

Expected Output: [[], []]

Explanation: An empty store and an empty document are both valid edge cases.

Input: ([('insert', 'd1', 'error warning'), ('insert', 'd2', 'warning'), ('insert', 'd3', 'error'), ('get', '(error&&warning)||missing'), ('get', 'error||warning')],)

Expected Output: [['d1'], ['d1', 'd2', 'd3']]

Explanation: Parentheses and predicates without spaces should still parse correctly.

Hints

  1. An inverted index from word -> set of filenames lets a single term evaluate directly to matching documents.
  2. Once terms become sets, '&&' is set intersection and '||' is set union.

Part 3: Simulate a Sharding and Replication Strategy for the Document Service

Model one reasonable distributed design for the document service. Given an ordered list of storage nodes, a number of logical shards, a replication factor, and a list of filenames, assign each file to a shard and then to replica nodes. Use this strategy: stable_hash(filename) = sum(ord(c) for c in filename); shard_id = stable_hash(filename) % num_shards; primary node = nodes[shard_id % len(nodes)]; additional replicas go to the next distinct nodes in circular order until you have replication_factor copies or run out of nodes. Also given a list of failed nodes and a list of filenames to read, route each read to the first non-failed replica for that file, or None if every replica is down or the file was never placed. Return both the placement plan and the chosen read routes.

Constraints

  • 1 <= len(nodes) <= 10^4
  • 1 <= num_shards <= 10^9
  • 1 <= replication_factor
  • Filenames and node names are strings.
  • If replication_factor > len(nodes), use each node at most once.

Examples

Input: (['n1', 'n2', 'n3'], 4, 2, ['a', 'b', 'c'], ['n2'], ['a', 'b', 'c'])

Expected Output: {'placements': {'a': {'shard': 1, 'replicas': ['n2', 'n3']}, 'b': {'shard': 2, 'replicas': ['n3', 'n1']}, 'c': {'shard': 3, 'replicas': ['n1', 'n2']}}, 'routes': {'a': 'n3', 'b': 'n3', 'c': 'n1'}}

Explanation: File 'a' falls on shard 1 whose primary is n2, but n2 is failed so reads fall back to n3.

Input: (['x', 'y'], 3, 5, ['doc'], ['x', 'y'], ['doc', 'missing'])

Expected Output: {'placements': {'doc': {'shard': 1, 'replicas': ['y', 'x']}}, 'routes': {'doc': None, 'missing': None}}

Explanation: Replication factor exceeds node count, so only two distinct replicas are used. Both are down.

Input: (['solo'], 10, 3, ['aa', 'bb'], [], ['aa', 'bb'])

Expected Output: {'placements': {'aa': {'shard': 4, 'replicas': ['solo']}, 'bb': {'shard': 6, 'replicas': ['solo']}}, 'routes': {'aa': 'solo', 'bb': 'solo'}}

Explanation: With one node, every file maps to that node regardless of requested replication factor.

Input: (['nA', 'nB', 'nC', 'nD'], 5, 3, ['d'], ['nA'], ['d'])

Expected Output: {'placements': {'d': {'shard': 0, 'replicas': ['nA', 'nB', 'nC']}}, 'routes': {'d': 'nB'}}

Explanation: The primary replica is down, so the read is routed to the next healthy replica.

Hints

  1. First compute the logical shard for a file, then map that shard to a primary node and walk forward circularly for replicas.
  2. Cap the replica count at the number of nodes so you never place duplicate replicas on the same node.
Last updated: May 5, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • Careers
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Find Shortest Wiki Click Path - Snowflake (medium)
  • Schedule prerequisite classes with retakes - Snowflake (easy)
  • Solve three coding rounds - Snowflake (medium)
  • Minimize coins with overpay and change - Snowflake (hard)
  • Compute effective permissions on DAG and prune tree - Snowflake (medium)