How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Datadog.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Datadog during technical interviews.

Implement Prefix Match Filter | Datadog Coding Question

Implement Prefix Match Filter

Company: Datadog

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Onsite

Design a data structure that stores a collection of prefix strings. Then, given a list of input strings, return every string that begins with at least one stored prefix. Requirements: - Support `addPrefix(prefix)` to store a prefix. - Support checking whether a given string matches any stored prefix. - Given a list of strings, output each matching string exactly once, even if multiple prefixes match it. Example: - Stored prefixes: `["ab", "cat", "pre"]` - Input strings: `["abc", "dog", "prefix", "cater", "zoo"]` - Output: `["abc", "prefix", "cater"]` Follow-up: 1. First discuss or implement a straightforward hash-based approach. 2. Then optimize the solution using a trie (prefix tree). 3. Compare the time and space complexity of both approaches.

Quick Answer: This question evaluates understanding of prefix-based string matching, data structure design (hash-based sets and trie/prefix tree), and algorithmic competencies including efficient in-memory storage and deduplication of matches.

Part 1: Hash-Based Prefix Match Filter

Implement a straightforward prefix filter using a hash set. Store all prefixes in a set, then for each input string generate its prefixes from shortest to longest and stop as soon as one stored prefix is found. Return each matching string exactly once, in the order of its first appearance in the input list. If the empty prefix '' is stored, every input string matches.

Constraints

0 <= len(prefixes), len(strings) <= 2000
Each string length is between 0 and 200
Strings contain lowercase English letters and may be empty
A matching string should appear only once in the output even if it appears multiple times in the input

Examples

Input: (['ab', 'cat', 'pre'], ['abc', 'dog', 'prefix', 'cater', 'zoo'])

Expected Output: ['abc', 'prefix', 'cater']

Explanation: 'abc' starts with 'ab', 'prefix' starts with 'pre', and 'cater' starts with 'cat'.

Input: (['a', 'ab'], ['ab', 'abacus', 'ab', 'b'])

Expected Output: ['ab', 'abacus']

Explanation: Both 'ab' and 'abacus' match. The second 'ab' is omitted because output strings must be unique.

Input: ([''], ['', 'x', 'abc', 'x'])

Expected Output: ['', 'x', 'abc']

Explanation: The empty prefix matches every string, and duplicates in the input are returned once.

Input: ([], ['a', ''])

Expected Output: []

Explanation: With no stored prefixes, no string can match.

Hints

A string matches if any prefix of that string is present in a hash set of stored prefixes.
Handle the empty stored prefix carefully: it matches every string, including the empty string.

Part 2: Trie-Based Prefix Match Filter

Implement an optimized prefix filter using a trie (prefix tree). Insert all stored prefixes into the trie, then scan each input string character by character. As soon as you reach a trie node marked as the end of a stored prefix, the string matches. Return each matching string exactly once, in the order of its first appearance. If the empty prefix '' is stored, every input string matches immediately.

Constraints

0 <= len(prefixes), len(strings) <= 10000
The total number of characters across all prefixes and strings is at most 2 * 10^5
Strings contain lowercase English letters and may be empty
A matching string should appear only once in the output

Examples

Input: (['ab', 'cat', 'pre'], ['abc', 'dog', 'prefix', 'cater', 'zoo'])

Expected Output: ['abc', 'prefix', 'cater']

Explanation: The trie allows early stopping as soon as a terminal prefix node is reached.

Input: (['a', 'ab', 'abc'], ['abc', 'abd', 'b', 'abc'])

Expected Output: ['abc', 'abd']

Explanation: Both 'abc' and 'abd' match because they start with stored prefix 'a'. The repeated 'abc' is returned once.

Input: ([''], ['', 'hello', 'hello'])

Expected Output: ['', 'hello']

Explanation: The empty prefix matches every string.

Input: (['car', 'dog'], [])

Expected Output: []

Explanation: An empty input list produces an empty output list.

Input: (['x'], ['', 'x', 'xy'])

Expected Output: ['x', 'xy']

Explanation: The empty string does not match unless the empty prefix is stored.

Hints

Each trie node should track whether a stored prefix ends at that node.
The root itself can represent the empty prefix.

Part 3: Compare Hash-Based and Trie-Based Prefix Filter Costs

Given stored prefixes and input strings, compare the two approaches using a concrete cost model. Use distinct stored prefixes for storage costs. Hash storage cost is the total number of characters across distinct stored prefixes. Trie storage cost is the number of trie nodes created, excluding the root, so shared prefix characters are stored once. For query time: if the empty prefix '' is stored, both methods take 0 time for every string. Otherwise, the hash method generates s[:1], s[:2], ... until the first stored prefix is found or the string ends; creating s[:k] costs k units, so a first match at length k costs 1 + 2 + ... + k, and no match costs 1 + 2 + ... + len(s). The trie method inspects characters until it reaches a terminal prefix node or a missing edge; each inspected character costs 1 unit. Return a dictionary describing both total query times and both storage costs.

Constraints

0 <= len(prefixes), len(strings) <= 10000
The total number of characters across all prefixes and strings is at most 2 * 10^5
Strings contain lowercase English letters and may be empty
Duplicate prefixes do not increase storage cost because storage is measured on distinct stored prefixes

Examples

Input: (['ab', 'cat', 'pre'], ['abc', 'dog', 'prefix', 'cater', 'zoo'])

Expected Output: {'hash_time': 27, 'trie_time': 10, 'hash_space': 8, 'trie_space': 8, 'faster_time': 'trie', 'smaller_space': 'tie'}

Explanation: The trie stops after 2 or 3 inspected characters on matching words and after 1 inspected character on obvious mismatches, while the hash method pays for all generated prefixes.

Input: (['a', 'ab', 'abc', 'ab'], ['abc', 'abd', 'b', ''])

Expected Output: {'hash_time': 3, 'trie_time': 3, 'hash_space': 6, 'trie_space': 3, 'faster_time': 'tie', 'smaller_space': 'trie'}

Explanation: Shortest stored prefix 'a' matches both 'abc' and 'abd' immediately. The trie saves space by sharing nodes.

Input: (['', 'cat'], ['', 'dog', 'cat'])

Expected Output: {'hash_time': 0, 'trie_time': 0, 'hash_space': 3, 'trie_space': 3, 'faster_time': 'tie', 'smaller_space': 'tie'}

Explanation: The empty prefix makes every query an immediate match in both approaches.

Input: ([], ['a', 'bb'])

Expected Output: {'hash_time': 4, 'trie_time': 2, 'hash_space': 0, 'trie_space': 0, 'faster_time': 'trie', 'smaller_space': 'tie'}

Explanation: With no stored prefixes, the trie fails on the first character of each non-empty string, but the hash method still pays for every generated prefix.

Input: (['car', 'cat', 'can'], ['cap', 'dog', 'c'])

Expected Output: {'hash_time': 13, 'trie_time': 5, 'hash_space': 9, 'trie_space': 5, 'faster_time': 'trie', 'smaller_space': 'trie'}

Explanation: The trie shares the 'ca' path across all three prefixes and also stops early on mismatches.

Hints

A trie can tell you the earliest matching prefix length for a string and also where the first mismatch happens.
If the first hash-based match occurs at length k, the total work is the triangular number k * (k + 1) // 2.

Quick Overview

This question evaluates understanding of prefix-based string matching, data structure design (hash-based sets and trie/prefix tree), and algorithmic competencies including efficient in-memory storage and deduplication of matches.