How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Onsite rounds at Workday.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Workday during technical interviews.

Build a Fuzzy Keyword Index | Workday Coding Question

Build a Fuzzy Keyword Index

Company: Workday

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Implement a class that preprocesses a document and supports repeated keyword lookups. Requirements: - The constructor receives a text string containing lowercase words separated by non-letter characters. - `query(term)` returns the zero-based starting character offsets of every exact whole-word match of `term`, sorted in ascending order. - The data structure should be optimized for many queries against the same text. Follow-up: Add `fuzzy_query(term)` so it also matches whole words that are within one insertion or one deletion of `term`. For example, a query for `cat` should be able to match `cats` and `ct`, but not `cut`. Discuss the data structures you would use and the trade-offs between preprocessing cost, query latency, and memory usage.

Quick Answer: This question evaluates string-processing, text-indexing, approximate string-matching, and data-structure design skills for supporting many repeated queries on the same document.

Part 1: Exact Keyword Index with Starting Offsets

Constraints

`0 <= len(text) <= 200000`
`0 <= len(queries) <= 10000`
Each query consists only of lowercase letters `a-z` and may repeat.
Words in `text` are lowercase letters `a-z`; every other character is a separator.
The answer for each query must be sorted in ascending order.

Examples

Input: ('cat,dog cat!dog', ['cat', 'dog', 'cow'])

Expected Output: [[0, 8], [4, 12], []]

Explanation: The words are `cat` at 0 and 8, and `dog` at 4 and 12. `cow` does not appear.

Input: ('a..ab,a', ['a', 'ab', 'b'])

Expected Output: [[0, 6], [3], []]

Explanation: `b` is not a whole word; it only appears inside `ab`.

Input: ('', ['a', 'b'])

Expected Output: [[], []]

Explanation: Edge case: empty document, so every query returns no matches.

Input: ('one two', [])

Expected Output: []

Explanation: Edge case: no queries.

Hints

Scan the text once and detect word boundaries instead of searching separately for every query.
A hash map from word -> list of starting offsets makes repeated exact lookups fast.

Part 2: Fuzzy Keyword Index with One Insertion or Deletion

You are given a document string `text` and a list of query terms `queries`. A word is any maximal contiguous block of lowercase letters `a-z`. Any non-letter character is a separator. For each query term, return the zero-based starting character offsets of every whole-word match in `text` that satisfies one of these conditions: 1. The document word is exactly equal to the query term. 2. Deleting exactly one character from the document word makes it equal to the query term. 3. Deleting exactly one character from the query term makes it equal to the document word. This means matches are allowed at insertion/deletion distance at most 1, but substitutions do not count. For example, querying `cat` may match `cat`, `cats`, and `ct`, but not `cut`. Because many queries are asked against the same document, your solution should preprocess the text efficiently.

Constraints

`0 <= len(text) <= 200000`
`0 <= len(queries) <= 10000`
Each query is a non-empty lowercase string using only `a-z`.
Words in `text` are lowercase letters `a-z`; every other character is a separator.
A valid fuzzy match allows only exact match, one insertion, or one deletion. Replacements/substitutions are not allowed.

Examples

Input: ('cat cats ct cut scat at', ['cat'])

Expected Output: [[0, 4, 9, 16, 21]]

Explanation: `cat` matches exactly at 0, `cats` and `scat` by deleting one character from the document word, and `ct` and `at` by deleting one character from the query. `cut` is not included because that would require a substitution.

Input: ('cat cats ct cut scat at', ['cuts', 'cut', 'ca'])

Expected Output: [[12], [9, 12], [0]]

Explanation: `cuts` matches `cut`; `cut` matches itself and `ct`; `ca` matches `cat` because the document word can delete one character to become `ca`.

Input: ('a aa aaa b', ['aa'])

Expected Output: [[0, 2, 5]]

Explanation: Matches include `a`, `aa`, and `aaa`. Repeated deletion signatures should not create duplicate results.

Input: ('', ['cat'])

Expected Output: [[]]

Explanation: Edge case: empty document.

Hints

Keep the exact index from Part 1, but also consider signatures created by deleting one character from a word.
A query can match longer document words via a precomputed deletion-signature map, and shorter document words by generating all one-character deletions of the query itself.

Quick Overview

This question evaluates string-processing, text-indexing, approximate string-matching, and data-structure design skills for supporting many repeated queries on the same document.