PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates string-processing, text-indexing, approximate string-matching, and data-structure design skills for supporting many repeated queries on the same document.

  • medium
  • Workday
  • Coding & Algorithms
  • Software Engineer

Build a Fuzzy Keyword Index

Company: Workday

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Implement a class that preprocesses a document and supports repeated keyword lookups. Requirements: - The constructor receives a text string containing lowercase words separated by non-letter characters. - `query(term)` returns the zero-based starting character offsets of every exact whole-word match of `term`, sorted in ascending order. - The data structure should be optimized for many queries against the same text. Follow-up: Add `fuzzy_query(term)` so it also matches whole words that are within one insertion or one deletion of `term`. For example, a query for `cat` should be able to match `cats` and `ct`, but not `cut`. Discuss the data structures you would use and the trade-offs between preprocessing cost, query latency, and memory usage.

Quick Answer: This question evaluates string-processing, text-indexing, approximate string-matching, and data-structure design skills for supporting many repeated queries on the same document.

Part 1: Exact Keyword Index with Starting Offsets

You are given a document string `text` and a list of query terms `queries`. A word is any maximal contiguous block of lowercase letters `a-z`. Any non-letter character is a separator. For each query term, return the zero-based starting character offsets of every exact whole-word match in `text`, sorted in ascending order. Because many queries are asked against the same document, an efficient solution should preprocess the text once and then answer lookups quickly.

Constraints

  • `0 <= len(text) <= 200000`
  • `0 <= len(queries) <= 10000`
  • Each query consists only of lowercase letters `a-z` and may repeat.
  • Words in `text` are lowercase letters `a-z`; every other character is a separator.
  • The answer for each query must be sorted in ascending order.

Examples

Input: ('cat,dog cat!dog', ['cat', 'dog', 'cow'])

Expected Output: [[0, 8], [4, 12], []]

Explanation: The words are `cat` at 0 and 8, and `dog` at 4 and 12. `cow` does not appear.

Input: ('a..ab,a', ['a', 'ab', 'b'])

Expected Output: [[0, 6], [3], []]

Explanation: `b` is not a whole word; it only appears inside `ab`.

Input: ('', ['a', 'b'])

Expected Output: [[], []]

Explanation: Edge case: empty document, so every query returns no matches.

Input: ('one two', [])

Expected Output: []

Explanation: Edge case: no queries.

Hints

  1. Scan the text once and detect word boundaries instead of searching separately for every query.
  2. A hash map from word -> list of starting offsets makes repeated exact lookups fast.

Part 2: Fuzzy Keyword Index with One Insertion or Deletion

You are given a document string `text` and a list of query terms `queries`. A word is any maximal contiguous block of lowercase letters `a-z`. Any non-letter character is a separator. For each query term, return the zero-based starting character offsets of every whole-word match in `text` that satisfies one of these conditions: 1. The document word is exactly equal to the query term. 2. Deleting exactly one character from the document word makes it equal to the query term. 3. Deleting exactly one character from the query term makes it equal to the document word. This means matches are allowed at insertion/deletion distance at most 1, but substitutions do not count. For example, querying `cat` may match `cat`, `cats`, and `ct`, but not `cut`. Because many queries are asked against the same document, your solution should preprocess the text efficiently.

Constraints

  • `0 <= len(text) <= 200000`
  • `0 <= len(queries) <= 10000`
  • Each query is a non-empty lowercase string using only `a-z`.
  • Words in `text` are lowercase letters `a-z`; every other character is a separator.
  • A valid fuzzy match allows only exact match, one insertion, or one deletion. Replacements/substitutions are not allowed.

Examples

Input: ('cat cats ct cut scat at', ['cat'])

Expected Output: [[0, 4, 9, 16, 21]]

Explanation: `cat` matches exactly at 0, `cats` and `scat` by deleting one character from the document word, and `ct` and `at` by deleting one character from the query. `cut` is not included because that would require a substitution.

Input: ('cat cats ct cut scat at', ['cuts', 'cut', 'ca'])

Expected Output: [[12], [9, 12], [0]]

Explanation: `cuts` matches `cut`; `cut` matches itself and `ct`; `ca` matches `cat` because the document word can delete one character to become `ca`.

Input: ('a aa aaa b', ['aa'])

Expected Output: [[0, 2, 5]]

Explanation: Matches include `a`, `aa`, and `aaa`. Repeated deletion signatures should not create duplicate results.

Input: ('', ['cat'])

Expected Output: [[]]

Explanation: Edge case: empty document.

Hints

  1. Keep the exact index from Part 1, but also consider signatures created by deleting one character from a word.
  2. A query can match longer document words via a precomputed deletion-signature map, and shorter document words by generating all one-character deletions of the query itself.
Last updated: May 9, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.