PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's ability to design and implement in-memory text indexes and related data structures, covering inverted indexing, tokenization and case-insensitive word matching, update/deletion handling, and analysis of time and space complexity.

  • hard
  • Amazon
  • Coding & Algorithms
  • Software Engineer

Build a Searchable Sentence Index

Company: Amazon

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Onsite

Design and implement an in-memory data structure for a collection of sentences. Each sentence has a unique integer id and a text body. Support the following operations: addSentence(id, text), search(word), and deleteSentence(id). search(word) should return the ids of all non-deleted sentences containing that word, sorted in ascending order. Treat matching as case-insensitive, split words on non-alphanumeric characters, and count a word at most once per sentence. Deleting a sentence must update the index so future searches no longer return it. Discuss the time and space complexity of each operation.

Quick Answer: This question evaluates a candidate's ability to design and implement in-memory text indexes and related data structures, covering inverted indexing, tokenization and case-insensitive word matching, update/deletion handling, and analysis of time and space complexity.

You are given a sequence of operations to run on an in-memory index of sentences. Each sentence has an integer id and a text body. Implement the index so that words are matched case-insensitively, words are defined as maximal contiguous alphanumeric sequences, and each word counts at most once per sentence even if it appears multiple times. Support three operations: addSentence(id, text), search(word), and deleteSentence(id). A search should return the ids of all currently non-deleted sentences containing that word, sorted in ascending order. Deleting a sentence must remove all of its contributions from future searches. If deleteSentence is called for an id that is not present, it should do nothing. For this problem, return the results of all search operations in the order they appear.

Constraints

  • 1 <= len(operations) <= 200000
  • The total number of characters across all added texts and search words is at most 10^6
  • -2^31 <= id <= 2^31 - 1
  • operations and arguments have the same length

Examples

Input: (['addSentence', 'addSentence', 'search', 'deleteSentence', 'search', 'search'], [(2, 'Hello, world! hello'), (1, 'World of code.'), ('world',), (2,), ('hello',), ('world',)])

Expected Output: [[1, 2], [], [1]]

Explanation: Matching is case-insensitive, 'hello' in sentence 2 is counted once, ids are returned sorted, and after deleting sentence 2 only sentence 1 still matches 'world'.

Input: (['addSentence', 'addSentence', 'search', 'search'], [(10, 'red-blue red'), (3, 'Blue? green; red!'), ('red',), ('blue',)])

Expected Output: [[3, 10], [3, 10]]

Explanation: Non-alphanumeric characters split words, so 'red-blue' contributes both 'red' and 'blue'. Sentence 10 still appears only once per matching word.

Input: (['addSentence', 'search', 'deleteSentence', 'search'], [(5, ''), ('anything',), (5,), ('anything',)])

Expected Output: [[], []]

Explanation: An empty sentence contributes no indexed words, so searches return empty results before and after deletion.

Input: (['addSentence', 'addSentence', 'search', 'search', 'deleteSentence', 'search'], [(7, 'Version2 is better than v1.'), (4, 'v1, V2, and version2'), ('version2',), ('v1',), (4,), ('v2',)])

Expected Output: [[4, 7], [4, 7], []]

Explanation: Alphanumeric tokens can contain digits. After deleting sentence 4, the word 'v2' is no longer present in any sentence.

Input: (['addSentence', 'addSentence', 'search', 'deleteSentence', 'search'], [(-1, 'Alpha beta'), (2, 'beta gamma'), ('BETA',), (-1,), ('alpha',)])

Expected Output: [[-1, 2], []]

Explanation: Ids may be negative, search is case-insensitive, and results must still be sorted in ascending numeric order.

Hints

  1. An inverted index maps each normalized word to the sentence ids that contain it.
  2. To delete efficiently, also store for each sentence id the set of distinct normalized words that were extracted from that sentence.
Last updated: May 30, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement Datacenter Router Commands - Amazon (hard)
  • Replace Delimited Tokens in a String - Amazon (medium)
  • Minimize Circular Redistribution Cost - Amazon (medium)
  • Find the Most Common Visit Pattern - Amazon (hard)
  • Maximize Value Under a Budget - Amazon (medium)