Return Top Department Suggestions
Company: Glean
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates proficiency in prefix-based search and top-k ranking, familiarity with in-memory data structures and preprocessing strategies, and attention to stability for score ties.
Constraints
- 0 <= len(suggestions) <= 2 * 10^5
- 0 <= len(queries) <= 2 * 10^5
- 0 <= sum(len(ngram) for ngram, _, _ in suggestions) <= 10^6
- 0 <= len(query) <= 100 for each query, and prefix matching is case-sensitive
- -10^9 <= score <= 10^9, 0 <= department <= 10^9, and 0 <= k <= len(suggestions)
- Original input order means the position of a suggestion in the given `suggestions` list before any preprocessing
Examples
Input: ([('apple', 1, 10), ('app', 1, 7), ('apply', 1, 9), ('apt', 2, 8), ('banana', 1, 6)], [('app', 1, 2), ('ap', 2, 3)])
Expected Output: [[('apple', 1, 10), ('apply', 1, 9)], [('apt', 2, 8)]]
Explanation: For prefix 'app' in department 1, the valid suggestions are ('apple', 1, 10), ('app', 1, 7), and ('apply', 1, 9). The top 2 by score are apple and apply. For prefix 'ap' in department 2, only ('apt', 2, 8) matches.
Input: ([('dog', 1, 8), ('car', 1, 7), ('cart', 1, 7), ('carbon', 1, 7), ('car', 2, 9)], [('car', 1, 2), ('car', 1, 3)])
Expected Output: [[('car', 1, 7), ('cart', 1, 7)], [('car', 1, 7), ('cart', 1, 7), ('carbon', 1, 7)]]
Explanation: Only department 1 suggestions whose ngram starts with 'car' are valid: car, cart, and carbon. All have score 7, so original order breaks ties. 'dog' must not appear because it does not match the prefix.
Input: ([('aa', 1, 5), ('aa', 1, 5), ('aaa', 1, 5), ('aa', 2, 10)], [('aa', 1, 5)])
Expected Output: [[('aa', 1, 5), ('aa', 1, 5), ('aaa', 1, 5)]]
Explanation: The three department 1 matches all have the same score, so their order must follow their original appearance in `suggestions`.
Input: ([], [('a', 1, 3)])
Expected Output: [[]]
Explanation: There are no suggestions, so the only query returns an empty list.
Input: ([('cat', 1, 4)], [('c', 1, 0), ('', 1, 2)])
Expected Output: [[], [('cat', 1, 4)]]
Explanation: If `k` is 0, return an empty list. An empty query string is a prefix of every ngram, so the second query returns the single matching suggestion.
Input: ([('mouse', 3, 2), ('moon', 3, 5), ('mop', 3, 5), ('more', 4, 9)], [('mo', 3, 5)])
Expected Output: [[('moon', 3, 5), ('mop', 3, 5), ('mouse', 3, 2)]]
Explanation: In department 3, the prefix 'mo' matches mouse, moon, and mop. The two score-5 suggestions come first, and their tie is broken by original order.
Input: ([('ant', 1, 3), ('anchor', 2, 9)], [('an', 3, 2), ('anc', 1, 2)])
Expected Output: [[], []]
Explanation: The first query has no suggestions in department 3. The second query uses department 1, but 'ant' does not start with 'anc', and 'anchor' is in the wrong department.
Hints
- Because the same suggestion list is used for many queries, avoid scanning the full list from scratch every time.
- If you group suggestions by department and sort only their indices by `ngram`, all strings sharing a prefix become contiguous. Then you can binary-search the first possible match and keep only the best `k` results with a small heap.