How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Glean.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Glean during technical interviews.

Return Top Department Suggestions | Glean Coding Question

Return Top Department Suggestions

Company: Glean

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given a static in-memory list of `Suggestion` objects loaded by a server: ```text Suggestion { ngram: string, department: int, score: int } ``` Implement: ```text getTopSuggestions(query: string, userDept: int, k: int) -> List[Suggestion] ``` A suggestion is valid if both conditions hold: 1. `query` is a prefix of `suggestion.ngram`. 2. `suggestion.department == userDept`. Return the top `k` valid suggestions sorted by `score` from highest to lowest. If fewer than `k` suggestions are valid, return all valid suggestions. If scores tie, use the original input order as the tie-breaker. The list of suggestions is static, so you may perform one-time preprocessing. However, the original list already consumes about half of the server's available memory, so your solution should prioritize low query latency without using excessive additional memory. Example input list: ```text [ ("software", 1, 5), ("software engineer", 1, 3), ("engineer", 1, 5), ("glean", 2, 12), ("generative ai", 2, 2), ("office", 1, 4) ] ``` Example: ```text getTopSuggestions("soft", 1, 2) ``` Expected result: ```text [("software", 1, 5), ("software engineer", 1, 3)] ``` Design and implement an efficient approach for serving this function repeatedly on the server.

Quick Answer: This question evaluates proficiency in prefix-based search and top-k ranking, familiarity with in-memory data structures and preprocessing strategies, and attention to stability for score ties.

You are given a static list of suggestions, where each suggestion is represented as a tuple `(ngram, department, score)`. In the real service, many queries are run against the same list, so your task is to answer a batch of queries efficiently after one preprocessing step. Implement `solution(suggestions, queries)`. - `suggestions` is the static in-memory list. - `queries` is a list of requests, where each request is `(query, userDept, k)`. A suggestion is valid for a query if: 1. `query` is a prefix of `ngram`. 2. `department == userDept`. For each query, return the top `k` valid suggestions sorted by `score` from highest to lowest. If fewer than `k` suggestions are valid, return all valid suggestions. If scores tie, the suggestion that appeared earlier in the original `suggestions` list must come first. Return each result as a list of the original suggestion tuples. The goal is to support repeated queries with low latency while using only modest extra memory.

Constraints

0 <= len(suggestions) <= 2 * 10^5
0 <= len(queries) <= 2 * 10^5
0 <= sum(len(ngram) for ngram, _, _ in suggestions) <= 10^6
0 <= len(query) <= 100 for each query, and prefix matching is case-sensitive
-10^9 <= score <= 10^9, 0 <= department <= 10^9, and 0 <= k <= len(suggestions)
Original input order means the position of a suggestion in the given `suggestions` list before any preprocessing

Examples

Input: ([('apple', 1, 10), ('app', 1, 7), ('apply', 1, 9), ('apt', 2, 8), ('banana', 1, 6)], [('app', 1, 2), ('ap', 2, 3)])

Expected Output: [[('apple', 1, 10), ('apply', 1, 9)], [('apt', 2, 8)]]

Explanation: For prefix 'app' in department 1, the valid suggestions are ('apple', 1, 10), ('app', 1, 7), and ('apply', 1, 9). The top 2 by score are apple and apply. For prefix 'ap' in department 2, only ('apt', 2, 8) matches.

Input: ([('dog', 1, 8), ('car', 1, 7), ('cart', 1, 7), ('carbon', 1, 7), ('car', 2, 9)], [('car', 1, 2), ('car', 1, 3)])

Expected Output: [[('car', 1, 7), ('cart', 1, 7)], [('car', 1, 7), ('cart', 1, 7), ('carbon', 1, 7)]]

Explanation: Only department 1 suggestions whose ngram starts with 'car' are valid: car, cart, and carbon. All have score 7, so original order breaks ties. 'dog' must not appear because it does not match the prefix.

Input: ([('aa', 1, 5), ('aa', 1, 5), ('aaa', 1, 5), ('aa', 2, 10)], [('aa', 1, 5)])

Expected Output: [[('aa', 1, 5), ('aa', 1, 5), ('aaa', 1, 5)]]

Explanation: The three department 1 matches all have the same score, so their order must follow their original appearance in `suggestions`.

Input: ([], [('a', 1, 3)])

Expected Output: [[]]

Explanation: There are no suggestions, so the only query returns an empty list.

Input: ([('cat', 1, 4)], [('c', 1, 0), ('', 1, 2)])

Expected Output: [[], [('cat', 1, 4)]]

Explanation: If `k` is 0, return an empty list. An empty query string is a prefix of every ngram, so the second query returns the single matching suggestion.

Input: ([('mouse', 3, 2), ('moon', 3, 5), ('mop', 3, 5), ('more', 4, 9)], [('mo', 3, 5)])

Expected Output: [[('moon', 3, 5), ('mop', 3, 5), ('mouse', 3, 2)]]

Explanation: In department 3, the prefix 'mo' matches mouse, moon, and mop. The two score-5 suggestions come first, and their tie is broken by original order.

Input: ([('ant', 1, 3), ('anchor', 2, 9)], [('an', 3, 2), ('anc', 1, 2)])

Expected Output: [[], []]

Explanation: The first query has no suggestions in department 3. The second query uses department 1, but 'ant' does not start with 'anc', and 'anchor' is in the wrong department.

Hints

Because the same suggestion list is used for many queries, avoid scanning the full list from scratch every time.
If you group suggestions by department and sort only their indices by `ngram`, all strings sharing a prefix become contiguous. Then you can binary-search the first possible match and keep only the best `k` results with a small heap.

Quick Overview

This question evaluates proficiency in prefix-based search and top-k ranking, familiarity with in-memory data structures and preprocessing strategies, and attention to stability for score ties.