How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a hard difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Roblox.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Roblox during technical interviews.

Rank queries by prefix, frequency, and time

Q: Rank queries by prefix, frequency, and time

This question evaluates understanding of string processing, frequency aggregation, earliest-timestamp tracking, and efficient prefix-based retrieval for large datasets.

Company: Roblox

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given three arrays of the same length `n`: - `queries[i]`: a non-empty string representing the text of the i-th query. - `timestamps[i]`: an integer representing the time when `queries[i]` was issued. - `0 <= i < n`. You are also given an array `prefixes` of length `m`, where each `prefixes[j]` is a non-empty string. For each prefix `p` in `prefixes`, you must find **all distinct query strings** that start with `p` (i.e., `query.startsWith(p)`), then sort those distinct queries by: 1. **Descending frequency**: the number of times that exact query string appears in `queries`. 2. If multiple queries have the same frequency, break ties by **ascending earliest timestamp**: for a given query string `q`, look at all indices `k` where `queries[k] == q` and take `min(timestamps[k])`; the query with the smaller earliest timestamp comes first. Return, for each prefix `prefixes[j]`, the list of matching query strings sorted according to the above rules. Assume: - `1 <= n, m <= 2 * 10^5` (large enough that an \(O(nm)\) solution will time out). - `1 <= len(queries[i]), len(prefixes[j]) <= 50`. - `timestamps[i]` fit in a 32-bit signed integer. You may return the result in any reasonable structure; for example, an array `result` of length `m`, where `result[j]` is a list of strings for prefix `prefixes[j]`. **Example** Input: - `queries = ["apple", "app", "banana", "app", "application"]` - `timestamps = [5, 3, 10, 7, 8]` - `prefixes = ["ap", "app", "b"]` Processing: - For prefix `"ap"`: - Matching queries: `"apple"`, `"app"`, `"app"`, `"application"`. - Distinct query strings and frequencies: - `"app"`: appears 2 times, earliest timestamp = `min(3, 7) = 3`. - `"apple"`: appears 1 time, earliest timestamp = `5`. - `"application"`: appears 1 time, earliest timestamp = `8`. - Sorted: `"app"` (freq 2), then `"apple"` (freq 1, time 5), then `"application"` (freq 1, time 8). - For prefix `"app"`: - Matching queries: `"apple"`, `"app"`, `"app"`, `"application"` (same set as above since all start with "app"). - Sorted order is the same as for `"ap"`. - For prefix `"b"`: - Matching queries: `"banana"` only, with frequency 1 and earliest timestamp 10. Output (conceptually): - For `"ap"`: `["app", "apple", "application"]` - For `"app"`: `["app", "apple", "application"]` - For `"b"`: `["banana"]`

Quick Answer: This question evaluates understanding of string processing, frequency aggregation, earliest-timestamp tracking, and efficient prefix-based retrieval for large datasets.

You are given three arrays of the same length `n`: - `queries[i]`: a non-empty string representing the text of the i-th query. - `timestamps[i]`: an integer representing the time when `queries[i]` was issued. You are also given an array `prefixes` of length `m`, where each `prefixes[j]` is a non-empty string. For each prefix `p` in `prefixes`, find **all distinct query strings** that start with `p` (i.e. `query.startsWith(p)`), then sort those distinct queries by: 1. **Descending frequency** — the number of times that exact query string appears in `queries`. 2. On a frequency tie, **ascending earliest timestamp** — for a query string `q`, take `min(timestamps[k])` over all `k` with `queries[k] == q`; the smaller earliest timestamp comes first. (For full determinism when two distinct queries have equal frequency and equal earliest timestamp, break the remaining tie by the query string in ascending lexicographic order.) Return, for each prefix `prefixes[j]`, the list of matching query strings sorted by the rules above (e.g. an array `result` of length `m` where `result[j]` is the list of strings for `prefixes[j]`). **Example** ``` queries = ["apple", "app", "banana", "app", "application"] timestamps = [5, 3, 10, 7, 8] prefixes = ["ap", "app", "b"] ``` - For `"ap"`: `"app"` (freq 2, earliest min(3,7)=3) ranks first, then `"apple"` (freq 1, time 5), then `"application"` (freq 1, time 8) -> `["app", "apple", "application"]`. - For `"app"`: same matching set -> `["app", "apple", "application"]`. - For `"b"`: only `"banana"` -> `["banana"]`. **Constraints** - `1 <= n, m <= 2 * 10^5` (an O(n*m) brute force may time out). - `1 <= len(queries[i]), len(prefixes[j]) <= 50`. - `timestamps[i]` fit in a 32-bit signed integer.

Constraints

1 <= n, m <= 2 * 10^5
1 <= len(queries[i]), len(prefixes[j]) <= 50
timestamps[i] fit in a 32-bit signed integer
queries[i] and prefixes[j] are non-empty
An O(n*m) brute force may time out at the upper bounds

Examples

Input: (["apple", "app", "banana", "app", "application"], [5, 3, 10, 7, 8], ["ap", "app", "b"])

Expected Output: [["app", "apple", "application"], ["app", "apple", "application"], ["banana"]]

Explanation: The prompt's worked example. "app" has freq 2 (earliest min(3,7)=3) so it leads; "apple" (freq 1, t=5) precedes "application" (freq 1, t=8). Prefix "b" matches only "banana".

Input: (["cat", "car", "cart", "cat", "car", "car"], [10, 1, 5, 2, 8, 3], ["ca", "car", "x"])

Expected Output: [["car", "cat", "cart"], ["car", "cart"], []]

Explanation: Frequencies: car=3, cat=2, cart=1. Earliest: car=min(1,8,3)=1, cat=min(10,2)=2, cart=5. Sorted by descending freq: car, cat, cart. Prefix "car" matches car and cart; prefix "x" matches nothing.

Input: (["a"], [100], ["a"])

Expected Output: [["a"]]

Explanation: Single query, single prefix; "a" starts with "a".

Input: (["dog", "dog", "dot", "dot"], [5, 9, 2, 1], ["do"])

Expected Output: [["dot", "dog"]]

Explanation: Both dog and dot have frequency 2, so the tie is broken by ascending earliest timestamp: dot's earliest is min(2,1)=1 vs dog's min(5,9)=5, so dot comes first.

Input: (["zed", "zen"], [3, 7], ["q"])

Expected Output: [[]]

Explanation: No query starts with "q", so the result list for that prefix is empty.

Input: (["abc", "abd", "abc", "abe"], [-5, -3, -10, 0], ["ab"])

Expected Output: [["abc", "abd", "abe"]]

Explanation: Negative timestamps work fine. abc=freq 2 (earliest min(-5,-10)=-10) leads. abd (freq 1, t=-3) precedes abe (freq 1, t=0) by ascending earliest timestamp.

Hints

First collapse the raw arrays into per-distinct-query aggregates: a frequency count and the minimum (earliest) timestamp for each distinct query string. This is one linear pass.
Notice the sort order (descending frequency, then ascending earliest timestamp) does not depend on the prefix. So sort the distinct queries ONCE globally, then for each prefix just filter the already-sorted list with startsWith — relative order is preserved.
For full determinism when two distinct queries tie on both frequency and earliest timestamp, add the query string itself as a final ascending tiebreaker.
If prefixes are very selective relative to n, a trie keyed on query strings (with each node tracking the best-ranked queries beneath it via a heap) can beat the linear filter, but the global-sort-then-filter approach already handles the given constraints.

Quick Overview

This question evaluates understanding of string processing, frequency aggregation, earliest-timestamp tracking, and efficient prefix-based retrieval for large datasets.