Simulate document assignment to indexers
Company: Glean
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: hard
Interview Round: Technical Screen
You are given a system with `m` parallel indexers (workers), numbered from `0` to `m-1`. Documents arrive over time and are placed in a processing queue. Each document, when processed by an indexer, occupies that indexer exclusively for a given processing duration.
Document assignment works as follows (assume 0-based indexing for documents):
1. Document `i` arrives at time `queue_time[i]` and requires `processing_time[i]` units of processing time.
2. Let `base = i % m`. First, try to assign the document to indexer `base`.
3. An indexer `j` is **available** for a document arriving at time `t` if its current processing (if any) completes no later than time `t`.
4. If indexer `base` is **available** at time `queue_time[i]`, assign the document to indexer `base`.
5. Otherwise, scan indexers in increasing order of indexer number, wrapping around modulo `m`: `(base + 1) % m`, `(base + 2) % m`, ..., `(base + m - 1) % m`. Assign the document to the **first** indexer found that is available at `queue_time[i]`.
6. If **no** indexer is available at `queue_time[i]`, then this document is **dropped** and never processed.
7. Once a document `i` is assigned to indexer `j` at time `queue_time[i]`, that indexer becomes busy until time `queue_time[i] + processing_time[i]`.
You are given:
- An integer `m` — the number of indexers.
- An integer array `queue_time` of length `n`, where `queue_time[i]` is the time when document `i` is queued. The array is in strictly increasing order (i.e., `queue_time[i] < queue_time[i+1]`).
- An integer array `processing_time` of length `n`, where `processing_time[i]` is the duration needed to process document `i`.
Assume `n = len(queue_time) = len(processing_time)`.
**Step 1**
Design an algorithm that, using the above scheduling rules, computes:
1. The total number of documents that are processed successfully (i.e., not dropped).
2. The indexer that processed the most documents. If multiple indexers are tied for the maximum number of processed documents, you may return any one of them.
**Step 2**
Extend your algorithm to also compute, for a given integer `k` (where `1 \le k \le m`):
1. The set of the top `k` indexers that processed the largest number of documents. If there are ties, any order or any valid choice among tied indexers is acceptable.
2. The percentage of all documents (including those dropped) that were processed by these top `k` indexers, as a value between 0% and 100%.
You should:
- Describe the algorithm to perform the simulation and compute all required outputs.
- Aim for an efficient solution in terms of time and space complexity.
---
**Example**
Input:
- `m = 3`
- `queue_time = [1, 2, 3, 7]`
- `processing_time = [5, 4, 3, 2]`
- Suppose `k = 2` for Step 2.
Processing simulation:
- Document 0 arrives at time 1, `base = 0`. All indexers are free. Assign to indexer 0, which will be busy until time `1 + 5 = 6`.
- Document 1 arrives at time 2, `base = 1`. Indexer 1 is free. Assign to indexer 1, busy until `2 + 4 = 6`.
- Document 2 arrives at time 3, `base = 2`. Indexer 2 is free. Assign to indexer 2, busy until `3 + 3 = 6`.
- Document 3 arrives at time 7, `base = 0`. Indexer 0 is free (available from time 6). Assign to indexer 0, busy until `7 + 2 = 9`.
All documents are successfully processed. Indexer 0 processed 2 documents; indexers 1 and 2 each processed 1 document.
Output:
- Documents processed successfully: `4`
- Indexer that processed most documents: `0`
- Top 2 indexers (any valid order with ties): e.g., `[0, 1]` or `[0, 2]`
- Percentage of documents processed by these 2 indexers: `75%` (3 out of 4 documents)
Quick Answer: This question evaluates simulation of parallel worker scheduling, time-based queue processing, modular assignment logic, and aggregation/ranking (top-k and percentage) competencies for tracking per-worker workloads.