PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates simulation of parallel worker scheduling, time-based queue processing, modular assignment logic, and aggregation/ranking (top-k and percentage) competencies for tracking per-worker workloads.

  • hard
  • Glean
  • Coding & Algorithms
  • Software Engineer

Simulate document assignment to indexers

Company: Glean

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: hard

Interview Round: Technical Screen

You are given a system with `m` parallel indexers (workers), numbered from `0` to `m-1`. Documents arrive over time and are placed in a processing queue. Each document, when processed by an indexer, occupies that indexer exclusively for a given processing duration. Document assignment works as follows (assume 0-based indexing for documents): 1. Document `i` arrives at time `queue_time[i]` and requires `processing_time[i]` units of processing time. 2. Let `base = i % m`. First, try to assign the document to indexer `base`. 3. An indexer `j` is **available** for a document arriving at time `t` if its current processing (if any) completes no later than time `t`. 4. If indexer `base` is **available** at time `queue_time[i]`, assign the document to indexer `base`. 5. Otherwise, scan indexers in increasing order of indexer number, wrapping around modulo `m`: `(base + 1) % m`, `(base + 2) % m`, ..., `(base + m - 1) % m`. Assign the document to the **first** indexer found that is available at `queue_time[i]`. 6. If **no** indexer is available at `queue_time[i]`, then this document is **dropped** and never processed. 7. Once a document `i` is assigned to indexer `j` at time `queue_time[i]`, that indexer becomes busy until time `queue_time[i] + processing_time[i]`. You are given: - An integer `m` — the number of indexers. - An integer array `queue_time` of length `n`, where `queue_time[i]` is the time when document `i` is queued. The array is in strictly increasing order (i.e., `queue_time[i] < queue_time[i+1]`). - An integer array `processing_time` of length `n`, where `processing_time[i]` is the duration needed to process document `i`. Assume `n = len(queue_time) = len(processing_time)`. **Step 1** Design an algorithm that, using the above scheduling rules, computes: 1. The total number of documents that are processed successfully (i.e., not dropped). 2. The indexer that processed the most documents. If multiple indexers are tied for the maximum number of processed documents, you may return any one of them. **Step 2** Extend your algorithm to also compute, for a given integer `k` (where `1 \le k \le m`): 1. The set of the top `k` indexers that processed the largest number of documents. If there are ties, any order or any valid choice among tied indexers is acceptable. 2. The percentage of all documents (including those dropped) that were processed by these top `k` indexers, as a value between 0% and 100%. You should: - Describe the algorithm to perform the simulation and compute all required outputs. - Aim for an efficient solution in terms of time and space complexity. --- **Example** Input: - `m = 3` - `queue_time = [1, 2, 3, 7]` - `processing_time = [5, 4, 3, 2]` - Suppose `k = 2` for Step 2. Processing simulation: - Document 0 arrives at time 1, `base = 0`. All indexers are free. Assign to indexer 0, which will be busy until time `1 + 5 = 6`. - Document 1 arrives at time 2, `base = 1`. Indexer 1 is free. Assign to indexer 1, busy until `2 + 4 = 6`. - Document 2 arrives at time 3, `base = 2`. Indexer 2 is free. Assign to indexer 2, busy until `3 + 3 = 6`. - Document 3 arrives at time 7, `base = 0`. Indexer 0 is free (available from time 6). Assign to indexer 0, busy until `7 + 2 = 9`. All documents are successfully processed. Indexer 0 processed 2 documents; indexers 1 and 2 each processed 1 document. Output: - Documents processed successfully: `4` - Indexer that processed most documents: `0` - Top 2 indexers (any valid order with ties): e.g., `[0, 1]` or `[0, 2]` - Percentage of documents processed by these 2 indexers: `75%` (3 out of 4 documents)

Quick Answer: This question evaluates simulation of parallel worker scheduling, time-based queue processing, modular assignment logic, and aggregation/ranking (top-k and percentage) competencies for tracking per-worker workloads.

Simulate Document Assignment to Indexers — Step 1

You are given a system with `m` parallel indexers (workers), numbered `0` to `m-1`. Documents arrive over time and are assigned to indexers under a modulo-with-wraparound scheduling rule. For document `i` (0-based), let `base = i % m`. An indexer `j` is **available** for a document arriving at time `t` if its current processing (if any) completes no later than `t`. Try indexer `base` first; if busy, scan `(base+1) % m, (base+2) % m, …` in increasing order with wraparound and pick the **first** available indexer. If no indexer is available, the document is **dropped**. When assigned to indexer `j` at time `queue_time[i]`, that indexer is busy until `queue_time[i] + processing_time[i]`. `queue_time` is strictly increasing and `len(queue_time) == len(processing_time) == n`. Return a list `[processed, busiest]` where: 1. `processed` = total number of documents processed successfully (not dropped). 2. `busiest` = the indexer that processed the most documents. To make the answer deterministic, return the **smallest-numbered** indexer among those tied for the maximum count. Example: `m = 3`, `queue_time = [1,2,3,7]`, `processing_time = [5,4,3,2]` → `[4, 0]` (all 4 processed; indexer 0 handled documents 0 and 3).

Constraints

  • 1 <= m <= 10^4
  • 0 <= n <= 10^5 where n = len(queue_time) = len(processing_time)
  • queue_time is strictly increasing: queue_time[i] < queue_time[i+1]
  • 0 <= queue_time[i], 1 <= processing_time[i]
  • An indexer is available at time t iff its current job completes no later than t (free_at[j] <= t).

Examples

Input: (3, [1, 2, 3, 7], [5, 4, 3, 2])

Expected Output: [4, 0]

Explanation: All 4 documents processed. Indexer 0 handles doc 0 (busy 1..6) and doc 3 (arrives t=7, free since 6), so it processes 2; indexers 1 and 2 process 1 each. Busiest = 0.

Input: (1, [1, 2], [10, 5])

Expected Output: [1, 0]

Explanation: Doc 0 occupies the only indexer until time 11. Doc 1 arrives at t=2 < 11, no indexer available, so it is dropped. processed=1, busiest=0.

Input: (2, [1, 2, 3, 13], [10, 10, 5, 1])

Expected Output: [3, 1]

Explanation: Doc0→idx0 (busy until 11), doc1→idx1 (busy until 12), doc2 (t=3) finds both busy → dropped, doc3 (t=13) → idx1 free since 12. Counts: idx0=1, idx1=2 → busiest=1, processed=3.

Input: (2, [], [])

Expected Output: [0, 0]

Explanation: No documents. processed=0, and with all counts 0 the smallest-index tie-break gives busiest=0.

Input: (1, [5], [3])

Expected Output: [1, 0]

Explanation: Single document on a single indexer is always accepted. processed=1, busiest=0.

Input: (3, [1, 2, 3, 4, 5, 6], [100, 100, 100, 1, 1, 1])

Expected Output: [3, 0]

Explanation: Docs 0,1,2 occupy indexers 0,1,2 until ~101+. Docs 3,4,5 (t=4,5,6) find all three indexers busy and are dropped. processed=3, each indexer has 1 → busiest=0 (smallest among ties).

Hints

  1. Track a single value per indexer: the time it becomes free (busy_until). Initialize to 0 so every indexer is free at the start.
  2. For document i, compute base = i % m and scan offsets 0..m-1 using (base + offset) % m; assign to the first indexer whose busy_until <= queue_time[i].
  3. If the scan finds no available indexer, the document is dropped — do not increment any count.
  4. For a deterministic 'busiest' answer, keep the smallest index when counts tie (iterate left to right and only update on strictly greater count).

Simulate Document Assignment to Indexers — Step 2 (Top-k & Coverage)

Same scheduling simulation as Step 1: `m` indexers, modulo-with-wraparound assignment, drop a document when no indexer is available at its arrival time (an indexer `j` is available for arrival time `t` iff its current job completes no later than `t`). Given an additional integer `k` (`1 <= k <= m`), after running the simulation return a list `[topk, percentage]` where: 1. `topk` = the `k` indexers that processed the most documents. To make the answer deterministic, rank indexers by `(document_count descending, index ascending)`, take the first `k`, and return them **sorted in ascending order**. 2. `percentage` = the percentage of **all** documents (including dropped ones, i.e. out of `n`) that were processed by those top-`k` indexers, a float in `[0, 100]` rounded to 2 decimals. If `n == 0`, the percentage is `0.0`. Example: `m = 3`, `queue_time = [1,2,3,7]`, `processing_time = [5,4,3,2]`, `k = 2` → `[[0, 1], 75.0]` (indexers 0 and 1 processed 3 of the 4 documents).

Constraints

  • 1 <= m <= 10^4
  • 1 <= k <= m
  • 0 <= n <= 10^5 where n = len(queue_time) = len(processing_time)
  • queue_time is strictly increasing
  • Percentage is computed over all n documents (including dropped) and rounded to 2 decimals; n == 0 yields 0.0.

Examples

Input: (3, [1, 2, 3, 7], [5, 4, 3, 2], 2)

Expected Output: [[0, 1], 75.0]

Explanation: Counts: idx0=2, idx1=1, idx2=1. Ranked (-count, index): [0,1,2]; top 2 = {0,1}. They processed 3 of 4 documents → 75.0%.

Input: (3, [1, 2, 3, 7], [5, 4, 3, 2], 1)

Expected Output: [[0], 50.0]

Explanation: Top 1 is indexer 0 with 2 documents → 2/4 = 50.0%.

Input: (3, [1, 2, 3, 7], [5, 4, 3, 2], 3)

Expected Output: [[0, 1, 2], 100.0]

Explanation: All 3 indexers selected; together they processed all 4 documents → 100.0%.

Input: (1, [1, 2], [10, 5], 1)

Expected Output: [[0], 50.0]

Explanation: Doc 1 is dropped (single indexer busy until 11). Indexer 0 processed 1 of 2 documents → 50.0%.

Input: (2, [1, 2, 3, 13], [10, 10, 5, 1], 1)

Expected Output: [[1], 50.0]

Explanation: Counts: idx0=1, idx1=2. Top 1 = indexer 1 → 2 of 4 documents → 50.0%.

Input: (2, [], [], 1)

Expected Output: [[0], 0.0]

Explanation: No documents: all counts 0, ranking by (-count, index) gives [0,1]; top 1 = {0}. With n=0 the percentage is defined as 0.0.

Hints

  1. Reuse the Step 1 simulation to obtain the per-indexer processed counts; only the post-processing changes.
  2. Rank indexers deterministically with sort key (-count, index) so ties break toward smaller indices, then take the first k.
  3. Return the chosen k indices in ascending order so the output is stable regardless of internal ranking order.
  4. The denominator for the percentage is n (all documents that arrived), NOT the number successfully processed — dropped documents still count against coverage.
Last updated: Jun 26, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Byte Pair Encoding (BPE) Tokenizer - Glean (medium)
  • Return Top Department Suggestions - Glean (medium)
  • Implement Rate-Limited Wikipedia Crawler - Glean (medium)
  • Find Earliest Train Route - Glean (medium)
  • Search Words in a Character Grid - Glean (hard)