You are building a data-labeling platform. Each day, a batch of task IDs arrives. A task ID may reappear on later days because the platform may request additional independent labels for the same underlying item. For every task in the current day, assign exactly one human labeler and one model.
Implement an incremental scheduler with persistent state across days, for example:
class Scheduler:
    def __init__(self, humans: list[str], models: list[str]): ...
    def schedule(self, tasks: list[str]) -> list[tuple[str, str, str]]: ...
Return one (task_id, human_id, model_id) triple per task.
Requirements:
- The sets of humans and models are fixed at initialization.
- Each daily batch contains distinct task IDs.
- Each human can handle at most one task per day.
- Models have no daily capacity limit.
- The same human must never be assigned the same task ID more than once across all days.
- Historical assignments remain in effect; schedule() must update state incrementally instead of recomputing all previous days from scratch.
- The long-run distribution should stay as balanced as possible across humans and across models.
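The hard constraints in the list above can be checked mechanically, which is handy for testing any candidate scheduler. The following is a minimal sketch, not part of the required interface: the function name check_hard_constraints and the "list of daily results" input shape are assumptions made for illustration. It ignores the balance goal and only verifies validity.

```python
def check_hard_constraints(days, humans, models):
    """Return True if every day's output respects the hard requirements.

    days: list of daily results, each a list of (task_id, human_id, model_id).
    This helper and its signature are hypothetical, for illustration only.
    """
    humans, models = set(humans), set(models)
    seen_pairs = set()  # (task_id, human_id) pairs used on any day so far
    for triples in days:
        tasks_today = set()
        humans_today = set()
        for task_id, human_id, model_id in triples:
            if human_id not in humans or model_id not in models:
                return False  # unknown human or model
            if task_id in tasks_today:
                return False  # batch must contain distinct task IDs
            if human_id in humans_today:
                return False  # at most one task per human per day
            if (task_id, human_id) in seen_pairs:
                return False  # same human repeated on the same task
            tasks_today.add(task_id)
            humans_today.add(human_id)
            seen_pairs.add((task_id, human_id))
    return True
```

Note that the set of (task_id, human_id) pairs is the only cross-day state this check needs, which hints at what a scheduler must persist.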
To make the problem deterministic, among all valid assignments prefer the one that:
- minimizes the maximum difference in total assignments between any two humans,
- then minimizes the maximum difference in total assignments between any two models,
- then uses lexicographically smaller human IDs and model IDs as tie-breakers.
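One way to make this preference order concrete is to map each candidate assignment for the day to a sortable key, so that the preferred candidate has the smallest key. This is only a sketch under assumptions: the names spread and preference_key are hypothetical, both totals dictionaries are assumed to contain every human and model (including those with zero assignments), and the tie-break is interpreted as comparing human IDs in task-ID order, then model IDs in the same order. The statement does not pin that last detail down.

```python
def spread(totals):
    """Largest minus smallest total; 0 for an empty mapping."""
    values = list(totals.values())
    return max(values) - min(values) if values else 0


def preference_key(human_totals, model_totals, triples):
    """Sort key for one candidate day; a smaller key means more preferred.

    human_totals / model_totals: totals *before* today, covering every ID.
    triples: candidate (task_id, human_id, model_id) assignments for today.
    """
    humans = dict(human_totals)  # copy so the caller's state is untouched
    models = dict(model_totals)
    for _task, human_id, model_id in triples:
        humans[human_id] += 1
        models[model_id] += 1
    ordered = sorted(triples)  # assumed tie-break order: by task ID
    return (
        spread(humans),                   # 1st: human imbalance
        spread(models),                   # 2nd: model imbalance
        [h for _t, h, _m in ordered],     # 3rd: smaller human IDs
        [m for _t, _h, m in ordered],     # 4th: smaller model IDs
    )
```

Comparing keys this way is useful for brute-force verification on tiny inputs; enumerating all candidates does not scale to real batches.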
Assume len(tasks) <= len(humans) for every day. Discuss the data structures needed to support efficient incremental updates over many days.
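As a starting point for that discussion, here is a hedged sketch of the persistent state: a per-human total, a per-task set of humans who already labeled it, and a min-heap of (count, model_id) for models. The attribute names are assumptions. Human selection uses a simple augmenting-path matching that tries least-loaded humans first; it finds a valid assignment whenever one exists, but it is a heuristic with respect to balance and tie-breaking and is not guaranteed to return the single preferred assignment. An exact solution would likely need a cost-aware matching instead.

```python
import heapq
from collections import defaultdict


class Scheduler:
    def __init__(self, humans, models):
        self.human_total = {h: 0 for h in humans}
        # Min-heap of (assignments so far, model_id): least used, then smallest ID.
        self.model_heap = [(0, m) for m in sorted(set(models))]
        heapq.heapify(self.model_heap)
        self.seen = defaultdict(set)  # task_id -> humans who already labeled it

    def schedule(self, tasks):
        # Least-loaded humans first, lexicographic ID as tie-break.
        order = sorted(self.human_total, key=lambda h: (self.human_total[h], h))
        owner = {}  # human_id -> task matched today

        def try_assign(task, visited):
            eligible = [h for h in order if h not in self.seen.get(task, ())]
            for h in eligible:  # prefer a human who is still free today
                if h not in owner:
                    owner[h] = task
                    return True
            for h in eligible:  # otherwise try to re-route another task
                if h in visited:
                    continue
                visited.add(h)
                if try_assign(owner[h], visited):
                    owner[h] = task
                    return True
            return False

        for task in tasks:
            if not try_assign(task, set()):
                raise ValueError(f"no eligible human left for task {task!r}")

        # Commit: only now is persistent state updated, one task at a time.
        assigned = {task: h for h, task in owner.items()}
        result = []
        for task in tasks:
            human = assigned[task]
            count, model = heapq.heappop(self.model_heap)
            heapq.heappush(self.model_heap, (count + 1, model))
            self.human_total[human] += 1
            self.seen[task].add(human)
            result.append((task, human, model))
        return result
```

Each call touches only today's tasks plus one sort of the human totals, so earlier days are never replayed. For large pools, the per-day sort could be replaced by an ordered structure keyed on (total, id), and the recursive search by an iterative one to avoid Python's recursion limit.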