Schedule Incremental Labeling Tasks

Q: Schedule Incremental Labeling Tasks

This question tests the design of stateful incremental schedulers: fairness and load balancing, deterministic tie-breaking, and efficient data structures for persistent assignment tracking. It belongs to the Coding & Algorithms domain and targets machine learning engineering roles.

Q: What difficulty level is this interview question?

This is a hard Coding & Algorithms question, commonly asked during Onsite rounds at OpenAI.

Q: What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at OpenAI during technical interviews.

Question

You are building a data-labeling platform. Each day, a batch of task IDs arrives. A task ID may reappear on later days because the platform may request additional independent labels for the same underlying item. For every task in the current day, assign exactly one human labeler and one model.

Implement an incremental scheduler with persistent state across days, for example:

    class Scheduler:
        def __init__(self, humans: list[str], models: list[str]): ...
        def schedule(self, tasks: list[str]) -> list[tuple[str, str, str]]: ...

Return one (task_id, human_id, model_id) triple per task.

Requirements:

The sets of humans and models are fixed at initialization.
Each daily batch contains distinct task IDs.
Each human can handle at most one task per day.
Models have no daily capacity limit.
The same human must never be assigned the same task ID more than once across all days.
Historical assignments remain in effect; schedule() must update state incrementally instead of recomputing all previous days from scratch.
The long-run distribution should stay as balanced as possible across humans and across models.
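
The hard requirements above (everything except the balance goal) can be checked mechanically for a single day's output. The following is a minimal sketch of such a checker. The function name `violations` and the `history` argument (a mapping from task ID to the set of humans who labeled it on earlier days) are assumptions made for illustration; they are not part of the stated interface.

```python
def violations(tasks, triples, humans, models, history):
    """Return a list of hard-requirement violations for one day's output.

    history is a hypothetical mapping: task_id -> set of humans who
    already labeled that task on earlier days.
    """
    problems = []
    if len(set(tasks)) != len(tasks):
        problems.append("batch contains duplicate task IDs")
    if sorted(t for t, _, _ in triples) != sorted(tasks):
        problems.append("output must contain exactly one triple per task")
    used_today = set()
    for task, human, model in triples:
        if human not in humans or model not in models:
            problems.append(f"{task}: unknown human or model")
        if human in used_today:
            problems.append(f"{human}: more than one task today")
        used_today.add(human)
        if human in history.get(task, set()):
            problems.append(f"{task}: {human} already labeled this task")
    return problems
```

A checker like this is useful as a test oracle while developing the scheduler, since it is independent of whichever assignment strategy is chosen.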

To make the problem deterministic, among all valid assignments prefer the one that:

minimizes the maximum difference in total assignments between any two humans,
then minimizes the maximum difference in total assignments between any two models,
then uses lexicographically smaller human IDs and model IDs as tie-breakers.
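
One way to make this preference order concrete is to map each candidate assignment to a tuple that Python compares left to right. The sketch below assumes that "total assignments" means running totals after the day is applied, and that the lexicographic tie-breaker compares the sequence of (human, model) pairs in batch order. The problem statement does not pin down either detail, so both are interpretations, and the name `preference_key` is hypothetical.

```python
def preference_key(assignment, human_load, model_load):
    """Smaller key = more preferred under the three stated criteria.

    assignment: list of (task_id, human_id, model_id) for one day.
    human_load / model_load: totals before this day (not modified).
    """
    humans = dict(human_load)
    models = dict(model_load)
    for _task, human, model in assignment:
        humans[human] += 1
        models[model] += 1
    human_spread = max(humans.values()) - min(humans.values())
    model_spread = max(models.values()) - min(models.values())
    # Assumed tie-breaker: compare (human, model) pairs in batch order.
    ids = tuple((human, model) for _task, human, model in assignment)
    return (human_spread, model_spread, ids)
```

With such a key, `min(candidates, key=...)` selects the preferred assignment from any explicit set of valid candidates, which is practical only for small inputs but makes the specification testable.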

Assume len(tasks) <= len(humans) for every day. Discuss the data structures needed to support efficient incremental updates over many days.
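
As a starting point for discussion, here is a greedy sketch of the scheduler. It keeps three pieces of persistent state: a running total per human, a running total per model, and, for each task ID, the set of humans who have already labeled it. It is a heuristic, not a guaranteed-optimal solution: picking the least-loaded eligible human one task at a time can miss the best assignment, or fail on a batch for which a valid assignment exists. An exact approach would treat each day as a bipartite matching problem. The attribute names (`human_load`, `model_load`, `labeled_by`) are assumptions made for illustration.

```python
class Scheduler:
    def __init__(self, humans: list[str], models: list[str]):
        # Persistent state: running totals plus, per task, who has labeled it.
        self.human_load = {h: 0 for h in humans}
        self.model_load = {m: 0 for m in models}
        self.labeled_by: dict[str, set[str]] = {}

    def schedule(self, tasks: list[str]) -> list[tuple[str, str, str]]:
        busy_today: set[str] = set()
        chosen: dict[str, tuple[str, str]] = {}
        # Place tasks with the most prior labelers first: they have the
        # fewest eligible humans left, so they are the hardest to place.
        order = sorted(tasks, key=lambda t: (-len(self.labeled_by.get(t, ())), t))
        for task in order:
            done = self.labeled_by.setdefault(task, set())
            eligible = [h for h in self.human_load
                        if h not in busy_today and h not in done]
            if not eligible:
                # Note: state from tasks placed earlier in this batch is kept.
                raise ValueError(f"no eligible human left for task {task!r}")
            # Least-loaded first, then lexicographically smallest ID.
            human = min(eligible, key=lambda h: (self.human_load[h], h))
            model = min(self.model_load, key=lambda m: (self.model_load[m], m))
            self.human_load[human] += 1
            self.model_load[model] += 1
            busy_today.add(human)
            done.add(human)
            chosen[task] = (human, model)
        # Report results in the order the tasks arrived.
        return [(t, *chosen[t]) for t in tasks]
```

Each call touches only the current batch, so earlier days are never recomputed. The linear scans with `min` cost O(len(humans) + len(models)) per task; if that becomes a bottleneck, the scans could be replaced by humans bucketed by load or by a heap keyed on (load, ID), at the cost of extra bookkeeping for the per-task exclusions.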
