Implement a resumable data loader
Company: Microsoft
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
Quick Answer: This question evaluates a candidate's ability to design a stateful, resumable data iterator with deterministic shuffling and compact, serializable checkpointing. It exercises state management, reproducibility, and handling of large indexable datasets.
Constraints
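- The test harness is deterministic: it consumes saveAfter batches (or all remaining batches, if fewer exist) and then saves the loader's state.
- When shuffle is true, the order is defined by shuffling with random.Random(seed).
- Resuming from the saved state must produce exactly the batches that the uninterrupted run produces after the save point.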
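One way to meet these constraints is a loader that shuffles index positions once and then tracks a single batch counter. The following is a minimal sketch, assuming a Python iterator interface; the class name ResumableLoader and the methods state_dict/load_state_dict are hypothetical, and a restored loader is assumed to wrap the same data it was saved from.

```python
import random
from typing import Any, Dict, List, Sequence


class ResumableLoader:
    """Hypothetical batch loader whose position can be saved and restored."""

    def __init__(self, data: Sequence[Any], batch_size: int,
                 shuffle: bool = False, seed: int = 0) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.data = data
        self.batch_size = batch_size
        # Shuffle index positions rather than the data, so the order is a
        # plain list of ints that can go straight into a checkpoint.
        self.order: List[int] = list(range(len(data)))
        if shuffle:
            # A private Random instance: other code using the global
            # random module cannot disturb this order.
            random.Random(seed).shuffle(self.order)
        self.next_batch = 0  # number of the next batch to yield

    def __iter__(self) -> "ResumableLoader":
        return self

    def __next__(self) -> List[Any]:
        start = self.next_batch * self.batch_size
        if start >= len(self.order):
            raise StopIteration
        idx = self.order[start:start + self.batch_size]
        self.next_batch += 1
        return [self.data[i] for i in idx]

    def state_dict(self) -> Dict[str, Any]:
        # Copy the order so later changes to the loader cannot alter a saved state.
        return {"order": list(self.order), "next_batch": self.next_batch}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        # Assumes the loader wraps the same data the state was saved from.
        self.order = list(state["order"])
        self.next_batch = int(state["next_batch"])
```

Shuffling indices instead of elements keeps the state independent of element type and size, and the final short batch falls out of ordinary list slicing.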
Examples
Each input is (data, batch size, shuffle, seed, saveAfter).
Input: (["a","b","c","d","e"], 2, False, 7, 1)
Expected Output: {'before_save': [['a', 'b']], 'after_resume': [['c', 'd'], ['e']], 'uninterrupted': [['a', 'b'], ['c', 'd'], ['e']]}
Explanation: Without shuffle, resume continues at the next sequential batch.
Input: ([0,1,2,3,4,5], 3, True, 42, 1)
Expected Output: {'before_save': [[3, 1, 2]], 'after_resume': [[4, 0, 5]], 'uninterrupted': [[3, 1, 2], [4, 0, 5]]}
Explanation: The shuffled order is fully determined by the seed and is stored in the saved state, so resuming continues the same permutation.
Input: ([1,2], 5, False, 1, 3)
Expected Output: {'before_save': [[1, 2]], 'after_resume': [], 'uninterrupted': [[1, 2]]}
Explanation: A short final batch is still emitted, and a save point beyond the last batch simply leaves nothing to resume.
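A harness that produces the three output fields shown in the examples might look like the sketch below. It assumes the input order (data, batch size, shuffle, seed, saveAfter); the names run_harness and iter_batches are hypothetical. The saved state is round-tripped through JSON to check that it is serializable, and after_resume is computed from that restored state alone.

```python
import itertools
import json
import random
from typing import Any, Dict, Iterator, List, Sequence


def iter_batches(data: Sequence[Any], batch_size: int,
                 state: Dict[str, Any]) -> Iterator[List[Any]]:
    """Yield batches from state['next_batch'] on, advancing it per batch."""
    order = state["order"]
    while state["next_batch"] * batch_size < len(order):
        start = state["next_batch"] * batch_size
        state["next_batch"] += 1  # the batch counts as consumed once yielded
        yield [data[i] for i in order[start:start + batch_size]]


def run_harness(data: Sequence[Any], batch_size: int, shuffle: bool,
                seed: int, save_after: int) -> Dict[str, List[List[Any]]]:
    def fresh_state() -> Dict[str, Any]:
        order = list(range(len(data)))
        if shuffle:
            random.Random(seed).shuffle(order)
        return {"order": order, "next_batch": 0}

    state = fresh_state()
    # islice stops after save_after batches, or earlier if the data runs out.
    before_save = list(itertools.islice(iter_batches(data, batch_size, state),
                                        save_after))
    saved = json.loads(json.dumps(state))  # serialize, then restore
    after_resume = list(iter_batches(data, batch_size, saved))
    uninterrupted = list(iter_batches(data, batch_size, fresh_state()))
    return {"before_save": before_save, "after_resume": after_resume,
            "uninterrupted": uninterrupted}
```

For any input, before_save followed by after_resume should equal uninterrupted, which is the property the constraints require.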
Hints
- Persist the shuffled order and the next batch index.
- Resume by slicing the same saved order.
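The hints persist the full shuffled order, which costs one integer per element. For a large indexable dataset, a more compact variant (a swapped-in technique, not the one the hints describe) stores only the seed, the batch size, and the position, then regenerates the order on resume. The names rebuild_order, save_compact, and resume_batches below are hypothetical.

```python
import json
import random
from typing import Any, List, Sequence


def rebuild_order(n: int, shuffle: bool, seed: int) -> List[int]:
    """Recreate the index order deterministically from the seed."""
    order = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(order)
    return order


def save_compact(n: int, batch_size: int, shuffle: bool, seed: int,
                 next_batch: int) -> str:
    """Serialize a constant-size checkpoint; no per-element data is stored."""
    return json.dumps({"n": n, "batch_size": batch_size, "shuffle": shuffle,
                       "seed": seed, "next_batch": next_batch})


def resume_batches(data: Sequence[Any], checkpoint: str) -> List[List[Any]]:
    """Return every batch from the saved position to the end of the epoch."""
    state = json.loads(checkpoint)
    if state["n"] != len(data):
        raise ValueError("checkpoint was saved for a dataset of a different size")
    order = rebuild_order(state["n"], state["shuffle"], state["seed"])
    bs = state["batch_size"]
    start = state["next_batch"] * bs
    # Slice the regenerated order from the saved batch onward.
    return [[data[i] for i in order[s:s + bs]]
            for s in range(start, len(order), bs)]
```

The trade-off: Python's random documentation guarantees a reproducible sequence only for random() given the same seed, not for shuffle() across interpreter versions. Saving the explicit order, as the hints suggest, is the safer choice when checkpoints must survive upgrades.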