Implement a resumable data loader

This is a Coding & Algorithms interview question from Microsoft for Machine Learning Engineer roles.

Question

Problem: Resumable DataLoader

You are implementing a mini data-loading component for model training.

Design a ResumableDataLoader that iterates over a dataset and yields mini-batches, but can also save its state and later resume from exactly where it left off.

Requirements

The dataset is an indexable collection dataset[0..N-1].
The loader yields batches of size B as lists of dataset items (or indices).
Supports:
- shuffle=True/False.
- Deterministic behavior given a seed.
Provide APIs (language-agnostic):
- __iter__() / next() (or equivalent) to iterate batches.
- state_dict() → returns a serializable object capturing everything needed to resume.
- load_state_dict(state) → restores the loader to continue iteration.
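
One way the API above could be laid out in Python is sketched below. It is an illustration under stated assumptions, not a reference solution: the state fields (seed, epoch, cursor) and the helper _make_order are hypothetical names, and the per-epoch shuffle is assumed to be derived from the pair (seed, epoch) so the order can be rebuilt on resume instead of being saved.

```python
import random
from typing import Any, Dict, List, Sequence


class ResumableDataLoader:
    """Sketch of a batch iterator that can checkpoint and resume mid-epoch."""

    def __init__(self, dataset: Sequence[Any], batch_size: int,
                 shuffle: bool = False, seed: int = 0) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0    # completed epochs
        self.cursor = 0   # items already yielded in the current epoch
        self._order = self._make_order()

    def _make_order(self) -> List[int]:
        # Assumption: the order depends only on (seed, epoch), so it is
        # rebuilt on resume rather than stored. random.Random hashes a str
        # seed deterministically, unlike the built-in hash().
        order = list(range(len(self.dataset)))
        if self.shuffle:
            random.Random(f"{self.seed}-{self.epoch}").shuffle(order)
        return order

    def __iter__(self) -> "ResumableDataLoader":
        return self

    def __next__(self) -> List[Any]:
        if self.cursor >= len(self._order):
            # Epoch finished: roll over before signalling the end, so a
            # checkpoint taken now resumes at the start of the next epoch.
            self.epoch += 1
            self.cursor = 0
            self._order = self._make_order()
            raise StopIteration
        idx = self._order[self.cursor:self.cursor + self.batch_size]
        self.cursor += len(idx)  # the last batch may be shorter than batch_size
        return [self.dataset[i] for i in idx]

    def state_dict(self) -> Dict[str, int]:
        # Three integers, independent of the dataset size.
        return {"seed": self.seed, "epoch": self.epoch, "cursor": self.cursor}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self.seed = state["seed"]
        self.epoch = state["epoch"]
        self.cursor = state["cursor"]
        self._order = self._make_order()
```

In this sketch each `for batch in loader` pass covers one epoch, and the loader holds a reference to the dataset rather than a copy. The index list it keeps in memory still has N entries, which is assumed to be acceptable here.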

Resume correctness

After saving state mid-epoch and restoring it, the sequence of items produced must be identical to that of an uninterrupted run.
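
The property above can be checked mechanically. The following is a minimal sketch that works on indices only; index_stream and resume_matches are hypothetical helper names, and the shuffle is assumed to be seeded from (seed, epoch). It compares an uninterrupted run against a run that stops after some batches and continues from the saved (epoch, cursor) pair.

```python
import random
from itertools import islice
from typing import Dict, Iterator, List, Tuple


def index_stream(n: int, batch_size: int, seed: int,
                 epoch: int = 0, cursor: int = 0
                 ) -> Iterator[Tuple[List[int], Dict[str, int]]]:
    """Yield (batch_of_indices, state_after_batch) across epochs, forever."""
    if n <= 0 or batch_size <= 0:
        raise ValueError("n and batch_size must be positive")
    while True:
        # Assumption: the epoch's order is a pure function of (seed, epoch).
        order = list(range(n))
        random.Random(f"{seed}-{epoch}").shuffle(order)
        while cursor < n:
            batch = order[cursor:cursor + batch_size]
            cursor += len(batch)
            yield batch, {"epoch": epoch, "cursor": cursor}
        epoch, cursor = epoch + 1, 0


def resume_matches(n: int, batch_size: int, seed: int,
                   total: int, stop_after: int) -> bool:
    """True if stopping after `stop_after` batches and resuming gives the
    same `total` batches as an uninterrupted run."""
    if not 1 <= stop_after <= total:
        raise ValueError("need 1 <= stop_after <= total")
    reference = [b for b, _ in islice(index_stream(n, batch_size, seed), total)]
    head = list(islice(index_stream(n, batch_size, seed), stop_after))
    state = head[-1][1]  # checkpoint taken right after the last consumed batch
    tail = [b for b, _ in islice(
        index_stream(n, batch_size, seed, **state), total - stop_after)]
    return [b for b, _ in head] + tail == reference
```

Calling resume_matches for every possible stop point, including the ones that fall exactly on an epoch boundary, is a cheap way to exercise the rollover logic.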

Clarifications to address in your design

How do you handle the end of an epoch?
If shuffle=True, how do you ensure the shuffle order is reproducible across resume?
What happens when the last batch is smaller than B?
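
One possible set of answers to these three questions is sketched below; other designs are equally defensible. The assumptions are: the epoch counter advances and the cursor resets to zero at the end of an epoch, the permutation is recomputed from (seed, epoch), and the short final batch is kept unless a drop_last flag is set. The flag and the function names are hypothetical and do not come from the problem statement.

```python
import random
from typing import List, Tuple


def epoch_order(n: int, seed: int, epoch: int, shuffle: bool = True) -> List[int]:
    """Index order for one epoch; reproducible because it uses only (seed, epoch)."""
    order = list(range(n))
    if shuffle:
        random.Random(f"{seed}-{epoch}").shuffle(order)
    return order


def batches_per_epoch(n: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches in an epoch, with or without the short final batch."""
    if drop_last:
        return n // batch_size
    return (n + batch_size - 1) // batch_size  # ceiling division


def next_slice(epoch: int, cursor: int, n: int, batch_size: int,
               drop_last: bool = False) -> Tuple[int, int, int]:
    """Return (epoch, start, stop) of the next batch, rolling over at epoch end."""
    if batches_per_epoch(n, batch_size, drop_last) == 0:
        raise ValueError("dataset yields no batches with these settings")
    remaining = n - cursor
    if remaining <= 0 or (drop_last and remaining < batch_size):
        epoch, cursor = epoch + 1, 0  # end of epoch: new epoch, fresh cursor
    return epoch, cursor, min(cursor + batch_size, n)
```

With N = 10 and B = 3, next_slice(0, 9, 10, 3) returns (0, 9, 10), a final batch of one item, while the same call with drop_last=True skips it and returns (1, 0, 3).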

Constraints

Assume N can be large; avoid storing unnecessary full copies of the dataset.
State must be reasonably small and serializable (e.g., JSON/pickle equivalent).
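
To make the size constraint concrete, here is a hedged sketch of a state record that stays a handful of scalars regardless of N and round-trips through JSON. The field names, the version tag, and the check_state helper are assumptions added for illustration; the configuration fields are included only so that a mismatched resume can be detected.

```python
import json
from typing import Any, Dict


def make_state(seed: int, epoch: int, cursor: int,
               n: int, batch_size: int, shuffle: bool) -> Dict[str, Any]:
    """Build a checkpoint made only of scalars (no indices, no dataset items)."""
    return {"version": 1, "seed": seed, "epoch": epoch, "cursor": cursor,
            "n": n, "batch_size": batch_size, "shuffle": shuffle}


def check_state(state: Dict[str, Any], n: int, batch_size: int,
                shuffle: bool) -> None:
    """Raise ValueError if the checkpoint does not fit the current loader."""
    expected = {"n": n, "batch_size": batch_size, "shuffle": shuffle}
    for key, value in expected.items():
        if state.get(key) != value:
            raise ValueError(f"state mismatch for {key!r}: "
                             f"{state.get(key)!r} != {value!r}")
    if not 0 <= state["cursor"] <= n:
        raise ValueError("cursor out of range")


# The serialized size does not depend on the dataset size.
state = make_state(seed=42, epoch=3, cursor=123456,
                   n=10**9, batch_size=256, shuffle=True)
blob = json.dumps(state)
restored = json.loads(blob)
check_state(restored, n=10**9, batch_size=256, shuffle=True)
```

By contrast, saving the shuffled index list itself would make the state grow linearly with N, which is why this sketch stores only what is needed to recompute it.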