Problem: Resumable DataLoader
You are implementing a mini data-loading component for model training.
Design a ResumableDataLoader that iterates over a dataset and yields mini-batches, but can also save its state and later resume from exactly where it left off.
Requirements
- The dataset is an indexable collection dataset[0..N-1].
- The loader yields batches of size B as lists of dataset items (or indices).
- Supports:
  - shuffle=True/False.
  - Deterministic behavior given a seed.
- Provide APIs (language-agnostic):
  - __iter__() / next() (or equivalent) to iterate batches.
  - state_dict() → returns a serializable object capturing everything needed to resume.
  - load_state_dict(state) → restores the loader to continue iteration.
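Below is a minimal Python sketch of these APIs. It assumes one particular design that the problem does not prescribe: the only mutable state is the pair (epoch, cursor), and each epoch's shuffle order is recomputed from (seed, epoch) instead of being saved. The drop_last flag, the "seed:epoch" string seed, and the num_samples sanity check are illustrative additions, not part of the original spec. Python's next(iter(loader)) stands in for next().

```python
import random
from typing import Any, Dict, Iterator, List, Optional, Sequence


class ResumableDataLoader:
    """Sketch of a resumable mini-batch loader (hypothetical design, not a library API).

    The only mutable state is (epoch, cursor). The shuffle order for an epoch is
    recomputed from (seed, epoch), so it never has to be saved.
    """

    def __init__(self, dataset: Sequence[Any], batch_size: int, shuffle: bool = False,
                 seed: int = 0, drop_last: bool = False) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed
        self.drop_last = drop_last                # assumption: optional policy for a short last batch
        self.epoch = 0                            # index of the current epoch
        self.cursor = 0                           # positions of the current epoch already yielded
        self._order: Optional[List[int]] = None   # index order for self.epoch, built lazily

    def _epoch_end(self) -> int:
        # Position where the epoch ends; drop_last discards a trailing partial batch.
        n = len(self.dataset)
        return n - n % self.batch_size if self.drop_last else n

    def _epoch_order(self) -> List[int]:
        # Indices only (never copies of items), shuffled deterministically from (seed, epoch).
        order = list(range(len(self.dataset)))
        if self.shuffle:
            random.Random(f"{self.seed}:{self.epoch}").shuffle(order)
        return order

    def __iter__(self) -> Iterator[List[Any]]:
        # Yields the rest of the current epoch, then rolls over to the next one.
        # Assumes only one live iterator per loader at a time.
        if self._order is None:
            self._order = self._epoch_order()
        end = self._epoch_end()
        while self.cursor < end:
            start = self.cursor
            self.cursor = min(start + self.batch_size, end)  # advance before yielding
            yield [self.dataset[i] for i in self._order[start:self.cursor]]
        # End of epoch: next epoch index, cursor back to 0, permutation rebuilt lazily.
        self.epoch += 1
        self.cursor = 0
        self._order = None

    def state_dict(self) -> Dict[str, Any]:
        epoch, cursor = self.epoch, self.cursor
        if cursor >= self._epoch_end():
            # Every batch of this epoch was already yielded: resume at the next epoch.
            epoch, cursor = epoch + 1, 0
        return {"epoch": epoch, "cursor": cursor, "seed": self.seed,
                "shuffle": self.shuffle, "batch_size": self.batch_size,
                "drop_last": self.drop_last, "num_samples": len(self.dataset)}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        if state["num_samples"] != len(self.dataset):
            raise ValueError("state was saved for a dataset of a different size")
        self.seed = state["seed"]
        self.shuffle = state["shuffle"]
        self.batch_size = state["batch_size"]
        self.drop_last = state["drop_last"]
        self.epoch = state["epoch"]
        self.cursor = state["cursor"]
        self._order = None  # rebuilt from (seed, epoch) on the next __iter__
```

Advancing cursor before each yield means a checkpoint taken right after a batch is handed out never replays that batch, so the caller is assumed to checkpoint only after that batch's training step completes. state_dict() also reports an exhausted epoch as (epoch + 1, 0), so a restored loader does not spend one iteration on an empty remainder. Calling load_state_dict() while an older iterator is still being consumed is not handled.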
Resume correctness
After saving state mid-epoch and restoring, the sequence of items produced must be identical to an uninterrupted run.
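Under the assumption that each epoch's order is derived only from (seed, epoch), the resume guarantee reduces to one invariant: restoring needs nothing but a position (epoch, cursor) into a sequence that can be regenerated. The hypothetical helpers epoch_order and take_items below check that invariant at the item level, with a resume point that lands mid-epoch after one epoch boundary has already been crossed. One caveat: Python only documents that random() reproduces its sequence for the same seed, so the shuffle() output is assumed stable for a fixed Python version, not across versions.

```python
import random
from typing import List


def epoch_order(n: int, seed: int, epoch: int, shuffle: bool = True) -> List[int]:
    """Index order for one epoch: a pure function of (n, seed, epoch)."""
    order = list(range(n))
    if shuffle:
        # A str seed is hashed deterministically, independent of PYTHONHASHSEED.
        random.Random(f"{seed}:{epoch}").shuffle(order)
    return order


def take_items(n: int, seed: int, epoch: int, cursor: int, count: int) -> List[int]:
    """Return the next `count` indices starting at position (epoch, cursor), crossing epochs."""
    if n == 0:
        return []
    out: List[int] = []
    while len(out) < count:
        order = epoch_order(n, seed, epoch)
        chunk = order[cursor:cursor + count - len(out)]
        out.extend(chunk)
        cursor += len(chunk)
        if cursor >= n:  # epoch finished: move to the next epoch's fresh permutation
            epoch, cursor = epoch + 1, 0
    return out


# Uninterrupted run: 25 items spanning 3 epochs of 10.
uninterrupted = take_items(10, seed=7, epoch=0, cursor=0, count=25)
# Interrupted run: stop after 13 items, i.e. at (epoch=1, cursor=3), save those two ints, resume.
first = take_items(10, seed=7, epoch=0, cursor=0, count=13)
resumed = take_items(10, seed=7, epoch=1, cursor=3, count=12)
assert first + resumed == uninterrupted
```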
Clarifications to address in your design
- How do you handle the end of an epoch?
- If shuffle=True, how do you ensure the shuffle order is reproducible after a resume?
- What happens when the last batch is smaller than B?
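One possible set of answers, sketched with hypothetical helpers: the last batch is either kept short (drop_last=False) or discarded (drop_last=True), and the epoch ends when the cursor reaches the last kept position. At that point the epoch counter increments, the cursor resets to 0, and a fresh permutation is derived from (seed, epoch + 1). A third policy, wrapping around into the next epoch to fill the batch, is possible but not shown.

```python
from typing import List, Tuple


def batch_bounds(n: int, batch_size: int, drop_last: bool) -> List[Tuple[int, int]]:
    """Half-open [start, end) slices of one epoch's index order.

    drop_last=False keeps a short final batch; drop_last=True discards it.
    """
    end = n - n % batch_size if drop_last else n
    return [(s, min(s + batch_size, end)) for s in range(0, end, batch_size)]


def normalize(epoch: int, cursor: int, n: int, batch_size: int, drop_last: bool) -> Tuple[int, int]:
    """Map an exhausted position to the start of the next epoch.

    A state saved right after an epoch's last batch should resume at (epoch + 1, 0),
    so the restored loader does not iterate over an empty remainder of the old epoch.
    """
    end = n - n % batch_size if drop_last else n
    return (epoch + 1, 0) if cursor >= end else (epoch, cursor)


print(batch_bounds(10, 3, drop_last=False))  # [(0, 3), (3, 6), (6, 9), (9, 10)]
print(batch_bounds(10, 3, drop_last=True))   # [(0, 3), (3, 6), (6, 9)]
```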
Constraints
- Assume N can be large; avoid storing unnecessary full copies of the dataset.
- State must be reasonably small and serializable (e.g., JSON/pickle equivalent).
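A loader that materializes one permutation of N integer indices per epoch never copies dataset items, and that permutation need not enter the saved state because it can be regenerated from (seed, epoch). If even an O(N) index list is too large, one swapped-in option (not something the problem asks for) is to compute each shuffled index on demand from a bijection on [0, N). The sketch below uses an affine map i -> (a*i + c) mod N with gcd(a, N) = 1; the names affine_params and shuffled_index are hypothetical. This is a much weaker shuffle than Fisher-Yates: consecutive positions always differ by the same stride a mod N, so it only suits cases where rough decorrelation is enough. The saved state stays the same few integers (seed, epoch, cursor).

```python
import math
import random
from typing import Tuple


def affine_params(n: int, seed: int, epoch: int) -> Tuple[int, int]:
    """Pick (a, c) for i -> (a*i + c) % n, deterministically from (seed, epoch)."""
    if n <= 1:
        return 1, 0
    rng = random.Random(f"{seed}:{epoch}")
    while True:
        a = rng.randrange(1, n)
        if math.gcd(a, n) == 1:  # coprime stride => the map is a bijection on [0, n)
            break
    return a, rng.randrange(n)


def shuffled_index(i: int, n: int, a: int, c: int) -> int:
    """Dataset index for position i of the epoch, computed in O(1) memory."""
    return (a * i + c) % n


# Works for a huge N without materializing a permutation.
n = 10**12
a, c = affine_params(n, seed=7, epoch=0)
first_batch = [shuffled_index(i, n, a, c) for i in range(4)]
```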