PracHub

Implement a resumable data loader

Last updated: May 4, 2026

Quick Overview

This question evaluates a candidate's ability to design stateful, resumable data iteration with deterministic shuffling and compact serializable checkpointing, exercising skills in state management, reproducibility, and handling large indexable datasets.


Company: Microsoft

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

Feb 11, 2026, 12:00 AM

Problem: Resumable DataLoader

You are implementing a mini data-loading component for model training.

Design a ResumableDataLoader that iterates over a dataset and yields mini-batches, but can also save its state and later resume from exactly where it left off.

Requirements

  • The dataset is an indexable collection dataset[0..N-1].
  • The loader yields batches of size B as lists of dataset items (or indices).
  • Supports:
    • shuffle=True/False.
    • Deterministic behavior given a seed.
  • Provide APIs (language-agnostic):
    • __iter__() / next() (or equivalent) to iterate batches.
    • state_dict() → returns a serializable object capturing everything needed to resume.
    • load_state_dict(state) → restores the loader to continue iteration.
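
One possible shape for this API, as a minimal Python sketch rather than a reference solution. The field names `epoch` and `cursor`, and the way the per-epoch seed is built from `seed` and `epoch`, are assumptions made for illustration. The sketch also assumes the dataset and batch size are unchanged between save and restore, and that `random.Random` produces the same shuffle in the process that resumes.

```python
import random
from typing import Any, Dict, Iterator, List, Sequence


class ResumableDataLoader:
    """Yields batches from an indexable dataset and can checkpoint its position."""

    def __init__(self, dataset: Sequence[Any], batch_size: int,
                 shuffle: bool = False, seed: int = 0) -> None:
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0    # number of completed epochs (hypothetical field)
        self.cursor = 0   # items already yielded in the current epoch

    def _order(self) -> List[int]:
        # Rebuild the index order from (seed, epoch) instead of storing it.
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            random.Random(f"{self.seed}:{self.epoch}").shuffle(indices)
        return indices

    def __iter__(self) -> Iterator[List[Any]]:
        order = self._order()
        while self.cursor < len(order):
            chunk = order[self.cursor:self.cursor + self.batch_size]
            # Advance before yielding, so a checkpoint taken after receiving
            # this batch points at the next one.
            self.cursor += len(chunk)
            yield [self.dataset[i] for i in chunk]
        # Epoch finished: the next iter() call uses a fresh permutation.
        self.epoch += 1
        self.cursor = 0

    def state_dict(self) -> Dict[str, int]:
        return {"seed": self.seed, "epoch": self.epoch, "cursor": self.cursor}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self.seed = state["seed"]
        self.epoch = state["epoch"]
        self.cursor = state["cursor"]
```

The state here is three integers regardless of N. The index list built in `_order()` is temporary working memory and is never written to the checkpoint.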

Resume correctness

After saving state mid-epoch and restoring, the sequence of items produced must be identical to an uninterrupted run.
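
This property can be checked mechanically. The sketch below is one way to do it, assuming a checkpoint made of `seed`, `epoch`, and `cursor` (hypothetical field names) and working on indices only. It simulates a stop after every possible batch boundary, round-trips the state through JSON, and compares the stitched sequence with an uninterrupted run.

```python
import json
import random
from typing import Dict, Iterator, List


def epoch_order(n: int, seed: int, epoch: int, shuffle: bool) -> List[int]:
    """Index order for one epoch, derived only from (seed, epoch)."""
    indices = list(range(n))
    if shuffle:
        random.Random(f"{seed}:{epoch}").shuffle(indices)
    return indices


def batches_from(state: Dict[str, int], n: int, batch_size: int,
                 shuffle: bool) -> Iterator[List[int]]:
    """Yield the remaining index batches of the epoch described by `state`."""
    order = epoch_order(n, state["seed"], state["epoch"], shuffle)
    for start in range(state["cursor"], n, batch_size):
        yield order[start:start + batch_size]


def check_resume(n: int, batch_size: int, seed: int) -> bool:
    """True if resuming after any number of batches reproduces the full run."""
    start = {"seed": seed, "epoch": 0, "cursor": 0}
    reference = list(batches_from(start, n, batch_size, shuffle=True))
    for k in range(len(reference) + 1):
        # Pretend the job stopped after k batches and checkpointed to JSON.
        consumed = sum(len(batch) for batch in reference[:k])
        saved = json.dumps({"seed": seed, "epoch": 0, "cursor": consumed})
        resumed = list(batches_from(json.loads(saved), n, batch_size, shuffle=True))
        if reference[:k] + resumed != reference:
            return False
    return True
```

A call such as `check_resume(10, 3, seed=7)` covers the short final batch as well, since 10 is not a multiple of 3.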

Clarifications to address in your design

  • How do you handle the end of an epoch?
  • If shuffle=True, how do you ensure the shuffle order is reproducible across resume?
  • What happens when the last batch is smaller than B?
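
One set of answers to these questions, sketched under stated assumptions: the permutation for each epoch is derived from the pair (seed, epoch), so reaching the end of an epoch only requires incrementing an epoch counter, and a `drop_last` flag (a hypothetical option, not part of the problem statement) decides whether the short final batch is yielded or discarded.

```python
import random
from typing import List


def epoch_batches(n: int, batch_size: int, seed: int, epoch: int,
                  shuffle: bool = True, drop_last: bool = False) -> List[List[int]]:
    """Return the index batches for one epoch."""
    order = list(range(n))
    if shuffle:
        # The same (seed, epoch) pair gives the same permutation on every call,
        # so a resumed run can rebuild the order without having saved it.
        random.Random(f"{seed}:{epoch}").shuffle(order)
    batches = [order[i:i + batch_size] for i in range(0, n, batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()  # discard the short tail batch
    return batches


# Without shuffling the tail handling is easy to see:
print(epoch_batches(10, 4, seed=0, epoch=0, shuffle=False))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(epoch_batches(10, 4, seed=0, epoch=0, shuffle=False, drop_last=True))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Whichever tail policy is chosen, it has to be the same before and after a resume, otherwise the restored run can diverge at the epoch boundary.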

Constraints

  • Assume N can be large; avoid storing unnecessary full copies of the dataset.
  • State must be reasonably small and serializable (e.g., JSON/pickle equivalent).
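
If even a temporary list of N indices is too much, one option is to compute each shuffled index on demand. The sketch below swaps in an affine map, position -> (a * position + b) mod N with gcd(a, N) = 1, which is a bijection on 0..N-1. This is an assumption chosen for brevity: it is a much weaker shuffle than a uniform random permutation, and the helper names are hypothetical. The checkpoint stays a small JSON object either way.

```python
import json
import math
import random
from typing import List, Tuple


def affine_params(n: int, seed: int, epoch: int) -> Tuple[int, int]:
    """Pick (a, b) deterministically from (seed, epoch) with gcd(a, n) == 1."""
    rng = random.Random(f"{seed}:{epoch}")
    while True:
        a = rng.randrange(1, n) if n > 1 else 1
        if math.gcd(a, n) == 1:
            break
    b = rng.randrange(n)
    return a, b


def shuffled_index(position: int, n: int, a: int, b: int) -> int:
    """Map a position in the epoch to a dataset index using O(1) memory."""
    return (a * position + b) % n


def next_batch(state: dict, n: int, batch_size: int) -> List[int]:
    """Return the next index batch and advance state['cursor'] in place."""
    a, b = affine_params(n, state["seed"], state["epoch"])
    stop = min(state["cursor"] + batch_size, n)
    batch = [shuffled_index(p, n, a, b) for p in range(state["cursor"], stop)]
    state["cursor"] = stop
    return batch


state = {"seed": 3, "epoch": 0, "cursor": 0}
first = next_batch(state, n=1_000_000, batch_size=4)
checkpoint = json.dumps(state)       # a few dozen bytes, independent of N
restored = json.loads(checkpoint)    # continues at cursor 4
second = next_batch(restored, n=1_000_000, batch_size=4)
```

A stronger design would replace the affine map with a keyed pseudorandom permutation, at the cost of more code. The checkpoint contents would not change.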
