PracHub

Quick Overview

This question evaluates a candidate's ability to design stateful, resumable data iteration with deterministic shuffling and compact serializable checkpointing, exercising skills in state management, reproducibility, and handling large indexable datasets.


Implement a resumable data loader

Company: Microsoft

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

## Problem: Resumable DataLoader

You are implementing a mini data-loading component for model training. Design a `ResumableDataLoader` that iterates over a dataset and yields mini-batches, but can also **save its state** and later **resume** from exactly where it left off.

### Requirements

- The dataset is an indexable collection `dataset[0..N-1]`.
- The loader yields batches of size `B` as lists of dataset items (or indices).
- Supports:
  - `shuffle=True/False`.
  - Deterministic behavior given a `seed`.
- Provide APIs (language-agnostic):
  - `__iter__()` / `next()` (or equivalent) to iterate batches.
  - `state_dict()` → returns a serializable object capturing everything needed to resume.
  - `load_state_dict(state)` → restores the loader to continue iteration.

### Resume correctness

After saving state mid-epoch and restoring, the sequence of items produced must be identical to an uninterrupted run.

### Clarifications to address in your design

- How do you handle the end of an epoch?
- If `shuffle=True`, how do you ensure the shuffle order is reproducible across resume?
- What happens when the last batch is smaller than `B`?

### Constraints

- Assume `N` can be large; avoid storing unnecessary full copies of the dataset.
- State must be reasonably small and serializable (e.g., JSON/pickle equivalent).
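One way to satisfy the requirements above is sketched below. It is a minimal illustration, not a reference solution, and it rests on several assumptions that are not in the problem statement: the checkpoint holds only `seed`, `epoch`, and `cursor`; each epoch's permutation is rebuilt from `random.Random(seed + epoch)`; a short final batch is yielded unless the hypothetical `drop_last` flag is set; and the end of an epoch resets the cursor and advances the epoch counter.

```python
import json
import random


class ResumableDataLoader:
    """Sketch: yields batches and checkpoints as (seed, epoch, cursor)."""

    def __init__(self, dataset, batch_size, shuffle=False, seed=0, drop_last=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed
        self.drop_last = drop_last  # assumption: the short final batch is kept by default
        self.epoch = 0
        self.cursor = 0  # position of the next unread item in this epoch's order

    def _order(self):
        # Rebuilt from (seed, epoch) instead of being stored, so the checkpoint stays O(1).
        order = list(range(len(self.dataset)))
        if self.shuffle:
            random.Random(self.seed + self.epoch).shuffle(order)
        return order

    def __iter__(self):
        order = self._order()
        while self.cursor < len(order):
            idx = order[self.cursor:self.cursor + self.batch_size]
            if self.drop_last and len(idx) < self.batch_size:
                break
            self.cursor += len(idx)  # advance first so a save taken now sees the new position
            yield [self.dataset[i] for i in idx]
        # End of epoch: the next pass starts at 0 with a fresh permutation.
        self.epoch += 1
        self.cursor = 0

    def state_dict(self):
        return {"seed": self.seed, "epoch": self.epoch, "cursor": self.cursor}

    def load_state_dict(self, state):
        self.seed = state["seed"]
        self.epoch = state["epoch"]
        self.cursor = state["cursor"]


data = list(range(10))
uninterrupted = list(ResumableDataLoader(data, 3, shuffle=True, seed=42))

first = ResumableDataLoader(data, 3, shuffle=True, seed=42)
it = iter(first)
before = [next(it), next(it)]
saved = json.dumps(first.state_dict())  # checkpoint after two batches

second = ResumableDataLoader(data, 3, shuffle=True, seed=42)
second.load_state_dict(json.loads(saved))
after = list(second)
print(before + after == uninterrupted)
```

The index list built in `_order` costs O(N) integers in memory but never copies the dataset items, and none of it enters the checkpoint. Using `seed + epoch` as the per-epoch seed is a simplification; different `(seed, epoch)` pairs can collide, so a production design would likely derive the per-epoch seed differently.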


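Given a dataset, a batch size, a shuffle flag, a seed, and `saveAfter` (the order the example inputs follow), return three lists: the batches consumed before the state is saved (`before_save`), the batches produced after resuming from that state (`after_resume`), and the uninterrupted batch sequence (`uninterrupted`) for comparison.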

Constraints

  • The deterministic harness uses saveAfter as the number of batches consumed before saving.
  • When shuffle is true, random.Random(seed) defines the order.
  • The batches before the save followed by the batches after the resume must equal the uninterrupted run.
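The harness described by these constraints might look like the sketch below. The function names `make_order`, `batches_from`, and `run_harness`, the argument order, and the shape of the saved state are assumptions made for illustration; only the use of `random.Random(seed)` and the three output keys come from the page.

```python
import random


def make_order(n, shuffle, seed):
    # Index permutation for one pass; random.Random(seed) defines it when shuffling.
    order = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(order)
    return order


def batches_from(dataset, order, batch_size, start_batch=0):
    # Slice the order into batches, starting at batch number start_batch.
    out = []
    for i in range(start_batch * batch_size, len(order), batch_size):
        out.append([dataset[j] for j in order[i:i + batch_size]])
    return out


def run_harness(dataset, batch_size, shuffle, seed, save_after):
    order = make_order(len(dataset), shuffle, seed)
    uninterrupted = batches_from(dataset, order, batch_size)

    # A save point beyond the last batch simply consumes everything.
    consumed = min(save_after, len(uninterrupted))
    before_save = uninterrupted[:consumed]

    # Hypothetical saved state: the order plus the index of the next batch.
    state = {"order": order, "next_batch": consumed}
    after_resume = batches_from(dataset, state["order"], batch_size, state["next_batch"])

    return {
        "before_save": before_save,
        "after_resume": after_resume,
        "uninterrupted": uninterrupted,
    }


result = run_harness(["a", "b", "c", "d", "e"], 2, False, 7, 1)
print(result)
```

Shuffling a list of indices rather than the items themselves keeps the dataset untouched and gives the same permutation, because `shuffle` depends only on the list length and the generator state.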

Examples

Input: (["a","b","c","d","e"], 2, False, 7, 1)

Expected Output: {'before_save': [['a', 'b']], 'after_resume': [['c', 'd'], ['e']], 'uninterrupted': [['a', 'b'], ['c', 'd'], ['e']]}

Explanation: Without shuffle, resume continues at the next sequential batch.

Input: ([0,1,2,3,4,5], 3, True, 42, 1)

Expected Output: {'before_save': [[3, 1, 2]], 'after_resume': [[4, 0, 5]], 'uninterrupted': [[3, 1, 2], [4, 0, 5]]}

Explanation: Shuffle order is deterministic for the seed and preserved in state.

Input: ([1,2], 5, False, 1, 3)

Expected Output: {'before_save': [[1, 2]], 'after_resume': [], 'uninterrupted': [[1, 2]]}

Explanation: The only batch is shorter than the batch size, and a save point beyond the last batch leaves nothing to resume.

Hints

  1. Persist the shuffled order and the next batch index.
  2. Resume by slicing the same saved order.
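The hints store the shuffled order itself, which makes the saved state grow with `N`. A possible alternative, sketched below under the assumption that the order can be regenerated from the seed at resume time, stores only the dataset length, the seed, and the next batch index. The helper names `shuffled_order`, `full_state`, `compact_state`, and `remaining_indices` are hypothetical.

```python
import json
import random


def shuffled_order(n, seed):
    # Deterministic permutation of range(n) for a given seed.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def full_state(n, seed, next_batch):
    # Hint-style state: carries the whole permutation, O(N) in size.
    return {"order": shuffled_order(n, seed), "next_batch": next_batch}


def compact_state(n, seed, next_batch):
    # Assumed alternative: O(1) state, order is recomputed on resume.
    return {"n": n, "seed": seed, "next_batch": next_batch}


def remaining_indices(state, batch_size):
    # Recover the order (stored or regenerated) and slice off what was consumed.
    if "order" in state:
        order = state["order"]
    else:
        order = shuffled_order(state["n"], state["seed"])
    return order[state["next_batch"] * batch_size:]


full = full_state(1000, 7, 3)
compact = compact_state(1000, 7, 3)
print(remaining_indices(full, 32) == remaining_indices(compact, 32))
print(len(json.dumps(full)), len(json.dumps(compact)))
```

The compact form trades size for a dependency on the shuffle algorithm: the Python documentation guarantees reproducibility of `random()` for a given seed across versions, but not of `shuffle`, so a checkpoint resumed under a different interpreter version may be safer with the stored order.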
Last updated: Jun 27, 2026


Related Coding Questions

  • Return Top K Open Businesses - Microsoft (hard)
  • Implement Memory Allocation and In-Memory Records - Microsoft (medium)
  • Sort Three Categories In Place - Microsoft (medium)
  • Implement K-Means and Detect Divisible Subarrays - Microsoft (medium)
  • Retain Top K Elements - Microsoft (medium)