Implement a resumable data loader | Microsoft Coding Question

Quick Overview

This question evaluates a candidate's ability to design stateful, resumable data iteration with deterministic shuffling and compact serializable checkpointing, exercising skills in state management, reproducibility, and handling large indexable datasets.

Company: Microsoft

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

## Problem: Resumable DataLoader

You are implementing a mini data-loading component for model training. Design a `ResumableDataLoader` that iterates over a dataset and yields mini-batches, but can also **save its state** and later **resume** from exactly where it left off.

### Requirements

- The dataset is an indexable collection `dataset[0..N-1]`.
- The loader yields batches of size `B` as lists of dataset items (or indices).
- Supports:
  - `shuffle=True/False`.
  - Deterministic behavior given a `seed`.
- Provide APIs (language-agnostic):
  - `__iter__()` / `next()` (or equivalent) to iterate batches.
  - `state_dict()` → returns a serializable object capturing everything needed to resume.
  - `load_state_dict(state)` → restores the loader to continue iteration.

### Resume correctness

After saving state mid-epoch and restoring, the sequence of items produced must be identical to an uninterrupted run.

### Clarifications to address in your design

- How do you handle the end of an epoch?
- If `shuffle=True`, how do you ensure the shuffle order is reproducible across resume?
- What happens when the last batch is smaller than `B`?

### Constraints

- Assume `N` can be large; avoid storing unnecessary full copies of the dataset.
- State must be reasonably small and serializable (e.g., JSON/pickle equivalent).
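One possible shape for the requested API is sketched below. It is an illustration under stated assumptions, not a reference solution: the state holds only `seed`, `epoch`, and `cursor` (plus the `shuffle` and `batch_size` settings), and the index order is rebuilt on demand from `random.Random(seed + epoch)`. Deriving the per-epoch seed as `seed + epoch` is an assumption made here for brevity; the problem statement does not prescribe how epochs are reseeded.

```python
import random


class ResumableDataLoader:
    """Sketch: yields mini-batches and can checkpoint and restore its position."""

    def __init__(self, dataset, batch_size, shuffle=False, seed=0):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0      # number of completed epochs
        self.cursor = 0     # items already yielded in the current epoch
        self._order = None  # cached index order; rebuilt on demand, never saved

    def _epoch_order(self):
        # The order is a pure function of (seed, epoch), so it can be
        # regenerated after a restore instead of being stored in the state.
        if self._order is None:
            order = list(range(len(self.dataset)))
            if self.shuffle:
                # Assumption: per-epoch seed derived as seed + epoch.
                random.Random(self.seed + self.epoch).shuffle(order)
            self._order = order
        return self._order

    def __iter__(self):
        return self

    def __next__(self):
        order = self._epoch_order()
        if self.cursor >= len(order):
            # End of epoch: roll over so the next loop starts a fresh epoch.
            self.epoch += 1
            self.cursor = 0
            self._order = None
            raise StopIteration
        indices = order[self.cursor:self.cursor + self.batch_size]
        self.cursor += len(indices)  # the last batch may be shorter than B
        return [self.dataset[i] for i in indices]

    def state_dict(self):
        # Plain ints and bools only, so the state is JSON-serializable.
        return {
            "seed": self.seed,
            "epoch": self.epoch,
            "cursor": self.cursor,
            "shuffle": self.shuffle,
            "batch_size": self.batch_size,
        }

    def load_state_dict(self, state):
        self.seed = state["seed"]
        self.epoch = state["epoch"]
        self.cursor = state["cursor"]
        self.shuffle = state["shuffle"]
        self.batch_size = state["batch_size"]
        self._order = None  # force regeneration from the restored seed/epoch
```

In this sketch the saved state stays constant-size regardless of `N`, at the cost of rebuilding an `N`-element index list after a restore. A short final batch is simply yielded as-is; a `drop_last`-style option would be a separate design choice.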

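Given a dataset, a batch size, a shuffle flag, a seed, and a save point (`saveAfter`), return three batch sequences for comparison: the batches consumed before the state was saved (`before_save`), the batches produced after resuming from that state (`after_resume`), and the full uninterrupted sequence (`uninterrupted`).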

Constraints

The deterministic harness uses saveAfter as the number of batches consumed before saving.
When shuffle is true, random.Random(seed) defines the order.
After resuming from the saved state, the loader must produce exactly the batches that the uninterrupted run would produce from that point on.

Examples

Input: (["a","b","c","d","e"], 2, False, 7, 1)

Expected Output: {'before_save': [['a', 'b']], 'after_resume': [['c', 'd'], ['e']], 'uninterrupted': [['a', 'b'], ['c', 'd'], ['e']]}

Explanation: Without shuffle, resume continues at the next sequential batch.

Input: ([0,1,2,3,4,5], 3, True, 42, 1)

Expected Output: {'before_save': [[3, 1, 2]], 'after_resume': [[4, 0, 5]], 'uninterrupted': [[3, 1, 2], [4, 0, 5]]}

Explanation: Shuffle order is deterministic for the seed and preserved in state.

Input: ([1,2], 5, False, 1, 3)

Expected Output: {'before_save': [[1, 2]], 'after_resume': [], 'uninterrupted': [[1, 2]]}

Explanation: A short final batch and over-large save point are handled.
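The comparison that the examples describe can be sketched as a small harness. The function names `make_batches` and `resume_check` and the state keys are hypothetical, chosen here for illustration; the argument order follows the example inputs, and the save point is clamped to the number of available batches, as the third example implies.

```python
import random


def make_batches(dataset, batch_size, shuffle, seed):
    """Return every batch of a single epoch, in order."""
    order = list(range(len(dataset)))
    if shuffle:
        # random.Random(seed) defines the order, as stated in the constraints.
        random.Random(seed).shuffle(order)
    return [
        [dataset[i] for i in order[start:start + batch_size]]
        for start in range(0, len(order), batch_size)
    ]


def resume_check(dataset, batch_size, shuffle, seed, save_after):
    """Compare an interrupted-and-resumed run against an uninterrupted one."""
    uninterrupted = make_batches(dataset, batch_size, shuffle, seed)

    # Consume save_after batches, or fewer if the epoch ends first.
    consumed = min(save_after, len(uninterrupted))
    before_save = uninterrupted[:consumed]

    # Hypothetical checkpoint: just enough to rebuild the order and skip ahead.
    state = {"seed": seed, "shuffle": shuffle, "next_batch": consumed}

    # Resume using only the dataset and the saved state.
    resumed = make_batches(dataset, batch_size, state["shuffle"], state["seed"])
    after_resume = resumed[state["next_batch"]:]

    return {
        "before_save": before_save,
        "after_resume": after_resume,
        "uninterrupted": uninterrupted,
    }
```

For any input, `before_save + after_resume` should equal `uninterrupted`; that invariant is the resume-correctness requirement in miniature.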

Hints

Persist the shuffled order and the next batch index.
Resume by slicing the same saved order.
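A minimal sketch of what these hints suggest follows, with hypothetical helper names (`initial_state`, `take_batches`) and state keys (`order`, `next_batch`). Note that persisting the whole order makes the state grow linearly with the dataset size, so for very large datasets it may be preferable to store only the seed and regenerate the order on resume.

```python
import random


def initial_state(n, shuffle, seed):
    """Build the state for a fresh epoch over n items."""
    order = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(order)
    # The state is a plain dict of ints, so it serializes to JSON directly.
    return {"order": order, "next_batch": 0}


def take_batches(dataset, state, batch_size, limit=None):
    """Yield up to `limit` batches, advancing state["next_batch"] in place."""
    order = state["order"]
    produced = 0
    while state["next_batch"] * batch_size < len(order):
        if limit is not None and produced >= limit:
            return
        start = state["next_batch"] * batch_size
        # Advance before yielding so a checkpoint taken by the consumer
        # always points at the first batch it has not yet received.
        state["next_batch"] += 1
        produced += 1
        # Slice the saved order; the final slice may be shorter than batch_size.
        yield [dataset[i] for i in order[start:start + batch_size]]
```

Because the saved order is sliced rather than reshuffled, resuming does not depend on the random number generator at all; only `order` and `next_batch` need to survive the round trip.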
