PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's ability to design serializable, stateful abstractions in both in-memory and I/O-bound contexts. It tests understanding of iterator patterns, checkpoint-restore semantics, and test-driven development — competencies commonly assessed in engineering roles requiring systems thinking and API design rigor.

  • medium
  • OpenAI
  • Coding & Algorithms
  • Machine Learning Engineer

Design a Resumable Iterator with Checkpoint and Restore

Company: OpenAI

Role: Machine Learning Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Design a **resumable iterator**: an iterator that can serialize its current position into a compact, opaque checkpoint and later be reconstructed from that checkpoint so that iteration continues from exactly where it left off — even in a brand-new process. You will build this in two stages, and you are expected to **write the tests first**, then implement against them. Your test suite must be detailed enough that a correct implementation passes immediately on the first run. ## Stage 1 — List iterator Implement `ResumableListIterator` over an in-memory sequence. It must support: - `has_next() -> bool` — whether another element remains. - `next() -> T` — return the next element and advance; raise `StopIteration` (or your language's equivalent) if none remain. - `save_state() -> State` — return a checkpoint capturing the current position. The checkpoint must be a value that can be stored and passed around (e.g. an integer, a string, or a small serializable object). It must **not** be a live reference to the iterator's internal mutable state. - A constructor / factory that builds an iterator from `(source, state)` so that resuming from a saved checkpoint yields exactly the elements that had not yet been returned at the moment `save_state()` was called. Resume semantics, precisely: - If you call `save_state()` after returning the first `k` of `n` elements, then build a new iterator from the same source plus that checkpoint, the new iterator must return elements `k, k+1, ..., n-1` (0-indexed) in order and then report no more. - Saving at the very start (before any `next()`) resumes from the first element. Saving after the last element resumes into an empty iterator (`has_next()` is `false`). - `save_state()` is a read-only operation: calling it must not advance the iterator or change what the original iterator yields next. ## Stage 2 — File iterator (follow-up) Implement `ResumableFileIterator` that yields the **lines** of a text file one at a time, with the same `has_next` / `next` / `save_state` / resume-from-state contract as Stage 1. Additional requirements specific to files: - The checkpoint must let a fresh iterator (in a new process, with the file reopened) resume at the next unread line without re-reading or re-emitting any line already returned. - Do **not** require loading the whole file into memory; the iterator must stream so that it works on a file far larger than available RAM. Your checkpoint should therefore be small and **must not** embed the file's contents. - State the assumptions your checkpoint relies on (for example, that the underlying file is not modified between save and resume), and design the checkpoint so that resuming lands on a well-defined line boundary rather than mid-line. ### Required output / deliverables 1. A **test suite** covering at minimum: resume from the start, from the middle, from the end (empty resume), and a round trip where you iterate part-way, checkpoint, reconstruct, and assert the remaining sequence equals the expected tail. Include a test asserting `save_state()` does not mutate the original iterator. 2. The **`ResumableListIterator`** implementation passing those tests. 3. The **`ResumableFileIterator`** implementation passing line-based versions of the same tests, plus a test that resume continues at the correct line boundary on a multi-line file. ### Constraints & Assumptions - Elements/lines are returned in their original order; no skipping, no duplication across a save/resume boundary. - A checkpoint produced by `save_state()` is **opaque** to callers and **portable**: it can be serialized (e.g. to JSON or a primitive) and used to reconstruct the iterator later, possibly in a different process. - For the file iterator, assume the file is encoded in a single, known text encoding (e.g. UTF-8) and is not concurrently modified between checkpoint and resume. - Aim for $O(1)$ additional memory per `next()` for the file iterator (streaming), and a checkpoint whose size is independent of how many elements have already been consumed.

Quick Answer: This question evaluates a candidate's ability to design serializable, stateful abstractions in both in-memory and I/O-bound contexts. It tests understanding of iterator patterns, checkpoint-restore semantics, and test-driven development — competencies commonly assessed in engineering roles requiring systems thinking and API design rigor.

Design a **resumable iterator** that can serialize its current position into a compact, opaque checkpoint and later be reconstructed from that checkpoint so iteration continues from exactly where it left off — even in a brand-new process. Implement `ResumableListIterator` over an in-memory sequence supporting: - `has_next() -> bool` — whether another element remains. - `next() -> T` — return the next element and advance; raise `StopIteration` (or your language's equivalent) if none remain. - `save_state() -> State` — return a checkpoint capturing the current position. The checkpoint must be a plain value (e.g. an integer) that can be stored, serialized, and passed around. It must **not** be a live reference to the iterator's internal mutable state, and calling it must **not** advance the iterator. - A constructor / factory `(source, state)` that rebuilds an iterator so that resuming from a saved checkpoint yields exactly the elements that had not yet been returned when `save_state()` was called. **Resume semantics, precisely.** If you call `save_state()` after returning the first `k` of `n` elements, then build a new iterator from the same source plus that checkpoint, the new iterator must return elements `k, k+1, ..., n-1` (0-indexed) in order and then report no more. Saving at the very start (before any `next()`) resumes from the first element; saving after the last element resumes into an empty iterator. **Harness wrapper.** Implement `solution(source, k)` that demonstrates one full round trip: build a fresh iterator over `source`, consume the first `k` elements (clamping `k` into `[0, n]`), call `save_state()` to obtain the checkpoint, verify `save_state()` left the iterator unchanged, then reconstruct a new iterator from `(source, state)` and drain it. Return `[consumed, tail, state, has_next_after_save]` where `consumed` is the first `k` elements, `tail` is the remaining elements produced after resuming, `state` is the opaque checkpoint, and `has_next_after_save` is whether the original iterator still had elements right after the checkpoint. **Follow-up (discussion, not graded here).** Extend the same `has_next` / `next` / `save_state` / resume contract to a streaming `ResumableFileIterator` over the lines of a text file. The checkpoint becomes a byte offset at a line boundary so a fresh iterator (file reopened in a new process) resumes at the next unread line without re-emitting any returned line, never loads the whole file into memory, and keeps the checkpoint size independent of how many lines were consumed — assuming the file is a fixed UTF-8 encoding and is not modified between save and resume.

Constraints

  • Elements are returned in their original order with no skipping and no duplication across a save/resume boundary.
  • The checkpoint is an opaque, serializable value (here a non-negative integer offset) — never a live reference to the iterator's internal state.
  • save_state() is read-only: it must not advance the iterator or change what the original yields next.
  • k is clamped into [0, n]; saving before any next() resumes from the first element, saving after the last resumes into an empty iterator.
  • The same contract must extend to a streaming file iterator whose checkpoint (a byte offset at a line boundary) is independent of how many lines were already consumed.

Examples

Input: ([10, 20, 30, 40, 50], 2)

Expected Output: [[10, 20], [30, 40, 50], 2, True]

Explanation: Consume the first 2 elements, checkpoint at index 2, then resume: the fresh iterator yields exactly [30, 40, 50]. has_next is True right after the checkpoint.

Input: ([10, 20, 30, 40, 50], 0)

Expected Output: [[], [10, 20, 30, 40, 50], 0, True]

Explanation: Saving before any next() yields checkpoint 0; resume reproduces the entire sequence from the first element.

Input: ([10, 20, 30, 40, 50], 5)

Expected Output: [[10, 20, 30, 40, 50], [], 5, False]

Explanation: Saving after the last element yields checkpoint n=5; resume produces an empty iterator and has_next is False.

Input: ([], 0)

Expected Output: [[], [], 0, False]

Explanation: Empty source: nothing consumed, checkpoint 0, empty resume, has_next False.

Input: (['a'], 1)

Expected Output: [['a'], [], 1, False]

Explanation: Single-element boundary case: consume the only element, checkpoint at 1, resume is empty.

Input: ([7, 7, 7, 7], 1)

Expected Output: [[7], [7, 7, 7], 1, True]

Explanation: Duplicate values must not be skipped or duplicated across the save/resume boundary — exactly the unconsumed tail returns.

Input: ([-1, -2, -3], 2)

Expected Output: [[-1, -2], [-3], 2, True]

Explanation: Negative payloads round-trip identically; the checkpoint is a position, independent of element values.

Input: ([10, 20, 30, 40, 50], 8)

Expected Output: [[10, 20, 30, 40, 50], [], 5, False]

Explanation: k is clamped to n=5, so all elements are consumed, the checkpoint is 5, and resume is empty.

Hints

  1. Represent the iterator's position as a single integer index into the source. has_next() compares it to the length; next() returns source[pos] then increments pos.
  2. save_state() returns that integer by value — a copy, not a reference — so the checkpoint is portable and immutable, and reading it cannot advance iteration.
  3. Rebuilding from (source, state) just sets the starting index to the saved value, so resume yields exactly elements k..n-1. For the file follow-up, swap the integer index for a byte offset captured with tell() at a line boundary so reopening in a new process streams from there without re-reading prior lines.
Last updated: Jun 24, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Infection Spread Simulation with Death Threshold - OpenAI (medium)
  • Spreading Contagion on a Grid - OpenAI (medium)
  • Streaming Entropy with Numerical Stability - OpenAI (hard)
  • Implement a Distributed Rate Limiter - OpenAI (medium)
  • Compute Plant Infection Time - OpenAI (medium)