Design a Resumable Iterator with Checkpoint and Restore
Company: OpenAI
Role: Machine Learning Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to design serializable, stateful abstractions in both in-memory and I/O-bound contexts. It tests understanding of iterator patterns, checkpoint-restore semantics, and test-driven development — competencies commonly assessed in engineering roles requiring systems thinking and API design rigor.
Constraints
- Elements are returned in their original order with no skipping and no duplication across a save/resume boundary.
- The checkpoint is an opaque, serializable value (here a non-negative integer offset) — never a live reference to the iterator's internal state.
- save_state() is read-only: it must not advance the iterator or change what the original yields next.
- k is clamped into [0, n]; saving before any next() resumes from the first element, saving after the last resumes into an empty iterator.
- The same contract must extend to a streaming file iterator whose checkpoint (a byte offset at a line boundary) is independent of how many lines were already consumed.
Examples
Input: ([10, 20, 30, 40, 50], 2)
Expected Output: [[10, 20], [30, 40, 50], 2, True]
Explanation: Consume the first 2 elements, checkpoint at index 2, then resume: the fresh iterator yields exactly [30, 40, 50]. has_next is True right after the checkpoint.
Input: ([10, 20, 30, 40, 50], 0)
Expected Output: [[], [10, 20, 30, 40, 50], 0, True]
Explanation: Saving before any next() yields checkpoint 0; resume reproduces the entire sequence from the first element.
Input: ([10, 20, 30, 40, 50], 5)
Expected Output: [[10, 20, 30, 40, 50], [], 5, False]
Explanation: Saving after the last element yields checkpoint n=5; resume produces an empty iterator and has_next is False.
Input: ([], 0)
Expected Output: [[], [], 0, False]
Explanation: Empty source: nothing consumed, checkpoint 0, empty resume, has_next False.
Input: (['a'], 1)
Expected Output: [['a'], [], 1, False]
Explanation: Single-element boundary case: consume the only element, checkpoint at 1, resume is empty.
Input: ([7, 7, 7, 7], 1)
Expected Output: [[7], [7, 7, 7], 1, True]
Explanation: Duplicate values must not be skipped or duplicated across the save/resume boundary — exactly the unconsumed tail returns.
Input: ([-1, -2, -3], 2)
Expected Output: [[-1, -2], [-3], 2, True]
Explanation: Negative payloads round-trip identically; the checkpoint is a position, independent of element values.
Input: ([10, 20, 30, 40, 50], 8)
Expected Output: [[10, 20, 30, 40, 50], [], 5, False]
Explanation: k is clamped to n=5, so all elements are consumed, the checkpoint is 5, and resume is empty.
Hints
- Represent the iterator's position as a single integer index into the source. has_next() compares it to the length; next() returns source[pos] then increments pos.
- save_state() returns that integer by value — a copy, not a reference — so the checkpoint is portable and immutable, and reading it cannot advance iteration.
- Rebuilding from (source, state) just sets the starting index to the saved value, so resume yields exactly elements k..n-1. For the file follow-up, swap the integer index for a byte offset captured with tell() at a line boundary so reopening in a new process streams from there without re-reading prior lines.