PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/Software Engineering Fundamentals/Coinbase

Debug and Extend Cursor Queries

Last updated: Jun 27, 2026

Quick Overview

This question evaluates understanding of iterator/cursor semantics, stateful side effects and mutation, practical debugging and trace reasoning, and basic query API design such as adding limit or order_by features.

  • hard
  • Coinbase
  • Software Engineering Fundamentals
  • Software Engineer

Debug and Extend Cursor Queries

Company: Coinbase

Role: Software Engineer

Category: Software Engineering Fundamentals

Difficulty: hard

Interview Round: Onsite

You are in an **AI-assisted coding interview**. You may consult AI-generated suggestions, but you are expected to validate them, explain your reasoning out loud, and share your screen while you work. The interviewer is judging *how* you use AI — whether you read, test, and challenge its output — as much as the final code. You are given a small in-memory database abstraction. A **table** is a list of rows, where each row is a dictionary from column name to value. A **query** returns a **cursor** object that supports incremental, forward-only iteration over the matching rows via `has_next()` / `next()`. A simplified implementation is shown below: ```python class Cursor: def __init__(self, rows): self.rows = rows self.index = 0 def has_next(self): self.index += 1 return self.index < len(self.rows) def next(self): if not self.has_next(): return None return self.rows[self.index] class Table: def __init__(self): self.rows = [] def insert(self, row): self.rows.append(row) def query(self, predicate=lambda row: True): return Cursor([row for row in self.rows if predicate(row)]) ``` The intended idiom for consuming a cursor is the standard iterate-until-exhausted loop: ```python cursor = table.query(lambda r: r["active"]) while cursor.has_next(): row = cursor.next() process(row) ``` You will work through four parts: debug the cursor, design a new feature, review a proposed pull request, and debug a production incident. Treat any AI suggestion you reach for as a draft to be verified, not an answer. ### Constraints & Assumptions - Single-process, in-memory store; rows are plain Python `dict`s. Assume CPython unless you state otherwise. No real SQL engine, no network. - A cursor is **forward-only** (no rewind) and **single-pass** unless you justify changing the contract. - Tables can range from a handful of rows to roughly $10^5$–$10^6$; your answers should note where an approach stops scaling (when "copy everything per query" becomes a real cost). - "Concurrency" in Parts 2–4 means other threads or async tasks calling `insert` / `delete` / `update` on the same `Table` while a cursor is being consumed. State the concurrency model you assume (single-threaded, GIL-protected threads, snapshot isolation, etc.) — that choice is part of the answer. ### Clarifying Questions to Ask - What is the expected end-of-stream behavior — return `None`, raise `StopIteration`, or support both? - Is a cursor meant to observe a point-in-time **snapshot**, or a **live** view that reflects writes made after the query started? - Can one cursor be consumed by multiple threads, or is it owned by a single consumer? - For pagination, is page size fixed by the server or chosen by the client, and is exactly-once delivery across pages a requirement? - Are callers allowed to mutate row dictionaries in place, or should the store guarantee they can't alter persisted state? - What ordering, if any, does `query()` guarantee today — insertion order or none? ### Part 1 — Bug finding & cursor semantics Identify the cursor-iteration bug(s) in the implementation above. State the **cursor contract** you'd expect, explain concretely what the current code does wrong, and fix it. **Demonstrate the failure with a trace** — show what the canonical `while has_next(): next()` loop actually yields for a small table — and show your fix returns every matching row exactly once, in order. ```hint Trace it by hand Insert rows with ids `0..4`, then walk the canonical loop one call at a time, writing down `self.index` after every `has_next()` and every `next()` call. Note what `next()` does *internally*. ``` ```hint The core defect Ask whether a "do I have more?" predicate *should* have a side effect. If it does, what happens when the canonical loop calls it once and then `next()` calls it again internally? ``` ```hint The fix shape `has_next()` should be side-effect-free; make `next()` the **sole** place that advances the index. Once you've done that, check your fix against an empty result, a single-row result, and repeated `has_next()` calls before a `next()`. ``` #### What This Part Should Cover - A precise cursor contract: `has_next()` is pure / idempotent (repeated calls don't advance), `next()` advances exactly once, and a single, documented end-of-stream behavior. - A **concrete trace** of the original code that shows *why* it skips rows and double-advances (the index moves two positions per consumed row) — not just an assertion that it is buggy. - A clean fix verified against the edge cases: empty result, single row, past-the-end, and repeated `has_next()` before a `next()`. ### Part 2 — New feature design Add support for **one** of the following and walk through the design — API surface, data structure / algorithm, time / space cost, where it stops scaling, and which class owns the logic (cursor vs. table vs. query): - `limit(n)` — stop after at most `n` rows. - `order_by(column)` — return rows sorted by a column. - Pagination with a **stable resume token** — fetch a page, hand back a token, resume the next page from that token. ```hint limit(n) — the lightweight choice Track a returned-count on the cursor and have `has_next()` report `False` once the cap is reached. Decide sane behavior for `n == 0` and `n > len(rows)`, and think about chaining (`query(...).limit(n)`). ``` ```hint order_by — cost vs. reuse Sorting the snapshot is $O(n \log n)$ time / $O(n)$ space but re-sorts on every query; the alternative is a maintained secondary index or heap (cheaper reads, costlier writes / memory). Either way you need a deterministic **tie-breaker** for equal keys, or ordering is unstable. ``` ```hint pagination — keyset, not offset OFFSET paging drifts: an insert or delete *before* the current offset shifts every later row, causing skips / dupes. Prefer **keyset (seek) paging** over a deterministic total order such as `(sort_key, id)`, and make the resume token encode the *last-seen key* so the next page requests rows strictly greater than it. Ask what breaks if two rows compare equal under your ordering. ``` #### What This Part Should Cover - Correct behavior for the chosen feature, including its edge cases (e.g. `limit(0)` and `limit > len(rows)`, an `order_by` tie-breaker, a deleted boundary row in pagination). - Honest time / space complexity and the **scaling cliff** — the point at which "re-scan / re-sort / re-copy per query" stops being acceptable, and what replaces it (maintained index, heap, keyset seek). - A clear decision about which class owns the new logic (cursor vs. table vs. query) and how it composes with the existing API. ### Part 3 — Pull request review An AI agent proposes a change that, on **every** `query()` call, eagerly copies all matching rows into the cursor, storing **mutable references** to the original row dictionaries. Review it: what **correctness**, **performance**, and **maintainability** concerns would you raise, and what would you ask the author to change before approving? ```hint Three lenses Correctness / isolation: shared mutable `dict` references let a caller mutate persisted state outside validation / locking — and copying the *list* is not a snapshot if the dicts inside still alias. Performance: full scan + full copy is $O(n)$ time and memory per query, with GC churn at scale. Maintainability: is this a snapshot, a live view, or isolated — and is that documented and covered by tests? ``` #### What This Part Should Cover - The isolation insight: copying the *list* of references is **not** a snapshot when the row dicts still alias the live store, so a caller (or a concurrent write) can mutate persisted state mid-iteration. - The performance cost: $O(n)$ time and memory per query, GC churn at scale, and how eager materialization defeats early-exit consumers (a `limit(1)` still pays to copy everything). - Concrete review outcomes: name the undocumented contract (snapshot vs. live vs. isolated) and the specific changes and tests required before approval. ### Part 4 — Production debugging Users report that some paginated queries **return duplicate rows or skip rows** while writes happen concurrently. Explain how you'd investigate, reproduce, instrument, and fix this — as a structured methodology, not just the final answer. ```hint Form a hypothesis first The two classic culprits are OFFSET paging (an insert / delete before the offset shifts later rows) and a **non-unique sort key** (ties reorder between page fetches). Decide which to rule in / out first, and pick the single log line per page (filter, order key, page size, first / last row key, token, table version) that lets you tell them apart. ``` ```hint Reproduce deterministically, then fix at the root Force the race: one reader paginating while a writer inserts / deletes at controlled points, seeded with *identical* sort-key values to expose a missing tie-breaker. The durable fix is keyset paging over a stable total order plus a unique tie-breaker (or paging through a versioned snapshot taken at query start); add regression tests for concurrent writes, duplicate keys, and repeated `has_next()`. ``` #### What This Part Should Cover - A disciplined methodology: symptom → hypothesis → instrumentation → deterministic repro → root-cause fix → regression test — rather than jumping straight to a patch. - Correctly fingering the root cause (offset drift and / or a non-unique sort key) and the diagnostic signal that distinguishes them (a per-page first / last key plus a table-version log line). - A root-cause fix (keyset paging over a stable total order, or a versioned snapshot) plus the regression tests that prove it (concurrent writes, duplicate keys, deleted boundary row). ### What a Strong Answer Covers These dimensions span all four parts: - A consistent mental model of the cursor's **contract** — what it promises the caller — applied uniformly across debugging, feature design, review, and the production incident. - Clear separation of **snapshot vs. live** semantics, and the recurring insight that copying row *references* is not true isolation. - Responsible AI-tool use: reads suggestions, validates with tests and edge cases, pushes back on a plausible-but-wrong AI answer, and can justify every accepted line out loud while sharing the screen. ### Follow-up Questions - Make the cursor a real Python iterator (`__iter__` / `__next__`). How does that change the `has_next()` design, what does `for row in cursor:` do at end-of-stream, and how do you keep both APIs consistent? - The store becomes multi-threaded. Sketch how you'd give cursors a consistent snapshot *without* copying the whole table per query (versioning / MVCC / copy-on-write), and what it costs reads. - If `order_by` must work on a 10M-row table that doesn't fit in memory, how does the design change (external sort, indexes, streaming merge)? - How would you give the resume token integrity and expiry so a client can't forge or replay a stale page boundary?

Quick Answer: This question evaluates understanding of iterator/cursor semantics, stateful side effects and mutation, practical debugging and trace reasoning, and basic query API design such as adding limit or order_by features.

Related Interview Questions

  • Design a task management system with TTL - Coinbase (medium)
  • Implement a blog feed with fetching and state - Coinbase (medium)
  • Implement a Reusable Dropdown Component - Coinbase (hard)
|Home/Software Engineering Fundamentals/Coinbase

Debug and Extend Cursor Queries

Coinbase logo
Coinbase
Apr 2, 2026, 12:00 AM
hardSoftware EngineerOnsiteSoftware Engineering Fundamentals
8
0

You are in an AI-assisted coding interview. You may consult AI-generated suggestions, but you are expected to validate them, explain your reasoning out loud, and share your screen while you work. The interviewer is judging how you use AI — whether you read, test, and challenge its output — as much as the final code.

You are given a small in-memory database abstraction. A table is a list of rows, where each row is a dictionary from column name to value. A query returns a cursor object that supports incremental, forward-only iteration over the matching rows via has_next() / next().

A simplified implementation is shown below:

class Cursor:
    def __init__(self, rows):
        self.rows = rows
        self.index = 0

    def has_next(self):
        self.index += 1
        return self.index < len(self.rows)

    def next(self):
        if not self.has_next():
            return None
        return self.rows[self.index]


class Table:
    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def query(self, predicate=lambda row: True):
        return Cursor([row for row in self.rows if predicate(row)])

The intended idiom for consuming a cursor is the standard iterate-until-exhausted loop:

cursor = table.query(lambda r: r["active"])
while cursor.has_next():
    row = cursor.next()
    process(row)

You will work through four parts: debug the cursor, design a new feature, review a proposed pull request, and debug a production incident. Treat any AI suggestion you reach for as a draft to be verified, not an answer.

Constraints & Assumptions

  • Single-process, in-memory store; rows are plain Python dict s. Assume CPython unless you state otherwise. No real SQL engine, no network.
  • A cursor is forward-only (no rewind) and single-pass unless you justify changing the contract.
  • Tables can range from a handful of rows to roughly 10510^5105 – 10610^6106 ; your answers should note where an approach stops scaling (when "copy everything per query" becomes a real cost).
  • "Concurrency" in Parts 2–4 means other threads or async tasks calling insert / delete / update on the same Table while a cursor is being consumed. State the concurrency model you assume (single-threaded, GIL-protected threads, snapshot isolation, etc.) — that choice is part of the answer.

Clarifying Questions to Ask

  • What is the expected end-of-stream behavior — return None , raise StopIteration , or support both?
  • Is a cursor meant to observe a point-in-time snapshot , or a live view that reflects writes made after the query started?
  • Can one cursor be consumed by multiple threads, or is it owned by a single consumer?
  • For pagination, is page size fixed by the server or chosen by the client, and is exactly-once delivery across pages a requirement?
  • Are callers allowed to mutate row dictionaries in place, or should the store guarantee they can't alter persisted state?
  • What ordering, if any, does query() guarantee today — insertion order or none?

Part 1 — Bug finding & cursor semantics

Identify the cursor-iteration bug(s) in the implementation above. State the cursor contract you'd expect, explain concretely what the current code does wrong, and fix it. Demonstrate the failure with a trace — show what the canonical while has_next(): next() loop actually yields for a small table — and show your fix returns every matching row exactly once, in order.

What This Part Should Cover

  • A precise cursor contract: has_next() is pure / idempotent (repeated calls don't advance), next() advances exactly once, and a single, documented end-of-stream behavior.
  • A concrete trace of the original code that shows why it skips rows and double-advances (the index moves two positions per consumed row) — not just an assertion that it is buggy.
  • A clean fix verified against the edge cases: empty result, single row, past-the-end, and repeated has_next() before a next() .

Part 2 — New feature design

Add support for one of the following and walk through the design — API surface, data structure / algorithm, time / space cost, where it stops scaling, and which class owns the logic (cursor vs. table vs. query):

  • limit(n) — stop after at most n rows.
  • order_by(column) — return rows sorted by a column.
  • Pagination with a stable resume token — fetch a page, hand back a token, resume the next page from that token.

What This Part Should Cover

  • Correct behavior for the chosen feature, including its edge cases (e.g. limit(0) and limit > len(rows) , an order_by tie-breaker, a deleted boundary row in pagination).
  • Honest time / space complexity and the scaling cliff — the point at which "re-scan / re-sort / re-copy per query" stops being acceptable, and what replaces it (maintained index, heap, keyset seek).
  • A clear decision about which class owns the new logic (cursor vs. table vs. query) and how it composes with the existing API.

Part 3 — Pull request review

An AI agent proposes a change that, on every query() call, eagerly copies all matching rows into the cursor, storing mutable references to the original row dictionaries. Review it: what correctness, performance, and maintainability concerns would you raise, and what would you ask the author to change before approving?

What This Part Should Cover

  • The isolation insight: copying the list of references is not a snapshot when the row dicts still alias the live store, so a caller (or a concurrent write) can mutate persisted state mid-iteration.
  • The performance cost: O(n)O(n)O(n) time and memory per query, GC churn at scale, and how eager materialization defeats early-exit consumers (a limit(1) still pays to copy everything).
  • Concrete review outcomes: name the undocumented contract (snapshot vs. live vs. isolated) and the specific changes and tests required before approval.

Part 4 — Production debugging

Users report that some paginated queries return duplicate rows or skip rows while writes happen concurrently. Explain how you'd investigate, reproduce, instrument, and fix this — as a structured methodology, not just the final answer.

What This Part Should Cover

  • A disciplined methodology: symptom → hypothesis → instrumentation → deterministic repro → root-cause fix → regression test — rather than jumping straight to a patch.
  • Correctly fingering the root cause (offset drift and / or a non-unique sort key) and the diagnostic signal that distinguishes them (a per-page first / last key plus a table-version log line).
  • A root-cause fix (keyset paging over a stable total order, or a versioned snapshot) plus the regression tests that prove it (concurrent writes, duplicate keys, deleted boundary row).

What a Strong Answer Covers

These dimensions span all four parts:

  • A consistent mental model of the cursor's contract — what it promises the caller — applied uniformly across debugging, feature design, review, and the production incident.
  • Clear separation of snapshot vs. live semantics, and the recurring insight that copying row references is not true isolation.
  • Responsible AI-tool use: reads suggestions, validates with tests and edge cases, pushes back on a plausible-but-wrong AI answer, and can justify every accepted line out loud while sharing the screen.

Follow-up Questions

  • Make the cursor a real Python iterator ( __iter__ / __next__ ). How does that change the has_next() design, what does for row in cursor: do at end-of-stream, and how do you keep both APIs consistent?
  • The store becomes multi-threaded. Sketch how you'd give cursors a consistent snapshot without copying the whole table per query (versioning / MVCC / copy-on-write), and what it costs reads.
  • If order_by must work on a 10M-row table that doesn't fit in memory, how does the design change (external sort, indexes, streaming merge)?
  • How would you give the resume token integrity and expiry so a client can't forge or replay a stale page boundary?
Loading comments...

Browse More Questions

More Software Engineering Fundamentals•More Coinbase•More Software Engineer•Coinbase Software Engineer•Coinbase Software Engineering Fundamentals•Software Engineer Software Engineering Fundamentals

Write your answer

Your first approved answer each day earns 20 XP.

Sign in to write your answer.
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.