Design query pagination for large datasets
Company: Coinbase
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
## Scenario
You are building an API that returns a **paginated list** of records from a very large dataset (millions+ rows). Clients can specify:
- Filters (e.g., status, date range)
- Sort order (e.g., `created_at desc`, possibly tie-breaking by `id`)
- Page size (limit)
Example endpoint:
- `GET /items?status=OPEN&sort=created_at_desc&limit=50&cursor=...`
## Requirements
1. Support **fast pagination** for deep pages (e.g., user keeps scrolling).
2. Results should be **stable**: users should not see duplicates or missing items while paginating, even if new records are inserted concurrently.
3. Support common relational databases (e.g., Postgres/MySQL) and typical indexes.
4. Handle common edge cases:
- Many records share the same `created_at`
- Data can be inserted/updated/deleted while the user paginates
- Clients may change filters/sort between requests
## Tasks
1. Propose a pagination strategy (e.g., offset-based, cursor/keyset-based, or hybrid) and explain the trade-offs.
2. Define the **cursor format** and what fields it must contain.
3. Provide example SQL queries for the first page and subsequent pages.
4. Discuss how you would ensure stability/consistency, and what guarantees you can realistically provide (e.g., “as of” snapshot vs best-effort).
5. Describe the indexing strategy and how it changes with different sort orders.
Quick Answer: This question evaluates a candidate's understanding of pagination strategies, cursor/keyset design, SQL query formulation, consistency guarantees under concurrent writes, and indexing for very large datasets.