Scenario
You are building an API that returns a paginated list of records from a very large dataset (millions+ rows). Clients can specify:
-
Filters (e.g., status, date range)
-
Sort order (e.g.,
created_at desc
, possibly tie-breaking by
id
)
-
Page size (limit)
Example endpoint:
-
GET /items?status=OPEN&sort=created_at_desc&limit=50&cursor=...
Requirements
-
Support
fast pagination
for deep pages (e.g., user keeps scrolling).
-
Results should be
stable
: users should not see duplicates or missing items while paginating, even if new records are inserted concurrently.
-
Support common relational databases (e.g., Postgres/MySQL) and typical indexes.
-
Handle common edge cases:
-
Many records share the same
created_at
-
Data can be inserted/updated/deleted while the user paginates
-
Clients may change filters/sort between requests
Tasks
-
Propose a pagination strategy (e.g., offset-based, cursor/keyset-based, or hybrid) and explain the trade-offs.
-
Define the
cursor format
and what fields it must contain.
-
Provide example SQL queries for the first page and subsequent pages.
-
Discuss how you would ensure stability/consistency, and what guarantees you can realistically provide (e.g., “as of” snapshot vs best-effort).
-
Describe the indexing strategy and how it changes with different sort orders.