What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Technical Screen rounds at Microsoft.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Microsoft during technical interviews.

Design a To-Do List Service (CRUD, Auth, Rate Limiting, Caching & API Versioning)

This system design question evaluates practical full-stack architectural thinking across REST API design, authentication, rate limiting, and caching in a multi-tenant SaaS context. It assesses whether an engineer can reason about operational concerns — data isolation, burst traffic, API versioning, and cache consistency — beyond basic CRUD implementation.

Design a to-do list service. Users sign in, create multiple named lists, and within each list create, view, update, and delete tasks. The service is accessed by a web frontend and a mobile app, and you are expected to design the REST API and the backend that powers it.

Start from the core CRUD product, then take the design through the operational concerns a full-stack engineer is expected to reason about: authentication and authorization, handling a single user that bursts to roughly 500 requests per second, ordering and listing tasks, caching (and the staleness problems caching introduces), REST API versioning, and how internal services authenticate to one another.

Constraints & Assumptions

Multi-tenant SaaS: each user owns their lists and tasks; a user must never see or mutate another user's data.
A list contains tasks. A task has at minimum: an id, the parent list id, a title, a completion flag, a due date (optional), and timestamps.
Read-heavy workload (viewing tasks dominates), but writes (add/update/complete/delete) are frequent.
Target API latency: p99 under ~150 ms for reads.
One user can briefly burst to ~500 requests/second (e.g., a misbehaving client or an aggressive sync loop); the system must stay healthy for everyone else.
The backend is composed of more than one internal service (e.g., an API gateway / edge service plus a task service), so service-to-service calls happen on the request path.
Assume a managed relational database is available, plus a managed cache (e.g., Redis).
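
One way to picture the entities described above is as two record types. This is a minimal sketch, not a prescribed schema: the class names `TodoList` and `Task`, the `owner_id` field, and the use of UUID strings for ids are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid


def _now() -> datetime:
    # Store timestamps in UTC so web and mobile clients agree on ordering.
    return datetime.now(timezone.utc)


def _new_id() -> str:
    return str(uuid.uuid4())


@dataclass
class TodoList:
    owner_id: str  # tenant key (assumed name): every query is scoped by it
    name: str
    id: str = field(default_factory=_new_id)
    created_at: datetime = field(default_factory=_now)


@dataclass
class Task:
    list_id: str  # parent list id
    title: str
    completed: bool = False
    due_date: Optional[datetime] = None  # optional per the constraints
    id: str = field(default_factory=_new_id)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
```

Putting the owner on the list rather than on every task keeps ownership in one place; whether to denormalize it onto tasks for faster tenant-scoped queries is a design choice the answer can discuss.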

Clarifying Questions to Ask

What is the scale — daily active users, total users, average lists/tasks per user — and what is the read:write ratio?
Do we need real-time sync across a user's devices (push/websockets), or is pull-on-refresh acceptable?
Are there sharing/collaboration features (multiple users on one list), or is every list owned by exactly one user?
What consistency does the product need — is it acceptable for a just-completed task to briefly appear incomplete on another device (eventual consistency), or must reads be strongly consistent?
Is the 500 req/s burst a legitimate use case we must serve, or abuse we should throttle? Per-user, per-IP, or per-token?
Are internal services inside one trusted network/VPC, or can they be reached from outside, and is there a service mesh available?

Part 1

Design the core to-do service: the data model, the REST API for lists and tasks (CRUD), and the read/write paths. Cover how a request flows from client to database and back.
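
As a rough illustration of the read and write paths, the sketch below uses an in-memory store in place of the relational database. Everything here is hypothetical: the `TodoStore` class, the `NotFound` exception, and the task routes named in the comments are assumptions, with only the `/v1/lists` prefix taken from the question itself. The point it tries to show is that every operation resolves the parent list through an ownership check before touching a task.

```python
import itertools
from typing import Dict, List


class NotFound(Exception):
    """The API layer would translate this into HTTP 404."""


class TodoStore:
    """In-memory stand-in for the database behind the task service."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._lists: Dict[int, dict] = {}
        self._tasks: Dict[int, dict] = {}

    def _owned_list(self, user_id: str, list_id: int) -> dict:
        # Tenant isolation: a list owned by someone else looks like a missing one.
        row = self._lists.get(list_id)
        if row is None or row["owner_id"] != user_id:
            raise NotFound(f"list {list_id}")
        return row

    def _owned_task(self, user_id: str, list_id: int, task_id: int) -> dict:
        self._owned_list(user_id, list_id)
        task = self._tasks.get(task_id)
        if task is None or task["list_id"] != list_id:
            raise NotFound(f"task {task_id}")
        return task

    # POST /v1/lists
    def create_list(self, user_id: str, name: str) -> dict:
        row = {"id": next(self._ids), "owner_id": user_id, "name": name}
        self._lists[row["id"]] = row
        return row

    # POST /v1/lists/{list_id}/tasks
    def create_task(self, user_id: str, list_id: int, title: str) -> dict:
        self._owned_list(user_id, list_id)
        row = {"id": next(self._ids), "list_id": list_id,
               "title": title, "completed": False}
        self._tasks[row["id"]] = row
        return row

    # GET /v1/lists/{list_id}/tasks
    def list_tasks(self, user_id: str, list_id: int) -> List[dict]:
        self._owned_list(user_id, list_id)
        return [t for t in self._tasks.values() if t["list_id"] == list_id]

    # PATCH /v1/lists/{list_id}/tasks/{task_id}
    def update_task(self, user_id: str, list_id: int, task_id: int, **changes) -> dict:
        task = self._owned_task(user_id, list_id, task_id)
        allowed = {"title", "completed"}  # ignore fields clients may not set
        task.update({k: v for k, v in changes.items() if k in allowed})
        return task

    # DELETE /v1/lists/{list_id}/tasks/{task_id}
    def delete_task(self, user_id: str, list_id: int, task_id: int) -> None:
        self._owned_task(user_id, list_id, task_id)
        del self._tasks[task_id]
```

Returning "not found" rather than "forbidden" for another user's list is one possible choice; it avoids confirming that an id exists.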

Part 2

Explain how you authenticate users and authorize requests. Define the difference between authentication and authorization, and show where each happens in your request flow. Then explain how internal services authenticate to each other when one service calls another on the request path.

Clarifying Questions for this Part

Are we using a third-party identity provider (OAuth2/OIDC) or rolling our own session/token system?
Should authorization be coarse (own-your-data) only, or do we need roles/sharing (owner vs. collaborator vs. viewer)?
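
The split between authentication, authorization, and service-to-service identity can be sketched with HMAC-signed tokens. This is a toy stand-in, assuming shared secrets: a real system would more likely use OIDC-issued JWTs for users and mTLS or signed service tokens between services. The keys, the `edge_handle` and `task_service_get_list` functions, and the `LIST_OWNERS` table are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional, Tuple

USER_KEY = b"user-token-key"        # hypothetical: verifies end-user tokens
SERVICE_KEY = b"edge-to-task-key"   # hypothetical: shared by edge and task service
LIST_OWNERS = {"list-1": "alice"}   # hypothetical ownership table


def sign(payload: dict, key: bytes) -> str:
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    mac = hmac.new(key, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + mac


def verify(token: str, key: bytes) -> Optional[dict]:
    """Return the claims if the signature is valid and unexpired, else None."""
    try:
        body, mac = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    if claims.get("exp", 0) < time.time():
        return None
    return claims


def task_service_get_list(internal_token: str, list_id: str) -> Tuple[int, Optional[dict]]:
    # Service authentication: only a token signed with SERVICE_KEY is trusted,
    # so a client-supplied header naming a user carries no weight here.
    claims = verify(internal_token, SERVICE_KEY)
    if claims is None or claims.get("iss") != "edge":
        return 401, None
    # Authorization: the authenticated user must own the list.
    if LIST_OWNERS.get(list_id) != claims["sub"]:
        return 404, None
    return 200, {"id": list_id, "owner": claims["sub"]}


def edge_handle(user_token: str, list_id: str) -> Tuple[int, Optional[dict]]:
    # Authentication: establish who the caller is.
    claims = verify(user_token, USER_KEY)
    if claims is None:
        return 401, None
    # Mint a short-lived internal token carrying the verified user id.
    internal = sign({"iss": "edge", "sub": claims["sub"],
                     "exp": time.time() + 30}, SERVICE_KEY)
    return task_service_get_list(internal, list_id)
```

In this sketch the edge answers "who are you?" and the task service answers "may you touch this list?", which is one reasonable place to draw the line between the two checks.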

Part 3

A single user starts sending ~500 requests/second. Decide whether to absorb this with rate limiting, caching, or both, and design the mechanism. Explain what you protect, where the limiter lives, and what the client experiences.
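
One common mechanism for this is a per-user token bucket. The sketch below is a single-process illustration under assumed numbers (10 requests/second sustained, burst of 20); the class name `TokenBucketLimiter` and its parameters are hypothetical, and a multi-instance deployment would need the bucket state in a shared store such as Redis rather than a local dictionary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class _Bucket:
    tokens: float
    updated: float


class TokenBucketLimiter:
    """Per-key token bucket: `rate` tokens/second refill, capacity `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so behavior can be tested deterministically
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, key: str) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds) for one request from `key`."""
        now = self.clock()
        bucket = self._buckets.setdefault(key, _Bucket(float(self.burst), now))
        # Refill for the elapsed time, capped at the burst capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.burst, bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True, 0.0
        # Rejected: the API layer would answer 429 with a Retry-After hint.
        return False, (1 - bucket.tokens) / self.rate
```

Because each user id has its own bucket, a client bursting to 500 requests/second exhausts only its own tokens and other users are unaffected. The rejected client sees HTTP 429 and can back off using the retry hint.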

Part 4

Design how tasks are listed and ordered (e.g., by due date, by creation time, by manual priority). Then add a cache for the read path. Explain the caching strategy, the staleness problems it introduces, and how you keep cached data correct.
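
A cache-aside read path with invalidation on write is one possible strategy. The sketch below assumes tasks are plain dictionaries with hypothetical `due` (ISO date string or `None`) and `created` fields, uses a local dictionary where a real deployment would use the managed cache, and picks a 30-second TTL purely for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


def order_tasks(tasks: List[dict]) -> List[dict]:
    """Sort by due date (tasks without one go last), then by creation time."""
    return sorted(tasks, key=lambda t: (t["due"] is None, t["due"] or "", t["created"]))


class CachedTaskReader:
    """Cache-aside reads with invalidate-on-write, keyed by list id."""

    def __init__(self, db: Dict[str, List[dict]], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.db = db          # stand-in for the relational database
        self.ttl = ttl        # upper bound on staleness if an invalidation is missed
        self.clock = clock
        self.db_reads = 0     # counter kept only to make cache hits observable
        self._cache: Dict[str, Tuple[float, List[dict]]] = {}

    def get_tasks(self, list_id: str) -> List[dict]:
        hit = self._cache.get(list_id)
        if hit is not None and hit[0] > self.clock():
            return hit[1]
        # Miss or expired entry: read the source of truth and repopulate.
        self.db_reads += 1
        tasks = order_tasks(self.db.get(list_id, []))
        self._cache[list_id] = (self.clock() + self.ttl, tasks)
        return tasks

    def add_task(self, list_id: str, task: dict) -> None:
        # Write to the database first, then drop the cached entry so the
        # next read rebuilds it instead of serving the old ordering.
        self.db.setdefault(list_id, []).append(task)
        self._cache.pop(list_id, None)
```

Deleting the entry rather than updating it in place avoids writing a stale value when two writers race, and the TTL bounds how long a missed invalidation can linger. This sketch does not address the window between the database write and the invalidation, which a fuller answer should discuss.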

Part 5

Explain REST API versioning. Why put a version such as v1 in the path (/v1/lists)? What problem does it solve, and how do you evolve the API without breaking existing mobile clients you cannot force to upgrade?
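
Path-based versioning can be pictured as one internal model with a serializer per version. The sketch below is illustrative only: the `is_completed` rename in v2, the `SERIALIZERS` table, and the `handle` function are hypothetical and not part of the question.

```python
from typing import Callable, Dict, Tuple


def task_v1(task: dict) -> dict:
    # Frozen contract: old mobile builds keep parsing exactly this shape.
    return {"id": task["id"], "title": task["title"], "completed": task["completed"]}


def task_v2(task: dict) -> dict:
    # Breaking change (a renamed field) ships only under the new version.
    return {"id": task["id"], "title": task["title"], "is_completed": task["completed"]}


SERIALIZERS: Dict[str, Callable[[dict], dict]] = {"v1": task_v1, "v2": task_v2}


def handle(path: str, task: dict) -> Tuple[int, dict]:
    """Pick the response shape from the version segment of the path."""
    version = path.strip("/").split("/")[0]
    serializer = SERIALIZERS.get(version)
    if serializer is None:
        return 404, {"error": "unknown API version"}
    return 200, serializer(task)
```

Both versions read the same stored task, so v1 clients keep working while v2 clients see the new field name. Additive changes, such as a new optional field, would not need a new version under this scheme.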

Follow-up Questions

In Part 4, suppose a user has the same list open on a phone and a laptop and completes a task on the phone. Walk through exactly what the laptop sees and when, and how you would make the two converge.
In Part 3, your per-user limiter is keyed on the user token, but the burst comes from a NAT'd corporate network where many users share one IP. How does that change your keying, and what breaks if you key on IP instead?
In Part 2, a downstream task service receives a call from the edge service. How does it know the call is genuinely from the edge service and on behalf of a specific authenticated user, without trusting a header the client could forge?
In Part 5, you need to rename a response field and change a status code. Walk through shipping that as v2 while v1 mobile clients keep working, including how you decide when it is safe to remove v1.
