PracHub

Design a To-Do List Service (CRUD, Auth, Rate Limiting, Caching & API Versioning)

Last updated: Jun 24, 2026

Quick Overview

This system design question evaluates practical full-stack architectural thinking across REST API design, authentication, rate limiting, and caching in a multi-tenant SaaS context. It assesses whether an engineer can reason about operational concerns — data isolation, burst traffic, API versioning, and cache consistency — beyond basic CRUD implementation.


Company: Microsoft

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Technical Screen

Hints

  • Part 1, data model: Two tables with a one-to-many relationship (`list` -> `task`), both scoped by `user_id`. Index on `(user_id)` for lists and `(list_id)` for tasks so every query is naturally tenant-scoped.
  • Part 1, API shape: Model lists and tasks as nested REST resources: `/lists`, `/lists/{listId}`, `/lists/{listId}/tasks`, `/lists/{listId}/tasks/{taskId}`. Map CRUD onto `POST`/`GET`/`PUT` (or `PATCH`)/`DELETE`. Paginate the task-list read.
  • Part 2, two distinct questions: Authentication = "who are you" (verify identity, e.g., validate a signed token / session). Authorization = "what are you allowed to do" (enforce that this user owns this list before mutating it). Keep them as separate layers.
  • Part 2, service-to-service: User-facing tokens (a user's JWT) are the wrong credential for service-to-service trust. Reach for mutual TLS, signed service tokens (short-lived JWT/OAuth2 client-credentials), or a service mesh that injects identity, so a downstream service can verify the caller service, not just relay the user.
  • Part 3, limit vs. absorb: These solve different problems: rate limiting rejects excess traffic to protect the backend; caching absorbs repeated reads so they never reach the DB. A 500 req/s read burst is mostly cacheable; a write burst must be throttled.
  • Part 3, algorithm + placement: Token bucket or sliding-window counter keyed per user (token), enforced at the edge/gateway with shared state in Redis so it holds across instances. Return HTTP `429` with a `Retry-After` header.
  • Part 4, ordering: Order in the database with an indexed `ORDER BY` (e.g., a composite index on `(list_id, due_date)`), and use keyset/cursor pagination rather than `OFFSET` for stable, efficient paging. For manual ordering, store an explicit position/rank column.
  • Part 4, cache + staleness: Cache-aside on a per-list key (`tasks:{listId}`). The hard part is invalidation: a stale cache shows a deleted or already-completed task. On every write, invalidate (or update) that key, and set a TTL as a safety net so a missed invalidation self-heals.
  • Part 5, why version at all: Mobile clients ship and live in users' pockets for months; you cannot force-upgrade them. Versioning lets you make breaking changes (rename a field, change a response shape) while old clients keep calling the contract they were built against.

Jun 11, 2026, 12:00 AM

Design a to-do list service. Users sign in, create multiple named lists, and within each list create, view, update, and delete tasks. The service is accessed by a web frontend and a mobile app, and you are expected to design the REST API and the backend that powers it.

Start from the core CRUD product, then take the design through the operational concerns a full-stack engineer is expected to reason about: authentication and authorization, handling a single user who bursts to roughly 500 requests per second, ordering and listing tasks, caching (and the staleness problems caching introduces), REST API versioning, and how internal services authenticate to one another.

Constraints & Assumptions

  • Multi-tenant SaaS: each user owns their lists and tasks; a user must never see or mutate another user's data.
  • A list contains tasks. A task has at minimum: an id, the parent list id, a title, a completion flag, a due date (optional), and timestamps.
  • Read-heavy workload (viewing tasks dominates), but writes (add/update/complete/delete) are frequent.
  • Target API latency: p99 under ~150 ms for reads.
  • One user can briefly burst to ~500 requests/second (e.g., a misbehaving client or an aggressive sync loop); the system must stay healthy for everyone else.
  • The backend is composed of more than one internal service (e.g., an API gateway / edge service plus a task service), so service-to-service calls happen on the request path.
  • Assume a managed relational database is available, plus a managed cache (e.g., Redis).

Clarifying Questions to Ask

  • What is the scale — daily active users, total users, average lists/tasks per user — and what is the read:write ratio?
  • Do we need real-time sync across a user's devices (push/websockets), or is pull-on-refresh acceptable?
  • Are there sharing/collaboration features (multiple users on one list), or is every list owned by exactly one user?
  • What consistency does the product need — is it acceptable for a just-completed task to briefly appear incomplete on another device (eventual consistency), or must reads be strongly consistent?
  • Is the 500 req/s burst a legitimate use case we must serve, or abuse we should throttle? If we throttle, should the limit be per-user, per-IP, or per-token?
  • Are internal services inside one trusted network/VPC, or can they be reached from outside, and is there a service mesh available?

Part 1

Design the core to-do service: the data model, the REST API for lists and tasks (CRUD), and the read/write paths. Cover how a request flows from client to database and back.
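
One way to make the data model concrete is a minimal sketch using Python's built-in `sqlite3` as a stand-in for the managed relational database. The table and column names follow the task fields listed in the constraints; the helper names (`open_db`, `create_list`, `create_task`, `get_tasks`) are hypothetical, and carrying `user_id` on both tables is one possible choice, not the only one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE list (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    name       TEXT    NOT NULL,
    created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE task (
    id         INTEGER PRIMARY KEY,
    list_id    INTEGER NOT NULL REFERENCES list(id) ON DELETE CASCADE,
    user_id    INTEGER NOT NULL,
    title      TEXT    NOT NULL,
    completed  INTEGER NOT NULL DEFAULT 0,
    due_date   TEXT,
    created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_list_user ON list(user_id);
CREATE INDEX idx_task_list ON task(list_id);
"""


def open_db():
    """Open an in-memory database with the two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def create_list(conn, user_id, name):
    cur = conn.execute(
        "INSERT INTO list (user_id, name) VALUES (?, ?)", (user_id, name)
    )
    return cur.lastrowid


def create_task(conn, user_id, list_id, title, due_date=None):
    """Insert a task only if the list belongs to the caller; else return None."""
    owned = conn.execute(
        "SELECT 1 FROM list WHERE id = ? AND user_id = ?", (list_id, user_id)
    ).fetchone()
    if owned is None:
        return None
    cur = conn.execute(
        "INSERT INTO task (list_id, user_id, title, due_date) VALUES (?, ?, ?, ?)",
        (list_id, user_id, title, due_date),
    )
    return cur.lastrowid


def get_tasks(conn, user_id, list_id):
    """Read path: every query filters on user_id, so it is tenant-scoped."""
    return conn.execute(
        "SELECT id, title, completed FROM task "
        "WHERE list_id = ? AND user_id = ? ORDER BY id",
        (list_id, user_id),
    ).fetchall()
```

In this sketch a request for another user's list simply returns no rows, which keeps tenant isolation in the query itself rather than in a check the caller might forget.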

Part 2

Explain how you authenticate users and authorize requests. Define the difference between authentication and authorization, and show where each happens in your request flow. Then explain how internal services authenticate to each other when one service calls another on the request path.
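
To illustrate where authentication ends and authorization begins, here is a minimal sketch. It assumes an HMAC-signed token of the form `user_id.expiry.signature` as a simplified stand-in for a real JWT or OIDC token; `SECRET`, `LIST_OWNERS`, and all function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"demo-secret"   # hypothetical; a real service loads this from a secret store
LIST_OWNERS = {10: 1}     # hypothetical stand-in for the list table: list_id -> owner


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: int, ttl_seconds: int = 900,
                now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    payload = f"{user_id}.{int(now) + ttl_seconds}"
    return f"{payload}.{_sign(payload)}"


def authenticate(token: str, now: Optional[float] = None) -> Optional[int]:
    """Authentication: who is calling? Returns the user id, or None."""
    now = time.time() if now is None else now
    try:
        raw_user, raw_exp, signature = token.split(".")
        user_id, expires_at = int(raw_user), int(raw_exp)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{raw_user}.{raw_exp}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # tampered or signed with another key
    if expires_at < now:
        return None  # expired
    return user_id


def handle_delete_list(token: str, list_id: int,
                       now: Optional[float] = None) -> int:
    """Return the HTTP status a DELETE /lists/{listId} would produce."""
    user_id = authenticate(token, now)
    if user_id is None:
        return 401                     # not authenticated
    owner = LIST_OWNERS.get(list_id)
    if owner is None:
        return 404
    if owner != user_id:               # authorization: does this user own the list?
        return 403
    del LIST_OWNERS[list_id]
    return 204
```

The sketch returns `403` for a list owned by someone else so the two layers are easy to tell apart; some APIs prefer `404` there to avoid revealing that the list exists.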

Clarifying Questions for this Part

  • Are we using a third-party identity provider (OAuth2/OIDC) or rolling our own session/token system?
  • Should authorization be coarse (own-your-data) only, or do we need roles/sharing (owner vs. collaborator vs. viewer)?
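
For the service-to-service half of this part, a signed service token can be sketched as follows. This is an illustration under stated assumptions, not a production scheme: it uses a per-service HMAC key where a real deployment would more likely use mutual TLS or OAuth2 client-credentials, and `SERVICE_KEYS`, the service names, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SERVICE_KEYS = {"edge": b"edge-demo-key"}  # hypothetical per-service signing keys


def mint_service_token(caller, audience, user_id, ttl=30, now=None):
    """Edge side: sign a short-lived token naming caller, target, and user."""
    now = time.time() if now is None else now
    claims = {"iss": caller, "aud": audience, "sub": user_id,
              "exp": int(now) + ttl}
    body = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode()
    ).decode()
    sig = hmac.new(SERVICE_KEYS[caller], body.encode(), hashlib.sha256).hexdigest()
    return f"{caller}.{body}.{sig}"


def verify_service_token(token, expected_audience, now=None):
    """Downstream side: return the claims if the caller is verified, else None."""
    now = time.time() if now is None else now
    try:
        caller, body, sig = token.split(".")
    except ValueError:
        return None
    key = SERVICE_KEYS.get(caller)
    if key is None:
        return None  # unknown calling service
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # body was not signed by that service's key
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["iss"] != caller or claims["aud"] != expected_audience:
        return None  # token was minted for a different service
    if claims["exp"] < now:
        return None  # expired
    return claims
```

The point of the design is that the task service trusts the signature, not a plain header: the user id travels inside the signed body, so a client cannot forge it without the edge service's key.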

Part 3

A single user starts sending ~500 requests/second. Decide whether to handle this with rate limiting, caching, or both, and design the mechanism. Explain what you protect, where the limiter lives, and what the client experiences.
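
A per-user token bucket is one common way to build the limiter. The sketch below keeps bucket state in an in-process dict purely for illustration; with several gateway instances that state would need to live in a shared store such as Redis. The class names and the rate and capacity values are assumptions.

```python
import math
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return (allowed, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Time until one whole token is available, rounded up to a full second.
        wait = (1.0 - self.tokens) / self.rate
        return False, max(1, math.ceil(wait))


class RateLimiter:
    """One bucket per user, so a noisy user cannot drain anyone else's budget."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def check(self, user_id, now=None):
        """Return (http_status, headers) for one incoming request."""
        bucket = self.buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[user_id] = bucket
        allowed, retry_after = bucket.allow(now)
        if allowed:
            return 200, {}
        return 429, {"Retry-After": str(retry_after)}
```

With `rate=10` and `capacity=20`, a user sending 500 requests in one second gets roughly the first 20 through plus about 10 more as the bucket refills; the rest receive `429` with `Retry-After`, while other users' buckets are untouched.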

Part 4

Design how tasks are listed and ordered (e.g., by due date, by creation time, by manual priority). Then add a cache for the read path. Explain the caching strategy, the staleness problems it introduces, and how you keep cached data correct.
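
Ordering by due date with keyset (cursor) pagination might look like the sketch below, again using `sqlite3` as a stand-in. It assumes `due_date` is always set and stored as an ISO-8601 string so text order matches date order; tasks without a due date would need a sentinel value or a separate query. The function names and the trimmed-down schema are hypothetical.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE task (
            id       INTEGER PRIMARY KEY,
            list_id  INTEGER NOT NULL,
            title    TEXT    NOT NULL,
            due_date TEXT    NOT NULL
        );
        -- id is the tiebreaker, so the sort order is total and stable
        CREATE INDEX idx_task_list_due ON task(list_id, due_date, id);
    """)
    return conn


def list_tasks(conn, list_id, limit=2, cursor=None):
    """Return (rows, next_cursor); cursor is the (due_date, id) of the last row."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, title, due_date FROM task WHERE list_id = ? "
            "ORDER BY due_date, id LIMIT ?",
            (list_id, limit),
        ).fetchall()
    else:
        last_due, last_id = cursor
        # Resume strictly after the last row seen instead of skipping OFFSET rows.
        rows = conn.execute(
            "SELECT id, title, due_date FROM task WHERE list_id = ? "
            "AND (due_date > ? OR (due_date = ? AND id > ?)) "
            "ORDER BY due_date, id LIMIT ?",
            (list_id, last_due, last_due, last_id, limit),
        ).fetchall()
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = (rows[-1][2], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

Because each page starts from the last row the client actually saw, a task inserted or deleted between two page requests does not shift later pages the way it would with `OFFSET`.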

Part 5

Explain REST API versioning. Why put a version such as v1 in the path (/v1/lists)? What problem does it solve, and how do you evolve the API without breaking existing mobile clients you cannot force to upgrade?
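
A small sketch of how a path version can select the response contract while both versions read the same stored record. The field rename (`done` in v1, `completed` in v2) and the names `to_v1`, `to_v2`, `SERIALIZERS`, and `get_task` are hypothetical examples, not part of the question.

```python
def to_v1(task):
    """v1 contract: old clients were built against the field name `done`."""
    return {"id": task["id"], "title": task["title"], "done": task["completed"]}


def to_v2(task):
    """v2 contract: renames `done` to `completed` and exposes `due_date`."""
    return {
        "id": task["id"],
        "title": task["title"],
        "completed": task["completed"],
        "due_date": task.get("due_date"),
    }


SERIALIZERS = {"v1": to_v1, "v2": to_v2}


def get_task(path, store):
    """Resolve /{version}/lists/{listId}/tasks/{taskId} to (status, body).

    `store` maps (list_id, task_id) to one internal task record.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 5 or parts[1] != "lists" or parts[3] != "tasks":
        return 404, {"error": "not found"}
    version, raw_list_id, raw_task_id = parts[0], parts[2], parts[4]
    serializer = SERIALIZERS.get(version)
    if serializer is None:
        return 404, {"error": "unknown API version"}
    if not (raw_list_id.isdigit() and raw_task_id.isdigit()):
        return 404, {"error": "not found"}
    task = store.get((int(raw_list_id), int(raw_task_id)))
    if task is None:
        return 404, {"error": "not found"}
    # Same stored record, different wire shape depending on the version prefix.
    return 200, serializer(task)
```

Keeping the version difference in a thin serialization layer means the storage model can evolve once, and retiring `v1` later amounts to deleting one entry from the table after its traffic has dropped off.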

Follow-up Questions

  • In Part 4, suppose a user has the same list open on a phone and a laptop and completes a task on the phone. Walk through exactly what the laptop sees and when, and how you would make the two converge.
  • In Part 3, your per-user limiter is keyed on the user token, but the burst comes from a NAT'd corporate network where many users share one IP. How does that change your keying, and what breaks if you key on IP instead?
  • In Part 2, a downstream task service receives a call from the edge service. How does it know the call is genuinely from the edge service and on behalf of a specific authenticated user, without trusting a header the client could forge?
  • In Part 5, you need to rename a response field and change a status code. Walk through shipping that as v2 while v1 mobile clients keep working, including how you decide when it is safe to remove v1.
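
For the Part 4 follow-up about two devices converging, the cache behaviour can be sketched as cache-aside with invalidate-on-write and a TTL safety net. Plain dicts stand in for both Redis and the database here, and the class and method names are hypothetical; only the `tasks:{listId}` key shape comes from the question's hints.

```python
import time


class TaskCache:
    """Cache-aside reads of a list's tasks, invalidated on every write."""

    def __init__(self, db, ttl_seconds=30):
        self.db = db            # stand-in database: list_id -> list of task dicts
        self.ttl = ttl_seconds
        self.cache = {}         # stand-in for Redis: key -> (expires_at, tasks)
        self.db_reads = 0       # counts cache misses that reached the database

    @staticmethod
    def _key(list_id):
        return f"tasks:{list_id}"

    def get_tasks(self, list_id, now=None):
        now = time.monotonic() if now is None else now
        key = self._key(list_id)
        entry = self.cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # cache hit
        self.db_reads += 1                        # miss: read through to the DB
        tasks = [dict(t) for t in self.db.get(list_id, [])]
        self.cache[key] = (now + self.ttl, tasks)
        return tasks

    def complete_task(self, list_id, task_id):
        # Write to the database first, then drop the cached copy.
        for task in self.db.get(list_id, []):
            if task["id"] == task_id:
                task["completed"] = True
        self.cache.pop(self._key(list_id), None)
```

In this sketch the laptop's next read after the phone's write misses the cache and sees the completed task. If an invalidation is ever missed, the laptop can see stale data for at most `ttl_seconds`, which is the self-healing bound the TTL provides; pushing the change to the laptop sooner would need a separate sync channel.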
