How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Technical Screen rounds at Cursor.

What role is this question designed for?

This question is commonly asked for Backend Engineer candidates at Cursor during technical interviews.

Design Scalable Notification Rate Limiter | Cursor Interview Question

Q: Design Scalable Notification Rate Limiter

This question evaluates expertise in designing distributed rate-limiting systems with hierarchical quotas, focusing on concurrency control, exact rolling-window counting, and correctness under concurrent stateless servers; it belongs to the System Design domain and tests practical application of distributed systems, storage, and API design as well as conceptual understanding of atomicity and consistency. It is commonly asked to assess reasoning about atomic decision-making across multiple scopes, scalability and partitioning strategies, handling reconfiguration and operational concerns, and the trade-offs between correctness and system performance (English).

Design a production-grade rate limiter for notification sending that enforces limits at three levels of an organizational hierarchy.

The system exposes a single decision call, should_send_notification(user_id, timestamp) -> bool, that returns whether a notification send should be accepted. Every user belongs to exactly one team, and every team belongs to exactly one company (a user -> team -> company mapping is available).

A send is accepted only if it stays within an exact rolling 10-minute window at all three scopes simultaneously:

at most 3 accepted notifications per user in any 10-minute window
at most 10 accepted notifications per team in any 10-minute window
at most 20 accepted notifications per company in any 10-minute window

If accepting the send would violate any of the three limits, it is rejected and does not count against any scope. The decision must be correct under concurrent requests across many stateless application servers.

You should first reason through the core decision algorithm (and be ready to walk a concrete test case through it), then extend it into a full distributed-systems design covering the API, storage, atomicity, scaling, hierarchy changes, and operational concerns.

Constraints & Assumptions

Exactness: the rolling window is exact (a true sliding window), not a fixed/tumbling bucket — an event expires exactly 10 minutes after it occurred.
Hierarchy: user -> team -> company is a strict tree; each user maps to exactly one team and one company. Mappings change rarely but can change (reorgs, user transfers).
Atomic counting rule: a rejected send increments no counter; an accepted send increments all three (user, team, company).
Scale (assume, ask to confirm): order of $10^7$ users, up to ~ $10^6$ req/s peak globally, single-digit-millisecond p99 latency budget for the decision, callers are many stateless app servers behind a load balancer.
Limits are small (3 / 10 / 20), which is a meaningful property you can exploit in the data model.

Clarifying Questions to Ask

Is the 10-minute window truly exact/sliding, or is an approximate fixed or sliding-window-counter acceptable for cheaper memory?
When a send is rejected, should the caller be told which scope (user/team/company) caused the rejection?
What is the latency budget (p99) and expected peak QPS, and is the traffic skewed toward a few large companies?
On a backend (Redis/store) outage, should the system fail-open (allow sends) or fail-closed (block sends)?
How fresh must the user -> team -> company mapping be — is briefly counting a transferred user against their old team acceptable?
Are timestamps server-authoritative, or can callers pass their own timestamp (and can it be in the past / out of order)?

What a Strong Answer Covers

A correct exact rolling-window algorithm and a clear argument for why it's exact (and how memory stays bounded by the small limits).
A clean should_send_notification implementation walked through a concrete test case, including a boundary case at exactly 10 minutes and a case where one scope rejects.
A single atomic check-and-update across all three scopes, with an explicit account of how concurrency and the "reject increments nothing" rule are handled.
A storage and sharding scheme that keeps the three related keys colocated, plus how the stateless service layer scales horizontally.
Handling of the metadata mapping : caching user -> team -> company , cache misses, and hierarchy changes (transfers/reorgs) with their correctness trade-offs.
Operational concerns: idempotency ( request_id ), clock skew , hot keys/shards , failure behavior (fail-open vs fail-closed), and observability .
Explicit trade-offs (exactness vs memory, colocation vs hot shards, freshness vs latency) rather than a single "best" answer.

Follow-up Questions

The limits change from 3/10/20 to thousands per window. Does the exact-log approach still hold, and what would you switch to (e.g., sliding-window counter, bucketed approximation), and what accuracy do you give up?
How would you add a fourth scope (e.g., a per-notification-channel limit), or make limits configurable per company, without rewriting the atomic core?
A single whale company's traffic makes its shard a bottleneck. How do you relieve the hot shard while preserving the exact company-level limit?
The Redis/state cluster is briefly unavailable. Walk through your degraded mode end-to-end — what the service returns, what gets logged, and how counts reconcile on recovery.

Design a production-grade rate limiter for notification sending that enforces limits at three levels of an organizational hierarchy.

A send is accepted only if it stays within an exact rolling 10-minute window at all three scopes simultaneously:

at most 3 accepted notifications per user in any 10-minute window
at most 10 accepted notifications per team in any 10-minute window
at most 20 accepted notifications per company in any 10-minute window

Constraints & Assumptions

Exactness: the rolling window is exact (a true sliding window), not a fixed/tumbling bucket — an event expires exactly 10 minutes after it occurred.
Hierarchy: user -> team -> company is a strict tree; each user maps to exactly one team and one company. Mappings change rarely but can change (reorgs, user transfers).
Atomic counting rule: a rejected send increments no counter; an accepted send increments all three (user, team, company).
Scale (assume, ask to confirm): order of $10^7$ users, up to ~ $10^6$ req/s peak globally, single-digit-millisecond p99 latency budget for the decision, callers are many stateless app servers behind a load balancer.
Limits are small (3 / 10 / 20), which is a meaningful property you can exploit in the data model.

Clarifying Questions to Ask

Is the 10-minute window truly exact/sliding, or is an approximate fixed or sliding-window-counter acceptable for cheaper memory?
When a send is rejected, should the caller be told which scope (user/team/company) caused the rejection?
What is the latency budget (p99) and expected peak QPS, and is the traffic skewed toward a few large companies?
On a backend (Redis/store) outage, should the system fail-open (allow sends) or fail-closed (block sends)?
How fresh must the user -> team -> company mapping be — is briefly counting a transferred user against their old team acceptable?
Are timestamps server-authoritative, or can callers pass their own timestamp (and can it be in the past / out of order)?

What a Strong Answer Covers

A correct exact rolling-window algorithm and a clear argument for why it's exact (and how memory stays bounded by the small limits).
A clean should_send_notification implementation walked through a concrete test case, including a boundary case at exactly 10 minutes and a case where one scope rejects.
A single atomic check-and-update across all three scopes, with an explicit account of how concurrency and the "reject increments nothing" rule are handled.
A storage and sharding scheme that keeps the three related keys colocated, plus how the stateless service layer scales horizontally.
Handling of the metadata mapping : caching user -> team -> company , cache misses, and hierarchy changes (transfers/reorgs) with their correctness trade-offs.
Operational concerns: idempotency ( request_id ), clock skew , hot keys/shards , failure behavior (fail-open vs fail-closed), and observability .
Explicit trade-offs (exactness vs memory, colocation vs hot shards, freshness vs latency) rather than a single "best" answer.

Follow-up Questions

The limits change from 3/10/20 to thousands per window. Does the exact-log approach still hold, and what would you switch to (e.g., sliding-window counter, bucketed approximation), and what accuracy do you give up?
How would you add a fourth scope (e.g., a per-notification-channel limit), or make limits configurable per company, without rewriting the atomic core?
A single whale company's traffic makes its shard a bottleneck. How do you relieve the hot shard while preserving the exact company-level limit?
The Redis/state cluster is briefly unavailable. Walk through your degraded mode end-to-end — what the service returns, what gets logged, and how counts reconcile on recovery.

Design Scalable Notification Rate Limiter

Quick Overview

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP

Design Scalable Notification Rate Limiter

Quick Overview

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP