Design Scalable Notification Rate Limiter
Company: Cursor
Role: Backend Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
Design a production-grade rate limiter for notification sending that enforces limits at three levels of an organizational hierarchy.
The system exposes a single decision call, `should_send_notification(user_id, timestamp) -> bool`, that returns whether a notification send should be **accepted**. Every user belongs to exactly one team, and every team belongs to exactly one company (a `user -> team -> company` mapping is available).
A send is accepted only if it stays within an **exact rolling 10-minute window** at all three scopes simultaneously:
- at most **3** accepted notifications per `user` in any 10-minute window
- at most **10** accepted notifications per `team` in any 10-minute window
- at most **20** accepted notifications per `company` in any 10-minute window
If accepting the send would violate **any** of the three limits, it is rejected and **does not** count against any scope. The decision must be correct under concurrent requests across many stateless application servers.
You should first reason through the core decision algorithm (and be ready to walk a concrete test case through it), then extend it into a full distributed-systems design covering the API, storage, atomicity, scaling, hierarchy changes, and operational concerns.
```hint Where to start
The limits are tiny (3/10/20). With such small caps, an *exact sliding-window log* — storing the actual accepted timestamps per entity and counting those still inside the window — is cheap and avoids the inaccuracy of fixed-window or token-bucket counters. Think about what data structure makes "drop everything older than `now - 600s`, then count" a single operation.
```
```hint Atomicity across three scopes
The user, team, and company checks must succeed-or-fail together, with no interleaving from concurrent requests. Consider how to colocate all three keys on one shard (a hash tag / partition key) so a single server-side script can read all three counts, decide, and write — all atomically. What key should you shard on so that one request never touches two shards?
```
```hint The "reject costs nothing" trap
The atomic step must *check all three, then commit only if all pass*. A naive design that increments the user counter, then finds the team is full, has already corrupted the user count. Decide first, write second — and make sure idempotent retries (same `request_id`) don't double-count.
```
### Constraints & Assumptions
- **Exactness:** the rolling window is exact (a true sliding window), not a fixed/tumbling bucket — an event expires exactly 10 minutes after it occurred.
- **Hierarchy:** `user -> team -> company` is a strict tree; each user maps to exactly one team and one company. Mappings change rarely but can change (reorgs, user transfers).
- **Atomic counting rule:** a rejected send increments no counter; an accepted send increments all three (user, team, company).
- **Scale (assume, ask to confirm):** order of $10^7$ users, up to ~$10^6$ req/s peak globally, single-digit-millisecond p99 latency budget for the decision, callers are many stateless app servers behind a load balancer.
- **Limits are small** (3 / 10 / 20), which is a meaningful property you can exploit in the data model.
### Clarifying Questions to Ask
- Is the 10-minute window truly exact/sliding, or is an approximate fixed or sliding-window-counter acceptable for cheaper memory?
- When a send is rejected, should the caller be told *which* scope (user/team/company) caused the rejection?
- What is the latency budget (p99) and expected peak QPS, and is the traffic skewed toward a few large companies?
- On a backend (Redis/store) outage, should the system fail-open (allow sends) or fail-closed (block sends)?
- How fresh must the `user -> team -> company` mapping be — is briefly counting a transferred user against their old team acceptable?
- Are timestamps server-authoritative, or can callers pass their own `timestamp` (and can it be in the past / out of order)?
### What a Strong Answer Covers
- A correct **exact rolling-window** algorithm and a clear argument for why it's exact (and how memory stays bounded by the small limits).
- A clean `should_send_notification` implementation walked through a concrete test case, including a boundary case at exactly 10 minutes and a case where one scope rejects.
- A single **atomic** check-and-update across all three scopes, with an explicit account of how concurrency and the "reject increments nothing" rule are handled.
- A storage and **sharding** scheme that keeps the three related keys colocated, plus how the stateless service layer scales horizontally.
- Handling of the **metadata mapping**: caching `user -> team -> company`, cache misses, and hierarchy changes (transfers/reorgs) with their correctness trade-offs.
- Operational concerns: **idempotency** (`request_id`), **clock skew**, **hot keys/shards**, **failure behavior** (fail-open vs fail-closed), and **observability**.
- Explicit trade-offs (exactness vs memory, colocation vs hot shards, freshness vs latency) rather than a single "best" answer.
### Follow-up Questions
- The limits change from 3/10/20 to thousands per window. Does the exact-log approach still hold, and what would you switch to (e.g., sliding-window counter, bucketed approximation), and what accuracy do you give up?
- How would you add a **fourth** scope (e.g., a per-notification-channel limit), or make limits configurable per company, without rewriting the atomic core?
- A single whale company's traffic makes its shard a bottleneck. How do you relieve the hot shard while preserving the exact company-level limit?
- The Redis/state cluster is briefly unavailable. Walk through your degraded mode end-to-end — what the service returns, what gets logged, and how counts reconcile on recovery.
Quick Answer: This question evaluates expertise in designing distributed rate-limiting systems with hierarchical quotas, focusing on concurrency control, exact rolling-window counting, and correctness under concurrent stateless servers; it belongs to the System Design domain and tests practical application of distributed systems, storage, and API design as well as conceptual understanding of atomicity and consistency. It is commonly asked to assess reasoning about atomic decision-making across multiple scopes, scalability and partitioning strategies, handling reconfiguration and operational concerns, and the trade-offs between correctness and system performance (English).