Design Multi-Dimensional Request Rate Limiting
Company: Roblox
Role: Software Engineer
Category: System Design
Difficulty: easy
Interview Round: Technical Screen
Design a **rate limiter** for a backend service that runs across multiple application servers. The system must decide, for every incoming request, whether to allow or reject it based on configurable request-rate limits.
This is a two-part problem: first build a standard single-key rate limiter, then extend it to enforce limits across several request fields at once.
### Constraints & Assumptions
- The service is distributed: many stateless application servers handle requests behind a load balancer, so limit state cannot live only in one process's memory.
- Limits are configurable (for example, "100 requests per minute per user", "10 requests per second per IP") and should be changeable without redeploying application code.
- Every request must receive a fast allow/deny decision; rate-limit checks sit on the hot path of every API call.
- When a request is rejected, the system should be able to report why (which limit was hit) and when the caller can retry.
- You may assume access to a fast shared store (such as an in-memory key-value store) for counters; you should justify whatever you choose.
---
## Part 1 — Standard distributed rate limiter
Build a rate limiter that can limit incoming requests by a single key such as **user ID**, **IP address**, **API key**, or **endpoint**. It must work correctly when multiple application servers handle requests concurrently for the same key.
Address at minimum:
- The external/internal **API** the limiter exposes to the application code.
- The **algorithm** used to track and enforce the limit, and why you chose it.
- The **data model** for limit state (what is stored per key) and where it lives.
- How updates stay **correct under concurrency** across distributed servers.
- What the limiter returns to the caller on **allow vs. reject**.
```hint Where to start
Pin down a single concrete rule first, e.g. "$N$ requests per window per `user_id`". Decide the **key** you store state under (something like `rule:user_id` plus the value), then reason about what state that key needs to hold and who can mutate it.
```
```hint Algorithm options
Enumerate the standard families — **fixed window counter**, **sliding window log**, **sliding window counter**, **token bucket** — and compare them on accuracy, memory cost, and burst behavior. The boundary-burst weakness of fixed-window and the burst tolerance of token bucket are the usual discussion points.
```
```hint Concurrency across servers
Walk through what happens when two servers handle requests for the same key at nearly the same instant. Trace a naive "read the counter, decide, then write it back" sequence and ask whether both requests could be allowed when only one should be. What property does the update need, and where must it be enforced?
```
---
## Part 2 — Multi-dimensional (multi-field) rate limiting
Extend the design so that a single request may carry several fields, and **different rate-limit rules may apply to different fields or combinations of fields**. For example, one request might include `user_id`, `ip`, `endpoint`, `region`, and `operation`, and the system may need to enforce separate limits such as: per user, per IP, per `(user, endpoint)` pair, and per `(region, operation)`.
Address at minimum:
- A **data model / rule schema** that lets an operator declare which fields a rule keys on, its limit, and its action.
- How the limiter **matches** which rules apply to a given request without scanning every rule on every call.
- **Key generation** when a rule keys on a combination of fields.
- The **decision logic when multiple limits apply at once** — how you combine the per-rule outcomes into one accept/reject, and what you return when several are violated.
- The **consistency problem** created by touching several counters per request, and how you handle it.
```hint Rule as data
Model a rule as a configurable record: the **dimensions** (fields) it keys on, a predicate for when it applies (e.g. `operation = write`), the limit/window, an action, and a priority. Then a request maps to *all* matching rules rather than one hard-coded check.
```
```hint Combining multiple verdicts
Decide the combination policy: a request is allowed only if **every** applicable blocking rule passes; on rejection, pick a single answer to surface (e.g. the **most restrictive** — the largest retry-after). Be explicit about which violated rule you report.
```
```hint The cross-counter consistency trap
Trace a request that must satisfy several rules when you update each counter one at a time: what happens to the counters you already touched if a later rule rejects the request? Decide how much that inconsistency matters at your target scale, and let that drive what you do about it.
```
---
### Clarifying Questions to Ask
- What request volume and per-key cardinality should this handle (requests/sec, number of distinct users/IPs/keys)? This drives the store choice and memory model.
- What latency budget does the rate-limit check have on the request path (e.g. single-digit milliseconds)?
- How strict must accounting be — is exact enforcement required, or is slight over/under-counting acceptable for abuse prevention?
- How many distinct rules and field combinations are expected, and how often do they change?
- When the limit store is unavailable, should the system fail open (allow) or fail closed (reject)? Does it differ by endpoint risk?
- Are limits global, or per-region/per-tenant? Does a single request need limits enforced across regions?
### What a Strong Answer Covers
- A clear **API contract** between application code and the limiter (input context, output decision with retry-after and the violated rule).
- A justified **algorithm choice** with explicit trade-offs (accuracy vs. memory vs. burst handling), not just one named algorithm.
- A concrete **data model** for both single-key state and the multi-field **rule schema**.
- **Atomicity / concurrency correctness** for the read-refill-decrement update across distributed servers.
- A coherent **multi-rule decision policy** and an honest treatment of the **cross-counter consistency** problem (atomic multi-key, co-location, or documented approximation).
- **Scalability** mechanics: distributed store, local rule caching, TTLs for key cleanup, hot-key and high-cardinality handling.
- **Failure handling** (fail-open vs. fail-closed, timeouts, fallbacks) and **observability** (allowed/blocked counts per rule, hot keys, latency) for post-deploy tuning.
### Follow-up Questions
- How does the design change at 100x request volume, and what breaks first — the counter store, the hot keys, or the rule-matching path?
- A single very popular endpoint or a shared NAT IP becomes a **hot key**. How do you keep that one key from becoming a bottleneck?
- How would you support a rule that costs more than one unit per request (weighted/cost-based limiting)?
- If you must enforce a **global** limit (e.g. across regions) with very low latency, how do you reconcile that with the need to avoid a single centralized chokepoint?
Quick Answer: This question evaluates proficiency in designing distributed rate limiting systems, including API and data model design, consistency and scalability trade-offs, and algorithms for enforcing per-key and multi-dimensional limits.