Scenario
You need to design a rate limiting system that can be used by multiple API gateways and/or many backend services (not just embedded in a single gateway).
The system should enforce policies such as:
- Per API key / per user / per IP limits
- Example: at most y requests per x seconds (or buckets)
Requirements
Cover the following:
Functional
- Enforce common rate limit algorithms (fixed window, sliding window, token bucket); pick one and justify.
- Support multiple independent limit keys (e.g., tenantId + api + userId).
- Return a decision fast enough to sit on the request path.
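Of the algorithms above, the token bucket is a common choice to justify: it permits short bursts up to a capacity while enforcing an average rate. A minimal single-process sketch (class and parameter names are illustrative; a shared limiter would keep this state in a central store rather than in memory):

```python
import time

class TokenBucket:
    """Token bucket limiter: bursts up to `capacity`, refills at `rate` tokens/sec.

    `now` is injectable so the clock can be faked in tests.
    """
    def __init__(self, capacity: float, rate: float, now=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.now = now
        self.tokens = capacity      # start full, so an idle key can burst
        self.last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The same refill arithmetic ports directly to a Lua script or conditional write against a shared store, which is where the atomicity questions below come in.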
Non-functional
- Must scale to very high traffic ("Atlassian-scale").
- Must work correctly with many gateways calling it concurrently.
- Discuss consistency requirements (what correctness means for rate limiting).
- Availability and failure mode: fail-open vs fail-closed.
- Observability: metrics and logging.
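The fail-open vs fail-closed decision can be isolated in a small wrapper around the remote limiter call, so the policy is explicit and configurable per route. A sketch under assumed names (`check_with_failure_policy` and `limiter_call` are hypothetical, not a real API):

```python
import logging

def check_with_failure_policy(limiter_call, key: str, fail_open: bool = True) -> bool:
    """Call the remote limiter; if it is unreachable, apply the failure policy.

    fail_open=True  -> favor availability: let requests through when the limiter is down.
    fail_open=False -> favor strict enforcement: reject requests when the limiter is down.
    Either way, log the failure so the outage is visible in observability tooling.
    """
    try:
        return limiter_call(key)
    except Exception:
        logging.exception("rate limiter unavailable for key=%s", key)
        return fail_open
```

Fail-open is the usual default for a shared limiter protecting availability, with fail-closed reserved for abuse-sensitive endpoints; the wrapper makes that a per-endpoint knob rather than a global assumption.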
Design discussion prompts
- If the limiter is shared by many gateways/services, what are the risks of inconsistent counters?
- What storage/technology would you choose (e.g., Redis, SQL, DynamoDB) and why?
- How would you handle concurrency control and atomicity of updates?
- How do you shard/scale the system and avoid hot partitions?
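On the hot-partition prompt, one common mitigation is write sharding: split a hot key's counter across N sub-keys, increment a random sub-key on writes, and sum all sub-keys on reads. A sketch with an in-memory dict standing in for the distributed store (all names and the shard-suffix scheme are illustrative assumptions):

```python
import random

NUM_SHARDS = 8  # spreads one logical key over 8 storage partitions

def shard_key(key: str, shard: int) -> str:
    # e.g. "tenant42:/search:user7" -> "tenant42:/search:user7#3"
    return f"{key}#{shard}"

class ShardedCounter:
    """Counter for one window, split across sub-shards to avoid a hot partition.

    Writes touch a single random shard (cheap, spread out); reads fan out and
    sum every shard (N point reads). The count is approximate only in that a
    read may race concurrent writes, which is usually acceptable for limiting.
    """
    def __init__(self):
        self.store = {}  # stand-in for Redis / DynamoDB / similar KV store

    def incr(self, key: str) -> None:
        k = shard_key(key, random.randrange(NUM_SHARDS))
        self.store[k] = self.store.get(k, 0) + 1

    def total(self, key: str) -> int:
        return sum(self.store.get(shard_key(key, s), 0) for s in range(NUM_SHARDS))
```

The trade-off to discuss: sharding multiplies read cost by N and loosens atomic check-and-increment (the check now spans shards), so it is worth applying only to keys identified as hot.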