Design a distributed rate limiter for a large-scale API platform.
The rate limiter should protect backend services from abuse and traffic spikes while allowing legitimate users to proceed with minimal latency.
Requirements:
- Support limits such as requests per user, API key, IP address, endpoint, and global service limits.
- Example quotas: 100 requests per minute per user, 1,000 requests per minute per API key, and configurable endpoint-specific limits.
- Work across many stateless application servers and multiple data centers or regions.
- Return a clear rejection response when a client exceeds its quota.
- Keep request-path latency very low, preferably single-digit milliseconds.
- Support dynamic configuration updates without redeploying services.
- Provide observability for allowed requests, blocked requests, hot keys, and configuration errors.
Discuss the API, architecture, storage choices, algorithms, consistency trade-offs, failure handling, and monitoring.
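
To make the quota and rejection requirements concrete, here is a minimal single-process sketch in Python. It is a sketch under stated assumptions, not a proposed solution. It uses a fixed-window counter, which is only one of several candidate algorithms, and keeps state in an in-memory dict where a real design would need a shared store. The names `Quota`, `QUOTAS`, `FixedWindowLimiter`, and `check` are hypothetical, as is the choice of HTTP 429 with a `Retry-After` header as the "clear rejection response".

```python
import math
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Quota:
    limit: int           # maximum requests allowed per window
    window_seconds: int  # window length in seconds


# Example quotas from the requirements; the scope names are illustrative.
QUOTAS = {"user": Quota(100, 60), "api_key": Quota(1000, 60)}


@dataclass
class FixedWindowLimiter:
    """In-memory fixed-window limiter (assumption: single process, no eviction)."""
    quotas: Dict[str, Quota]
    clock: Callable[[], float] = time.time
    counters: Dict[Tuple[str, str, int], int] = field(default_factory=dict)

    def check(self, identities: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, headers) for a request with the given scopes."""
        now = self.clock()
        keys = []
        for scope, ident in identities.items():
            quota = self.quotas.get(scope)
            if quota is None:
                continue  # no limit configured for this scope
            window = int(now // quota.window_seconds)
            key = (scope, ident, window)
            if self.counters.get(key, 0) >= quota.limit:
                # Seconds until the current window ends, rounded up.
                wait = (window + 1) * quota.window_seconds - now
                return 429, {"Retry-After": str(max(1, math.ceil(wait)))}
            keys.append(key)
        # Count the request only if every scope allowed it.
        for key in keys:
            self.counters[key] = self.counters.get(key, 0) + 1
        return 200, {}
```

A call such as `FixedWindowLimiter(QUOTAS).check({"user": "u1", "api_key": "k1"})` returns a status code and headers. A full answer still has to address what this sketch ignores: shared state across servers and regions, burstiness at window boundaries, hot keys, and behavior when the counter store is unavailable.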