System Design: High-Throughput Distributed Rate Limiting Service
Context
You are designing a multi-tenant rate limiting platform for an edge/gateway layer that protects downstream services. The system must enforce both per-user and global limits, tolerate bursts, and approximate a sliding window across multiple regions at a peak of 10M requests/second.
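To make "approximate a sliding window" concrete, here is a minimal single-process sketch of one common approximation, the sliding-window counter, which weights the previous fixed window by how much of it still overlaps the sliding window. The class name `SlidingWindowCounter` and its fields are hypothetical and chosen for illustration. The sketch ignores distribution, sharding, and concurrency, and it is not a reference solution to the exercise.

```python
from dataclasses import dataclass


@dataclass
class SlidingWindowCounter:
    """Approximate a sliding window using two adjacent fixed windows."""

    limit: int                # max requests allowed per window
    window_s: float           # window length in seconds
    curr_start: float = 0.0   # start time of the current fixed window
    curr_count: int = 0       # requests admitted in the current window
    prev_count: int = 0       # requests admitted in the previous window

    def allow(self, now: float) -> bool:
        """Return True and record the request if it fits under the limit."""
        elapsed = now - self.curr_start
        if elapsed >= 2 * self.window_s:
            # Idle for more than a full window: both counts have expired.
            self.prev_count = 0
            self.curr_count = 0
            self.curr_start = now - (elapsed % self.window_s)
        elif elapsed >= self.window_s:
            # Moved into the next fixed window: current becomes previous.
            self.prev_count = self.curr_count
            self.curr_count = 0
            self.curr_start += self.window_s

        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now - self.curr_start) / self.window_s
        estimate = self.prev_count * overlap + self.curr_count
        if estimate + 1 > self.limit:
            return False
        self.curr_count += 1
        return True


if __name__ == "__main__":
    limiter = SlidingWindowCounter(limit=10, window_s=1.0)
    admitted = sum(limiter.allow(0.0) for _ in range(11))
    print(f"admitted {admitted} of 11 requests at t=0.0")
```

Time is passed in explicitly so the behaviour is deterministic and easy to test. A real deployment would have to decide which clock to trust across regions, which the time coordination requirement covers.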
Requirements
Design a distributed rate limiting service that:
- Supports per-user and global limits with burst tolerance and approximately sliding-window semantics.
- Handles 10M RPS peak across multiple regions with low latency.
- Specifies:
  - Public API (check, configure, introspect) and enforcement integration on the request path.
  - Algorithms (token bucket, leaky bucket, sliding window) and why.
  - Data model for counters/buckets/policies.
  - Sharding and hot-key mitigation (e.g., consistent hashing, key splitting).
  - Storage choices (in-memory vs. Redis vs. custom), replication, and time coordination.
  - Failure handling and partial outages; fairness and consistency (eventual vs strong) where needed.
  - Scale-out plan: capacity planning formulas, machines needed for a 2× traffic spike, autoscaling signals, and backpressure/throttling strategies.
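As an illustration of the kind of capacity planning formula the last item asks for, one possible sketch is: nodes = ceil(peak RPS × spike factor / (per-node RPS × target utilization)) + spares. The function name `nodes_needed`, the per-node throughput of 200,000 checks/second, and the 60% utilization target are assumptions made up for this example. They are not given in the prompt, and a real answer would derive them from benchmarks.

```python
import math


def nodes_needed(
    peak_rps: float,
    spike_factor: float,
    per_node_rps: float,
    target_util: float,
    spare_nodes: int = 0,
) -> int:
    """Estimate node count so each node stays at or below target_util under the spike."""
    if per_node_rps <= 0 or not 0 < target_util <= 1:
        raise ValueError("per_node_rps must be > 0 and target_util in (0, 1]")
    # Throughput we are willing to place on one node (leaves headroom).
    usable_rps_per_node = per_node_rps * target_util
    return math.ceil(peak_rps * spike_factor / usable_rps_per_node) + spare_nodes


if __name__ == "__main__":
    # 10M RPS peak is from the prompt; 200_000 and 0.6 are assumed values.
    baseline = nodes_needed(10_000_000, 1.0, 200_000, 0.6)
    spike = nodes_needed(10_000_000, 2.0, 200_000, 0.6)
    print(f"baseline: {baseline} nodes, 2x spike: {spike} nodes")
```

Under these assumed numbers the baseline is 84 nodes and the 2× spike needs 167. The same function can be applied per region by passing each region's share of the peak.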