Scenario
You are building a backend for an “insight platform”. The platform exposes HTTP APIs that are called by many tenants and many end-consumers.
You need to design a rate-limiting layer with two concurrent limits:
- API-level limit: max 100 requests/second per API key (tenant/application).
- Consumer-level limit: max 10 requests/second per consumer (end user / device / client id).
A request should be allowed only if it satisfies both limits.
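One way to picture the dual check is as two token buckets that must both pass before a request is admitted. The sketch below is a single-process, in-memory illustration under assumptions of ours (the names `TokenBucket` and `allow`, and the explicit `now` clock parameter, are not part of the scenario); a real gateway would keep this state in a shared store.

```python
class TokenBucket:
    """Refills continuously at `rate` tokens/sec, up to `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full: allows an initial burst
        self.last = now

    def refill(self, now: float) -> None:
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now


api_buckets: dict = {}       # api_key -> TokenBucket(rate=100, capacity=100)
consumer_buckets: dict = {}  # (api_key, consumer_id) -> TokenBucket(rate=10, capacity=10)


def allow(api_key: str, consumer_id: str, now: float) -> bool:
    """Admit the request only if BOTH buckets can spend a token.

    Both buckets are checked *before* either is debited, so a request
    rejected by one limit does not burn tokens from the other.
    """
    api = api_buckets.setdefault(api_key, TokenBucket(100.0, 100.0, now))
    con = consumer_buckets.setdefault(
        (api_key, consumer_id), TokenBucket(10.0, 10.0, now)
    )
    api.refill(now)
    con.refill(now)
    if api.tokens >= 1.0 and con.tokens >= 1.0:
        api.tokens -= 1.0
        con.tokens -= 1.0
        return True
    return False
```

In real middleware `now` would come from `time.monotonic()`; passing it explicitly keeps the logic deterministic and testable. Note that with burst capacity equal to the per-second rate, a consumer may send 10 requests back-to-back, then sustain 10/second.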
Requirements
- Enforcement point: the rate limiter sits in front of multiple stateless API servers (e.g., gateway/middleware).
- Correctness target: practical accuracy under high concurrency; avoid letting traffic exceed the limits by large margins.
- Latency: add minimal overhead (single-digit milliseconds is typical).
- Scale: handle many unique API keys and consumers; traffic can spike.
- Operability: emit metrics and logs for throttling decisions.
- Behavior: when over a limit, return HTTP 429 with a helpful response (e.g., a Retry-After value).
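For the 429 behavior, a Retry-After value can be derived from the limiter's own state: in a token bucket refilling at `rate` tokens/sec, the wait until the next whole token is `(1 - tokens) / rate`. A minimal sketch, with helper names of our own choosing:

```python
import math


def retry_after_seconds(tokens: float, rate: float) -> int:
    """Whole seconds until a bucket refilling at `rate` tokens/sec holds a full token."""
    if tokens >= 1.0:
        return 0
    return math.ceil((1.0 - tokens) / rate)


def throttled_response(tokens: float, rate: float):
    """Framework-agnostic (status, headers, body) triple for an over-limit request."""
    wait = retry_after_seconds(tokens, rate)
    return (
        429,
        {"Retry-After": str(wait)},
        {"error": "rate_limit_exceeded", "retry_after_seconds": wait},
    )
```

Rounding up to whole seconds matches the HTTP `Retry-After` header, which takes an integer delay (or an HTTP-date).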
Deliverables
- Choose a rate-limiting algorithm and explain why (token bucket / leaky bucket / sliding window / fixed window, etc.).
- Propose a distributed design (single node vs multi node) that works with multiple API servers.
- Show how you would enforce both limits atomically (or explain acceptable approximations).
- Discuss data model, keying, expiration, and failure modes.
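As a starting point for the keying and expiration discussion, one common convention (the `rl:` prefixes and helper names below are our own illustration, not prescribed by the scenario) is to namespace entries per limit and attach a TTL somewhat longer than the bucket's full refill time, so idle keys expire on their own instead of accumulating:

```python
def api_bucket_key(api_key: str) -> str:
    # One entry per tenant/application.
    return f"rl:api:{api_key}"


def consumer_bucket_key(api_key: str, consumer_id: str) -> str:
    # Scoping the consumer under its API key avoids collisions between tenants.
    return f"rl:consumer:{api_key}:{consumer_id}"


def bucket_ttl_seconds(capacity: float, rate: float, slack: float = 2.0) -> int:
    # A bucket refills from empty in capacity/rate seconds; keep the key a bit
    # longer than that so dormant clients' state expires on its own.
    return max(1, int(slack * capacity / rate))
```

With the limits in this scenario, capacity/rate is 1 second for both buckets, so a TTL of a few seconds is enough; in a Redis-backed design this maps to refreshing the expiry on every update.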