Problem
Design a multi-tenant, large-scale distributed Rate Limiter service used by many internal teams.
The service should allow product teams to enforce request limits (e.g., per user, per API key, per IP, per tenant) at very high QPS.
Requirements to clarify
- Who calls it? (internal services via gateway/service mesh; optional external-facing)
- Limit types: per-second / per-minute limits; burst handling
- Algorithms: fixed window / sliding window / token bucket / leaky bucket
- Correctness: strict vs best-effort; allowable over-limit error
- Scope: single region vs multi-region; cross-region failover
- Isolation: per-tenant quotas, noisy-neighbor protection, hotspot keys (large customers)
- Operational needs: observability, config rollout, auditability, safe degradation
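To make the algorithm choice and the burst-handling question concrete, here is a minimal single-process sketch of a token bucket, one of the algorithms listed above. The class name `TokenBucket` and its parameters are illustrative assumptions, not part of the problem statement; a real answer would keep this state in a shared store rather than in local memory.

```python
class TokenBucket:
    """Illustrative in-memory token bucket (hypothetical names, single process)."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second (sustained limit)
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last_refill = now          # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        # Admit the request only if enough tokens remain.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=2, refill_rate=1.0, now=0.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

Here `capacity` bounds the burst and `refill_rate` sets the sustained limit, so the two can be tuned independently. The timestamp is passed in explicitly only to keep the sketch deterministic.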
Expected output
Explain a concrete architecture (components + data model + API), and walk through:
- request path and decision logic
- storage/sharding strategy
- multi-region consistency trade-offs
- failure modes (store down, partitions) and fail-open vs fail-close
- evolution plan as scale grows
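As one way to frame the fail-open vs fail-close discussion, the sketch below wraps a limit check so that a store outage resolves to a per-policy default. The names `StoreUnavailable`, `decide`, and `check` are hypothetical and chosen only for illustration; the actual error types and policy granularity depend on the design being presented.

```python
from typing import Callable


class StoreUnavailable(Exception):
    """Hypothetical error raised when the counter store cannot be reached."""


def decide(check: Callable[[str], bool], key: str, fail_open: bool) -> bool:
    """Return True to admit the request, False to reject it.

    If the backing store is unavailable, fall back to the policy:
    fail_open=True admits traffic (favors availability),
    fail_open=False rejects it (favors protection of the backend).
    """
    try:
        return check(key)
    except StoreUnavailable:
        return fail_open


def broken_store_check(key: str) -> bool:
    # Simulates a store outage for the example.
    raise StoreUnavailable(key)


admitted_when_open = decide(broken_store_check, "tenant-a:user-1", fail_open=True)
admitted_when_closed = decide(broken_store_check, "tenant-a:user-1", fail_open=False)
```

A common talking point is that the default may differ per limit: a limit that protects a fragile backend might fail closed, while a fairness limit might fail open.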