Design a Rate Limiter with Burst Allowance and Distributed Coordination
Context
You are designing a rate limiter for an API gateway that serves high QPS traffic. The limiter should cap requests to a configured rate (QPS) with an optional burst allowance, operate correctly under concurrency, and scale across multiple application instances.
Requirements
-
Single-instance design:
-
Implement and compare two approaches: token bucket and sliding window.
-
Specify data structures, algorithms, and precise behavior (including burst allowance).
-
Provide time/space complexity.
-
Ensure correctness under concurrent access.
-
Distributed design across multiple app instances:
-
Coordination strategies: centralized store (e.g., Redis) vs. sharded counters.
-
Address clock skew tolerance, atomicity, idempotency.
-
Discuss failure modes (node loss, partial updates) and strategies for consistency and fairness.
-
Developer-facing API:
-
Define clear APIs (e.g., Allow, Acquire) and example configurations.
-
Operations:
-
Monitoring/alerting for saturation and error conditions.
Provide code-like pseudocode where helpful.