Design and implement an in-memory leaky-bucket rate limiter.
A limiter is configured with:
- capacity: the maximum amount of queued load the bucket can hold
- leak_rate: how many units drain from the bucket per second
Each incoming request contributes 1 unit to the bucket. When a request arrives:
- First, reduce the current bucket level based on how much time has elapsed since the last update.
- If adding the new request would keep the bucket level at or below capacity, accept the request and update the bucket level.
- Otherwise, reject the request.
Implement a thread-safe API such as allow_request() for use in a multithreaded server.
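One possible shape of an answer, sketched in Python: a single lock guards both the bucket level and the last-update timestamp, and the drain step runs lazily on each call. The class and field names here are illustrative, not prescribed by the problem; time.monotonic() is used so the limiter is immune to wall-clock adjustments.

```python
import threading
import time

class LeakyBucketLimiter:
    """In-memory leaky-bucket rate limiter (illustrative sketch)."""

    def __init__(self, capacity: float, leak_rate: float):
        self.capacity = capacity       # max queued load the bucket can hold
        self.leak_rate = leak_rate     # units drained per second
        self.level = 0.0               # current bucket level
        self.last_update = time.monotonic()
        self.lock = threading.Lock()   # one lock guards level + timestamp

    def allow_request(self) -> bool:
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_update
            # Drain the bucket for the time elapsed since the last update.
            self.level = max(0.0, self.level - elapsed * self.leak_rate)
            self.last_update = now
            # Accept only if one more 1-unit request still fits.
            if self.level + 1.0 <= self.capacity:
                self.level += 1.0
                return True
            return False
```

With capacity=2 and leak_rate=1.0, two immediate calls are accepted, a third is rejected, and after roughly a second of draining a new request fits again. The single coarse lock is the simplest correct choice; its trade-offs are exactly what the follow-up questions below probe.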
Follow-up discussion:
- How would you make the implementation safe under high concurrency?
- What synchronization strategy would you use, and what are the trade-offs?
- How would you handle time precision, bursty traffic, and clock-related edge cases?
- How would you scale this design if many application instances must share the same rate limit?
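For the last follow-up, one common direction is to move the bucket state into an external store and make the drain-and-check step atomic on the server side (for example, as a Lua script in Redis). The sketch below is a stand-in for that server-side logic: a plain dict plays the role of the shared store, and the function body represents the critical section that would execute atomically; the names store, shared_allow, and the explicit now parameter are assumptions for illustration.

```python
import time
from typing import Optional

# Stand-in for an external shared key-value store (e.g., Redis).
# In a real deployment this whole function body would execute
# atomically server-side, e.g. as a single Lua script.
store: dict[str, tuple[float, float]] = {}

def shared_allow(key: str, capacity: float, leak_rate: float,
                 now: Optional[float] = None) -> bool:
    now = time.monotonic() if now is None else now
    level, last = store.get(key, (0.0, now))
    # Drain for the time elapsed since this key's last update.
    level = max(0.0, level - (now - last) * leak_rate)
    allowed = level + 1.0 <= capacity
    if allowed:
        level += 1.0
    store[key] = (level, now)   # write back drained/updated state
    return allowed
```

Injecting now makes the logic deterministic to test; it also surfaces a real design question, namely whose clock the shared limiter trusts when many instances call it.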