PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's ability to design thread-safe concurrent data structures, focusing on the token bucket algorithm for rate limiting. It tests understanding of atomicity, lock contention, and time-based state management, commonly asked to assess practical concurrency and systems programming skills at the coding-interview level.

  • Uber
  • Coding & Algorithms
  • Software Engineer

Thread-Safe Token-Bucket Rate Limiter

Company: Uber

Role: Software Engineer

Category: Coding & Algorithms

Interview Round: Onsite

# Thread-Safe Token-Bucket Rate Limiter Design and implement a **thread-safe rate limiter** using the **token bucket** algorithm. The limiter is shared by many worker threads and decides, for each incoming request, whether it is allowed to proceed or should be rejected (or throttled) because the caller has exceeded its allowed rate. ### Token bucket model A bucket has two parameters: - `capacity` — the maximum number of tokens the bucket can hold (this is the allowed burst size). - `refillRatePerSecond` — tokens are added to the bucket continuously at this rate, up to but never exceeding `capacity`. Each request consumes some number of tokens (default 1). A request is **allowed** only if enough tokens are currently available; the consumed tokens are then removed from the bucket. If insufficient tokens are available, the request is **denied** and no tokens are consumed. ### Required API Implement a class with at least: - `RateLimiter(capacity, refillRatePerSecond)` — construct a bucket that starts **full** (`capacity` tokens). - `boolean allow()` — attempt to consume 1 token; return `true` if the request is permitted, `false` otherwise. - `boolean allow(int permits)` — attempt to consume `permits` tokens atomically (all-or-nothing); return whether it was permitted. ### Correctness requirements 1. **Thread safety / atomicity.** Many threads call `allow(...)` concurrently. The check-available-then-consume sequence must be atomic so that the bucket can never be over-drawn — across any concurrent execution, the total number of permitted tokens in any window must respect the bucket contents (no two threads may both succeed in spending the same token). 2. **Lazy, time-based refill.** Refill must be computed from elapsed time (using a monotonic clock), not by assuming a background thread runs on a fixed schedule. Refilling lazily on each call is acceptable, but the available token count must never exceed `capacity`, and elapsed-time arithmetic must not lose fractional tokens or drift. 3. **Monotonic time.** Use a monotonic time source so that wall-clock adjustments cannot grant or revoke tokens incorrectly. ### Discussion points to be ready for - **Reducing lock contention** under high concurrency: compare a single global lock against a lock-free / CAS approach (e.g., a compare-and-swap loop over an atomically updated `(tokens, lastRefillTimestamp)` state), and against sharding the limiter across multiple buckets. - How the design generalizes to **per-user / per-key** rate limiting (a bucket per key) and how you would bound memory for a large, sparse key space. - Trade-offs versus a sliding-window or leaky-bucket limiter. ### Example ``` RateLimiter rl = new RateLimiter(capacity = 5, refillRatePerSecond = 1) # bucket starts full with 5 tokens allow() -> true # 4 left allow() -> true # 3 left allow() -> true # 2 left allow() -> true # 1 left allow() -> true # 0 left allow() -> false # empty, denied # ~2 seconds later, ~2 tokens have refilled allow() -> true # 1 left allow() -> true # 0 left allow() -> false # denied ``` ### Constraints - `capacity >= 1`, `refillRatePerSecond > 0`. - The number of available tokens is conceptually a real number in `[0, capacity]`; `allow(permits)` succeeds only when `availableTokens >= permits`.

Quick Answer: This question evaluates a candidate's ability to design thread-safe concurrent data structures, focusing on the token bucket algorithm for rate limiting. It tests understanding of atomicity, lock contention, and time-based state management, commonly asked to assess practical concurrency and systems programming skills at the coding-interview level.

Implement the token-bucket rate-limiting algorithm. A bucket has a `capacity` (max tokens it can hold — the allowed burst size) and a `refill_rate_per_second` (tokens added continuously per second, never exceeding `capacity`). The bucket starts **full**. Each request tries to consume some `permits` tokens **atomically** (all-or-nothing): it is allowed only if at least `permits` tokens are currently available, in which case those tokens are removed; otherwise it is denied and no tokens are consumed. In a real system this limiter is shared by many worker threads, and the check-available-then-consume step must be atomic so the bucket can never be over-drawn. To make the algorithm testable in a single-threaded console, **time is injected explicitly**: instead of reading a real clock, every operation carries its own monotonic timestamp (in seconds). Write a function `solution(capacity, refill_rate_per_second, operations)` where `operations` is a list of `[timestamp, permits]` pairs given in non-decreasing timestamp order. Process the operations in order and return a list of booleans — `True` if that request was allowed, `False` if denied. Refill rule: before serving an operation at time `t`, add `(t - last_time) * refill_rate_per_second` tokens (capping the total at `capacity`), then update `last_time = t`. The bucket is full at the moment of the first operation. Example: `capacity = 5`, `refill_rate_per_second = 1`. Five `allow()` calls at t=0 succeed (draining the bucket), the sixth is denied, and ~2 seconds later two more succeed as ~2 tokens have refilled. Follow-up discussion (not tested here): reducing lock contention with a CAS loop over `(tokens, lastRefill)` vs. a global lock vs. sharding; extending to per-key buckets with bounded memory; trade-offs against sliding-window / leaky-bucket limiters.

Constraints

  • capacity >= 1
  • refill_rate_per_second > 0
  • Each operation is [timestamp_seconds, permits] with permits >= 1
  • Timestamps are non-decreasing (a monotonic clock)
  • Available tokens are conceptually a real number in [0, capacity]; a request succeeds only when available_tokens >= permits
  • The bucket starts full (capacity tokens) at the first operation's timestamp

Examples

Input: (5, 1, [[0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [2, 1], [2, 1], [2, 1]])

Expected Output: [True, True, True, True, True, False, True, True, False]

Explanation: The worked example: capacity 5, refill 1/s. Five single-token requests at t=0 drain the full bucket, the sixth is denied. ~2s later, min(5, 0 + 2*1) = 2 tokens have refilled, so two succeed and the third is denied.

Input: (10, 2, [[0, 5], [0, 5], [0, 1], [1, 2], [1, 1]])

Expected Output: [True, True, False, True, False]

Explanation: Multi-permit bursts: two 5-permit requests empty the bucket of 10, the 1-permit request is denied. At t=1, 1*2 = 2 tokens refill, so the 2-permit request succeeds (bucket back to 0) and the following 1-permit request is denied.

Input: (3, 1, [[0, 3], [100, 1], [100, 3]])

Expected Output: [True, True, False]

Explanation: Refill is capped at capacity: after draining 3 tokens at t=0, a 100-second idle gap would add 100 tokens but the bucket clamps to 3. The 1-permit request succeeds (leaving 2), and the 3-permit request is denied because only 2 remain.

Input: (1, 1, [[0, 1], [0, 1], [1, 1]])

Expected Output: [True, False, True]

Explanation: Capacity of 1: the first request takes the only token, the second (same instant) is denied, and after 1 second exactly 1 token refills so the third succeeds.

Input: (5, 1, [])

Expected Output: []

Explanation: Edge case: no operations means no results.

Input: (5, 2, [[0.0, 5], [0.5, 1], [0.5, 1]])

Expected Output: [True, True, False]

Explanation: Fractional-second refill: after draining 5 tokens at t=0, half a second at 2 tokens/s refills exactly 1 token, so one request succeeds and the next is denied.

Input: (2, 5, [[0, 2], [10, 2], [10, 1]])

Expected Output: [True, True, False]

Explanation: Refill rate exceeds capacity: draining 2 tokens then idling 10s would add 50 tokens but the bucket clamps to its capacity of 2. The 2-permit request succeeds (bucket to 0) and the trailing 1-permit request is denied.

Hints

  1. Keep two pieces of state: the current token count (a float) and the timestamp of the last refill. On each operation compute elapsed = t - last_time.
  2. Refill lazily: tokens += elapsed * refill_rate_per_second, then clamp to capacity so the bucket never overflows. Update last_time = t.
  3. Consume atomically: only subtract `permits` (and return True) when tokens >= permits; otherwise leave tokens untouched and return False. In a real multithreaded implementation this check-then-consume must be guarded by a lock or a CAS loop so two threads can't spend the same token.
Last updated: Jul 1, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Quadtree for 2D Geospatial Points - Uber
  • Group Anagrams - Uber
  • Deep Equality of Two Records - Uber (medium)
  • Shortest Path in a Grid with Blocked Cells - Uber (medium)
  • Design and Implement an LRU Cache - Uber (medium)