PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/System Design/Altruist

Design scalable rate limiter with token bucket

Last updated: Mar 29, 2026

Quick Overview

This question evaluates understanding of rate-limiting algorithms and scalability concepts, specifically comparing sliding-window and token-bucket approaches with attention to lazy refill, per-client state modeling, memory efficiency, concurrency, and extendability to distributed systems.

  • hard
  • Altruist
  • System Design
  • Software Engineer

Design scalable rate limiter with token bucket

Company: Altruist

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

You are asked to design and implement a rate limiter for an HTTP API. ### Requirements - Limit each client (e.g., user ID or API key) to at most `N` requests per `T` seconds. - The rate limiter should run in memory (assume a single application instance to start). - It must be: - Low latency (per-request check must be fast). - Memory efficient even when there are many distinct clients and very high traffic. - Easy to extend later to a distributed setting. The interviewer guides you through the following steps: 1. **Baseline design (sliding window):** - Describe how you would implement a basic sliding-window rate limiter. - Be explicit about what data you store per client, how you decide whether to allow or reject a request given its timestamp, and the time and space complexity. 2. **Scalability concern:** - Explain what happens when traffic becomes extremely large (e.g., millions of clients, each sending many requests per second). - Why might the naive sliding-window implementation run into memory and performance issues? 3. **Improved design (token bucket with lazy refill): - Redesign the rate limiter using a token bucket algorithm so that memory usage stays small and per-request operations remain O(1). - You should: - Describe the conceptual model of a token bucket (capacity, refill rate, token consumption on each request). - Specify exactly what you store per client (e.g., current token count, last refill timestamp, etc.). - Explain how to perform a **lazy refill**: instead of refilling tokens every second via a background job or timer, you only recalculate and refill tokens when a request for that client arrives, using the request's timestamp. - Show how the lazy refill calculation works for a request arriving at time `now`, given `last_refill_time`, `current_tokens`, bucket `capacity`, and `refill_rate` (tokens per second). - Discuss how this design solves the large-data, low-memory problem compared to the sliding-window implementation. 4. **Additional considerations:** - How would you handle concurrency (multiple threads) accessing the same client's rate-limit state? - How would you clean up or evict inactive clients so the in-memory data structure does not grow without bound? - Briefly mention how you might extend this design to multiple application servers (e.g., by storing state in a shared store like Redis), but focus on the core token-bucket-with-lazy-refill design. Provide a detailed, system-design level answer: data structures, algorithms, complexity, trade-offs between sliding window and token bucket, and how lazy refill works in practice.

Quick Answer: This question evaluates understanding of rate-limiting algorithms and scalability concepts, specifically comparing sliding-window and token-bucket approaches with attention to lazy refill, per-client state modeling, memory efficiency, concurrency, and extendability to distributed systems.

Altruist logo
Altruist
Oct 16, 2025, 12:00 AM
Software Engineer
Technical Screen
System Design
8
0

You are asked to design and implement a rate limiter for an HTTP API.

Requirements

  • Limit each client (e.g., user ID or API key) to at most N requests per T seconds.
  • The rate limiter should run in memory (assume a single application instance to start).
  • It must be:
    • Low latency (per-request check must be fast).
    • Memory efficient even when there are many distinct clients and very high traffic.
    • Easy to extend later to a distributed setting.

The interviewer guides you through the following steps:

  1. Baseline design (sliding window):
    • Describe how you would implement a basic sliding-window rate limiter.
    • Be explicit about what data you store per client, how you decide whether to allow or reject a request given its timestamp, and the time and space complexity.
  2. Scalability concern:
    • Explain what happens when traffic becomes extremely large (e.g., millions of clients, each sending many requests per second).
    • Why might the naive sliding-window implementation run into memory and performance issues?
  3. **Improved design (token bucket with lazy refill):
    • Redesign the rate limiter using a token bucket algorithm so that memory usage stays small and per-request operations remain O(1).
    • You should:
      • Describe the conceptual model of a token bucket (capacity, refill rate, token consumption on each request).
      • Specify exactly what you store per client (e.g., current token count, last refill timestamp, etc.).
      • Explain how to perform a lazy refill : instead of refilling tokens every second via a background job or timer, you only recalculate and refill tokens when a request for that client arrives, using the request's timestamp.
      • Show how the lazy refill calculation works for a request arriving at time now , given last_refill_time , current_tokens , bucket capacity , and refill_rate (tokens per second).
    • Discuss how this design solves the large-data, low-memory problem compared to the sliding-window implementation.
  4. Additional considerations:
    • How would you handle concurrency (multiple threads) accessing the same client's rate-limit state?
    • How would you clean up or evict inactive clients so the in-memory data structure does not grow without bound?
    • Briefly mention how you might extend this design to multiple application servers (e.g., by storing state in a shared store like Redis), but focus on the core token-bucket-with-lazy-refill design.

Provide a detailed, system-design level answer: data structures, algorithms, complexity, trade-offs between sliding window and token bucket, and how lazy refill works in practice.

Solution

Show

Submit Your Answer

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Altruist•More Software Engineer•Altruist Software Engineer•Altruist System Design•Software Engineer System Design
PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.