PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/System Design/Anthropic

Design a Distributed Rate Limiter

Last updated: Jul 1, 2026

Quick Overview

This question evaluates a candidate's ability to design distributed systems with globally consistent state across multiple servers. It tests system design skills around atomic shared-state operations, reasoning about clock skew, and planning graceful degradation when a dependency fails. Commonly asked to assess practical, production-level architectural judgment rather than pure algorithmic knowledge.

  • medium
  • Anthropic
  • System Design
  • Software Engineer

Design a Distributed Rate Limiter

Company: Anthropic

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

# Design a Distributed Rate Limiter Design a rate limiter that enforces a per-client request quota (for example, "at most 100 requests per minute per API key") across a **fleet of application servers** sitting behind a load balancer. Any server may receive any client's request, so the limit must be enforced **globally**, not per-server. Cover the limiting algorithm, the shared state, and — critically — how the system behaves correctly despite **clock skew between servers** and how it **degrades when the shared store fails**. ```hint Algorithm choice Think about what data structure lets many servers check-and-increment one client's counter atomically and cheaply, while still expiring old data on its own. ``` ```hint The two hard parts The interesting requirements aren't the happy path — they're (1) clock skew across servers writing timestamps into shared state, and (2) what the limiter does the moment the central store (Redis) is unreachable. ``` ### Constraints & Assumptions - Tens to hundreds of stateless app servers behind a load balancer; a single client's requests can land on any of them. - Limits are per-client (per API key / user / IP) and configurable; default example is `N` requests per rolling window `W`. - A shared low-latency store (e.g. Redis, possibly clustered) holds the counters; the rate-limit check is on the hot path of every request, so it must add minimal latency. - Servers' wall clocks are roughly but not perfectly synchronized (NTP, with drift up to some tens of milliseconds to seconds). - The system must keep serving traffic even if the shared store has a partial or full outage. ### Clarifying Questions to Ask - What exactly is a "client" (API key, user id, IP), and can a single client have multiple limits/tiers? - Should the limit be a hard cap (reject over-limit) or soft (allow with shaping), and is a small amount of over-admission acceptable? - What are the latency and availability budgets for the limiter itself — is "fail open" (allow on store outage) or "fail closed" (reject) preferred? - What window semantics are required: fixed window, sliding window, or token-bucket-style burst allowance? - What is the request volume and the cost of a slightly inaccurate count at the boundaries? ### What a Strong Answer Covers ```premium-lock What a Strong Answer Covers ``` ### Follow-up Questions - Compare fixed-window, sliding-window-log, sliding-window-counter, and token bucket — which would you pick for smoothing bursts and why, and what is the memory cost of each? - With a local per-server fallback of `N / serverCount`, what goes wrong if `serverCount` is stale or servers are added/removed, and how do you keep that share roughly correct? - How do you keep the limiter's own latency low when Redis is healthy — batching, pipelining, local short-TTL caching of "already over limit" verdicts? - How would you support multiple tiers and burst allowances (e.g. 100/min sustained but allow a short burst of 20) in the same mechanism?

Quick Answer: This question evaluates a candidate's ability to design distributed systems with globally consistent state across multiple servers. It tests system design skills around atomic shared-state operations, reasoning about clock skew, and planning graceful degradation when a dependency fails. Commonly asked to assess practical, production-level architectural judgment rather than pure algorithmic knowledge.

Related Interview Questions

  • Design a Prompt Playground - Anthropic (medium)
  • Design Instagram (Feed, Photos, and Friend Recommendations) - Anthropic (medium)
  • Design an LLM Request Batching System - Anthropic (medium)
  • Design a One-on-One Chat Service - Anthropic (medium)
  • Design a prompt playground - Anthropic (hard)
|Home/System Design/Anthropic

Design a Distributed Rate Limiter

Anthropic logo
Anthropic
Jun 23, 2026, 12:00 AM
mediumSoftware EngineerOnsiteSystem Design
0
0

Design a Distributed Rate Limiter

Design a rate limiter that enforces a per-client request quota (for example, "at most 100 requests per minute per API key") across a fleet of application servers sitting behind a load balancer. Any server may receive any client's request, so the limit must be enforced globally, not per-server. Cover the limiting algorithm, the shared state, and — critically — how the system behaves correctly despite clock skew between servers and how it degrades when the shared store fails.

Constraints & Assumptions

  • Tens to hundreds of stateless app servers behind a load balancer; a single client's requests can land on any of them.
  • Limits are per-client (per API key / user / IP) and configurable; default example is N requests per rolling window W .
  • A shared low-latency store (e.g. Redis, possibly clustered) holds the counters; the rate-limit check is on the hot path of every request, so it must add minimal latency.
  • Servers' wall clocks are roughly but not perfectly synchronized (NTP, with drift up to some tens of milliseconds to seconds).
  • The system must keep serving traffic even if the shared store has a partial or full outage.

Clarifying Questions to Ask

  • What exactly is a "client" (API key, user id, IP), and can a single client have multiple limits/tiers?
  • Should the limit be a hard cap (reject over-limit) or soft (allow with shaping), and is a small amount of over-admission acceptable?
  • What are the latency and availability budgets for the limiter itself — is "fail open" (allow on store outage) or "fail closed" (reject) preferred?
  • What window semantics are required: fixed window, sliding window, or token-bucket-style burst allowance?
  • What is the request volume and the cost of a slightly inaccurate count at the boundaries?

What a Strong Answer Covers Premium

Follow-up Questions

  • Compare fixed-window, sliding-window-log, sliding-window-counter, and token bucket — which would you pick for smoothing bursts and why, and what is the memory cost of each?
  • With a local per-server fallback of N / serverCount , what goes wrong if serverCount is stale or servers are added/removed, and how do you keep that share roughly correct?
  • How do you keep the limiter's own latency low when Redis is healthy — batching, pipelining, local short-TTL caching of "already over limit" verdicts?
  • How would you support multiple tiers and burst allowances (e.g. 100/min sustained but allow a short burst of 20) in the same mechanism?

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Anthropic•More Software Engineer•Anthropic Software Engineer•Anthropic System Design•Software Engineer System Design

Your design canvas — auto-saved

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.