Design a high-throughput distributed rate limiting service that supports per-user and global limits with burst tolerance and an approximately sliding window. Target 10M requests/second at peak across multiple regions. Specify the API, choice of algorithms (token bucket, leaky bucket, sliding window), data model, sharding and hot-key mitigation (e.g., consistent hashing, key splitting), storage choices (in-memory vs. Redis vs. custom), replication, and time coordination. Explain how you enforce limits on the request path, handle failures and partial outages, ensure fairness, and provide eventual or strong consistency where needed. Show how you would scale out: capacity planning formulas, how many machines to add for a 2× traffic spike, autoscaling signals, and backpressure/throttling strategies.

This question evaluates distributed systems architecture, rate-limiting algorithms, data modeling, sharding and hot-key mitigation, consistency and fault-tolerance strategies, API design, and operational capacity planning within the System Design domain.


What difficulty level is this interview question?

This is a hard difficulty System Design question, commonly asked during Onsite rounds at Pinterest.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Pinterest during technical interviews.

Design a high-throughput distributed rate limiter

System Design: High-Throughput Distributed Rate Limiting Service

Context

You are designing a multi-tenant rate limiting platform for an edge/gateway layer that protects downstream services. The system must enforce both per-user and global limits, tolerate bursts, and approximate a sliding window, all at a peak of 10M requests/second across multiple regions.

Requirements

Design a distributed rate limiting service that:

Supports per-user and global limits with burst tolerance and approximately sliding-window semantics.
Handles 10M RPS peak across multiple regions with low latency.
Specifies:
1. Public API (check, configure, introspect) and enforcement integration on the request path.
2. Algorithms (token bucket, leaky bucket, sliding window) and why.
3. Data model for counters/buckets/policies.
4. Sharding and hot-key mitigation (e.g., consistent hashing, key splitting).
5. Storage choices (in-memory vs. Redis vs. custom), replication, and time coordination.
6. Failure handling and partial outages; fairness and consistency (eventual vs. strong) where needed.
7. Scale-out plan: capacity planning formulas, machines needed for a 2× traffic spike, autoscaling signals, and backpressure/throttling strategies.
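
As a starting point for the algorithm item above, here is a minimal single-process sketch of a token bucket, which is one common way to get a sustained rate plus burst tolerance. It is not a prescribed answer: the class name `TokenBucket`, the helper `check`, and all parameter values are hypothetical, and a real deployment would keep this state in a sharded store rather than in local memory.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Illustrative token bucket; names and defaults are assumptions."""
    rate: float                      # tokens refilled per second (sustained limit)
    capacity: float                  # maximum tokens held (burst allowance)
    tokens: float = field(init=False)
    last_refill: float = 0.0         # timestamp of the last refill, in seconds

    def __post_init__(self) -> None:
        # Start full so a new key can use its whole burst allowance.
        self.tokens = self.capacity

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill lazily from elapsed time; a clock that moves backwards adds nothing.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = max(self.last_refill, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def check(user: TokenBucket, shared: TokenBucket, now: float) -> bool:
    """Admit a request only if both the per-user and the global bucket allow it."""
    if not user.allow(now):
        return False
    if not shared.allow(now):
        # Refund the user's token so a global rejection does not penalize the user.
        user.tokens = min(user.capacity, user.tokens + 1.0)
        return False
    return True


if __name__ == "__main__":
    user = TokenBucket(rate=2.0, capacity=3.0)
    shared = TokenBucket(rate=1000.0, capacity=1000.0)
    print([check(user, shared, now=0.0) for _ in range(4)])
```

Passing `now` explicitly keeps the sketch deterministic and makes the time-coordination question visible: whichever component supplies the timestamp decides how clock skew affects refills.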

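
For the scale-out item, one simple capacity planning formula is nodes = ceil(peak RPS / (per-node throughput × target utilization)) plus any spare nodes. The sketch below applies it to the stated 10M RPS peak and to a 2× spike. The per-node throughput of 200,000 checks/second and the 60% utilization target are assumed purely for illustration; measured numbers from a load test should replace them.

```python
import math


def nodes_needed(peak_rps: float, per_node_rps: float,
                 target_utilization: float, spare_nodes: int = 0) -> int:
    """Nodes required to serve peak_rps while staying under the utilization target."""
    usable_rps_per_node = per_node_rps * target_utilization
    return math.ceil(peak_rps / usable_rps_per_node) + spare_nodes


if __name__ == "__main__":
    PER_NODE_RPS = 200_000   # assumed benchmark figure, not from the problem statement
    UTILIZATION = 0.6        # assumed headroom target

    baseline = nodes_needed(10_000_000, PER_NODE_RPS, UTILIZATION)
    spike = nodes_needed(20_000_000, PER_NODE_RPS, UTILIZATION)
    print(baseline, spike, spike - baseline)  # prints: 84 167 83
```

Under these assumptions a 2× spike needs 83 additional nodes. The figure is slightly under double the baseline only because of rounding up.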
