Design a global high-throughput rate limiter

Last updated: Mar 29, 2026

Quick Overview

This question evaluates distributed systems design, scalability, consistency trade-offs, rate-limiting algorithms, data modeling, and operational competencies required to build a global, low-latency multi-region service.

Company: Pinterest

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Onsite

Design a global, high-throughput rate limiter that enforces per-user and per-API quotas across multiple regions with low latency and burst tolerance. Choose and justify an algorithm (e.g., token bucket vs. leaky bucket), outline the data model, partitioning and sharding strategy, clock/time-window semantics, consistency trade-offs, and handling of hot keys. Describe scaling strategies, capacity planning math (including how many machines are needed to handle a 2× traffic increase), failure modes, backpressure, observability, and APIs for configuration and metrics.

Sep 6, 2025, 12:00 AM
System Design: Global, High-Throughput Rate Limiter

Context

You are designing a global, multi-region rate-limiting service that enforces quotas:

  • Per-user (across all APIs)
  • Per-API (across all users)
  • Per-user-per-API (composite)

The system must be low-latency, tolerate bursts, and operate across multiple regions.
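
The three quota scopes above mean that a single request may be checked against up to three counters. A minimal sketch of one possible key scheme follows; the `quota_keys` helper and the `rl:` key prefix are hypothetical names chosen for illustration, not part of the question.

```python
def quota_keys(user_id: str, api: str) -> list[str]:
    """Return the counter keys a single request is checked against.

    Hypothetical naming scheme: one key per quota scope from the Context list.
    """
    return [
        f"rl:user:{user_id}",            # per-user, across all APIs
        f"rl:api:{api}",                 # per-API, across all users
        f"rl:user_api:{user_id}:{api}",  # per-user-per-API composite
    ]
```

Under this scheme the per-API key is shared by every user of that API, so it is the most likely candidate for a hot key.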

Requirements

Design and justify:

  1. Algorithm choice (e.g., token bucket vs. leaky bucket) with burst tolerance and latency considerations.
  2. Data model for quotas, runtime state, and metrics.
  3. Partitioning and sharding strategy for a global deployment.
  4. Clock/time-window semantics (fixed vs. sliding windows, monotonic time handling).
  5. Consistency trade-offs (global correctness vs. latency) and how you bound overages.
  6. Hot-key handling strategies.
  7. Scaling strategies and capacity planning math, including how many machines are required if traffic doubles (2×), with clear assumptions and formulas.
  8. Failure modes and mitigations.
  9. Backpressure behavior and client guidance.
  10. Observability: metrics, tracing, dashboards, and alerts.
  11. APIs for configuration and metrics.
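
For requirements 1 and 4, a token bucket is one common starting point because its capacity bounds the burst size while its refill rate bounds the sustained average. The sketch below is a single-process illustration only, not a distributed design; the class and parameter names are assumptions, and it reads a monotonic clock so that wall-clock adjustments cannot mint or destroy tokens.

```python
import time
from typing import Optional


class TokenBucket:
    """Single-process token bucket: capacity bounds bursts, rate bounds the average."""

    def __init__(self, rate_per_sec: float, capacity: float, now: Optional[float] = None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        """Refill lazily from elapsed time, then try to spend `cost` tokens."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)  # ignore timestamps older than the last one seen
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would create one bucket per counter key, for example `TokenBucket(rate_per_sec=10, capacity=5)`, and reject or delay the request whenever `allow()` returns `False`. The optional `now` argument exists only to make the behavior testable with a fixed clock.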

Assume a mixed workload with both regional and global traffic patterns, and that some users may send traffic from multiple regions concurrently.
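
For the capacity-planning part of requirement 7, one way to make the assumptions explicit is to size the fleet so that peak load stays below a target utilization, then add spare nodes for failures. The sketch below uses that formula; every number in it (peak QPS, per-node throughput, 60% target utilization, one spare) is a hypothetical figure chosen only to show the arithmetic.

```python
import math


def machines_needed(peak_qps: float, per_node_qps: float,
                    target_utilization: float = 0.6, spare_nodes: int = 1) -> int:
    """Nodes needed to keep peak load at or below target utilization, plus spares."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_per_node = per_node_qps * target_utilization
    return math.ceil(peak_qps / usable_per_node) + spare_nodes


# Hypothetical figures: 1M peak QPS today, 50k QPS per node at saturation.
baseline = machines_needed(peak_qps=1_000_000, per_node_qps=50_000)
doubled = machines_needed(peak_qps=2_000_000, per_node_qps=50_000)
```

With these assumed inputs the fleet grows from 35 to 68 nodes. That is slightly less than double, because the spare count is fixed and the rounding up applies once per fleet rather than once per node.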
