Design a simple API rate limiter
Company: Microsoft
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Onsite
Design a **rate limiter** for an API.
### Scenario
You operate an HTTP API (e.g., `/v1/*`). You need to prevent abuse by limiting request rates.
### Requirements (clarify/assume if not provided)
- Enforce limits such as: **N requests per minute** per client (e.g., per userId or per IP).
- Return an appropriate response when a request is rejected (e.g., HTTP `429 Too Many Requests`).
- Low latency (rate check should be fast).
- Should work across **multiple API servers** (distributed setting).
### What to cover
- API/interface and what key you rate-limit on (IP, userId, API key, route, etc.).
- Algorithm choice (fixed window, sliding window, token bucket, leaky bucket) and trade-offs.
- Data storage (in-memory vs Redis/memcache) and how to ensure atomicity.
- Handling bursts, clock skew, failures, and observability (metrics/logging).
Quick Answer: This question evaluates competency in designing scalable, low-latency API rate limiting, covering distributed-systems reasoning, storage and caching trade-offs, algorithmic trade-offs, consistency, and atomicity concerns.