How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a easy difficulty System Design question, commonly asked during Technical Screen rounds at Roblox.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Roblox during technical interviews.

Design Multi-Dimensional Request Rate Limiting

Q: Design Multi-Dimensional Request Rate Limiting

This question evaluates proficiency in designing distributed rate limiting systems, including API and data model design, consistency and scalability trade-offs, and algorithms for enforcing per-key and multi-dimensional limits.

Design a rate limiter for a backend service that runs across multiple application servers. The system must decide, for every incoming request, whether to allow or reject it based on configurable request-rate limits.

This is a two-part problem: first build a standard single-key rate limiter, then extend it to enforce limits across several request fields at once.

Constraints & Assumptions

The service is distributed: many stateless application servers handle requests behind a load balancer, so limit state cannot live only in one process's memory.
Limits are configurable (for example, "100 requests per minute per user", "10 requests per second per IP") and should be changeable without redeploying application code.
Every request must receive a fast allow/deny decision; rate-limit checks sit on the hot path of every API call.
When a request is rejected, the system should be able to report why (which limit was hit) and when the caller can retry.
You may assume access to a fast shared store (such as an in-memory key-value store) for counters; you should justify whatever you choose.

Part 1 — Standard distributed rate limiter

Build a rate limiter that can limit incoming requests by a single key such as user ID, IP address, API key, or endpoint. It must work correctly when multiple application servers handle requests concurrently for the same key.

Address at minimum:

The external/internal API the limiter exposes to the application code.
The algorithm used to track and enforce the limit, and why you chose it.
The data model for limit state (what is stored per key) and where it lives.
How updates stay correct under concurrency across distributed servers.
What the limiter returns to the caller on allow vs. reject .

Part 2 — Multi-dimensional (multi-field) rate limiting

Extend the design so that a single request may carry several fields, and different rate-limit rules may apply to different fields or combinations of fields. For example, one request might include user_id, ip, endpoint, region, and operation, and the system may need to enforce separate limits such as: per user, per IP, per (user, endpoint) pair, and per (region, operation).

Address at minimum:

A data model / rule schema that lets an operator declare which fields a rule keys on, its limit, and its action.
How the limiter matches which rules apply to a given request without scanning every rule on every call.
Key generation when a rule keys on a combination of fields.
The decision logic when multiple limits apply at once — how you combine the per-rule outcomes into one accept/reject, and what you return when several are violated.
The consistency problem created by touching several counters per request, and how you handle it.

Clarifying Questions to Ask

What request volume and per-key cardinality should this handle (requests/sec, number of distinct users/IPs/keys)? This drives the store choice and memory model.
What latency budget does the rate-limit check have on the request path (e.g. single-digit milliseconds)?
How strict must accounting be — is exact enforcement required, or is slight over/under-counting acceptable for abuse prevention?
How many distinct rules and field combinations are expected, and how often do they change?
When the limit store is unavailable, should the system fail open (allow) or fail closed (reject)? Does it differ by endpoint risk?
Are limits global, or per-region/per-tenant? Does a single request need limits enforced across regions?

What a Strong Answer Covers

A clear API contract between application code and the limiter (input context, output decision with retry-after and the violated rule).
A justified algorithm choice with explicit trade-offs (accuracy vs. memory vs. burst handling), not just one named algorithm.
A concrete data model for both single-key state and the multi-field rule schema .
Atomicity / concurrency correctness for the read-refill-decrement update across distributed servers.
A coherent multi-rule decision policy and an honest treatment of the cross-counter consistency problem (atomic multi-key, co-location, or documented approximation).
Scalability mechanics: distributed store, local rule caching, TTLs for key cleanup, hot-key and high-cardinality handling.
Failure handling (fail-open vs. fail-closed, timeouts, fallbacks) and observability (allowed/blocked counts per rule, hot keys, latency) for post-deploy tuning.

Follow-up Questions

How does the design change at 100x request volume, and what breaks first — the counter store, the hot keys, or the rule-matching path?
A single very popular endpoint or a shared NAT IP becomes a hot key . How do you keep that one key from becoming a bottleneck?
How would you support a rule that costs more than one unit per request (weighted/cost-based limiting)?
If you must enforce a global limit (e.g. across regions) with very low latency, how do you reconcile that with the need to avoid a single centralized chokepoint?

This is a two-part problem: first build a standard single-key rate limiter, then extend it to enforce limits across several request fields at once.

Constraints & Assumptions

The service is distributed: many stateless application servers handle requests behind a load balancer, so limit state cannot live only in one process's memory.
Limits are configurable (for example, "100 requests per minute per user", "10 requests per second per IP") and should be changeable without redeploying application code.
Every request must receive a fast allow/deny decision; rate-limit checks sit on the hot path of every API call.
When a request is rejected, the system should be able to report why (which limit was hit) and when the caller can retry.
You may assume access to a fast shared store (such as an in-memory key-value store) for counters; you should justify whatever you choose.

Part 1 — Standard distributed rate limiter

Address at minimum:

The external/internal API the limiter exposes to the application code.
The algorithm used to track and enforce the limit, and why you chose it.
The data model for limit state (what is stored per key) and where it lives.
How updates stay correct under concurrency across distributed servers.
What the limiter returns to the caller on allow vs. reject .

Part 2 — Multi-dimensional (multi-field) rate limiting

Address at minimum:

A data model / rule schema that lets an operator declare which fields a rule keys on, its limit, and its action.
How the limiter matches which rules apply to a given request without scanning every rule on every call.
Key generation when a rule keys on a combination of fields.
The decision logic when multiple limits apply at once — how you combine the per-rule outcomes into one accept/reject, and what you return when several are violated.
The consistency problem created by touching several counters per request, and how you handle it.

Clarifying Questions to Ask

What request volume and per-key cardinality should this handle (requests/sec, number of distinct users/IPs/keys)? This drives the store choice and memory model.
What latency budget does the rate-limit check have on the request path (e.g. single-digit milliseconds)?
How strict must accounting be — is exact enforcement required, or is slight over/under-counting acceptable for abuse prevention?
How many distinct rules and field combinations are expected, and how often do they change?
When the limit store is unavailable, should the system fail open (allow) or fail closed (reject)? Does it differ by endpoint risk?
Are limits global, or per-region/per-tenant? Does a single request need limits enforced across regions?

What a Strong Answer Covers

A clear API contract between application code and the limiter (input context, output decision with retry-after and the violated rule).
A justified algorithm choice with explicit trade-offs (accuracy vs. memory vs. burst handling), not just one named algorithm.
A concrete data model for both single-key state and the multi-field rule schema .
Atomicity / concurrency correctness for the read-refill-decrement update across distributed servers.
A coherent multi-rule decision policy and an honest treatment of the cross-counter consistency problem (atomic multi-key, co-location, or documented approximation).
Scalability mechanics: distributed store, local rule caching, TTLs for key cleanup, hot-key and high-cardinality handling.
Failure handling (fail-open vs. fail-closed, timeouts, fallbacks) and observability (allowed/blocked counts per rule, hot keys, latency) for post-deploy tuning.

Follow-up Questions

How does the design change at 100x request volume, and what breaks first — the counter store, the hot keys, or the rule-matching path?
A single very popular endpoint or a shared NAT IP becomes a hot key . How do you keep that one key from becoming a bottleneck?
How would you support a rule that costs more than one unit per request (weighted/cost-based limiting)?
If you must enforce a global limit (e.g. across regions) with very low latency, how do you reconcile that with the need to avoid a single centralized chokepoint?

Design Multi-Dimensional Request Rate Limiting

Quick Overview

Constraints & Assumptions

Part 1 — Standard distributed rate limiter

Part 2 — Multi-dimensional (multi-field) rate limiting

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP

Design Multi-Dimensional Request Rate Limiting

Quick Overview

Constraints & Assumptions

Part 1 — Standard distributed rate limiter

Part 2 — Multi-dimensional (multi-field) rate limiting

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP