Design a scalable service and model performance

This is a System Design interview question from Anthropic for Machine Learning Engineer roles.

Question

System Design: Multi-Region, 50k QPS, p95 < 100 ms

Context

Design an online, read-heavy key-value service (for example, a user profile or feature lookup) used by latency-sensitive applications worldwide. Clients connect from multiple continents. The service must be highly available across multiple regions and maintain low tail latency.

Assume small payloads (1–5 KB), id-based access patterns, and that strong read-after-write consistency is required within a region for a session, but cross-region consistency can be eventual.
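One way to picture the session-level read-after-write requirement is a version check on the read path. The sketch below is illustrative only and is not taken from the question: `Store`, `Session`, and the per-key integer version are hypothetical names, and it assumes a single regional leader that assigns versions plus one possibly stale replica or cache.

```python
from typing import Dict, Optional, Tuple


class Store:
    """Hypothetical in-memory store holding (version, value) per key."""

    def __init__(self) -> None:
        self.data: Dict[str, Tuple[int, str]] = {}

    def put(self, key: str, value: str, version: int) -> None:
        current = self.data.get(key)
        # Keep only the newest version so out-of-order replication is harmless.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key: str) -> Optional[Tuple[int, str]]:
        return self.data.get(key)


class Session:
    """Tracks the highest version this session wrote for each key."""

    def __init__(self, leader: Store, replica: Store) -> None:
        self.leader = leader
        self.replica = replica
        self.last_written: Dict[str, int] = {}

    def write(self, key: str, value: str) -> None:
        current = self.leader.get(key)
        version = (current[0] if current else 0) + 1
        self.leader.put(key, value, version)
        self.last_written[key] = version

    def read(self, key: str) -> Optional[str]:
        needed = self.last_written.get(key, 0)
        hit = self.replica.get(key)
        # Serve from the replica only if it has caught up with our own writes.
        if hit is not None and hit[0] >= needed:
            return hit[1]
        # Otherwise fall back to the regional leader.
        hit = self.leader.get(key)
        return hit[1] if hit else None
```

Reads from other sessions, or from other regions, carry no version floor in this sketch, which matches the assumption that cross-region consistency can be eventual.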

Requirements

Traffic target: 50k peak QPS (global), p95 latency under 100 ms.
Multi-region, active-active, highly available design with no dependency on any single region.
Include APIs, data model, caching, consistency, partitioning, failure handling, rollout/canary.
Do back-of-the-envelope capacity planning (reads vs writes, growth, peak vs average, instance sizing, egress).
Build a performance model to predict end-to-end latency under load (service time breakdown, queueing approximations such as Little’s Law), and identify bottlenecks.
Propose concrete mitigations and define SLOs, monitoring, and load-testing to validate the model.
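The queueing approximations mentioned in the requirements can be sketched as follows. This is a rough model, not a reference solution: it treats one service tier as an M/M/c queue (Poisson arrivals, exponential service times, `servers` identical workers), which real traffic only approximates. The function names and the example numbers are assumptions made for illustration.

```python
import math
from typing import Tuple


def in_flight(arrival_rate: float, mean_latency_s: float) -> float:
    """Little's Law: mean requests in the system L = lambda * W."""
    return arrival_rate * mean_latency_s


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arrival has to queue in an M/M/c system.

    offered_load = arrival_rate * mean_service_time (in Erlangs).
    """
    if offered_load >= servers:
        return 1.0  # overloaded: every request queues
    term = 1.0   # a^k / k! for k = 0
    total = 1.0  # running sum of a^k / k! for k < servers
    for k in range(1, servers):
        term *= offered_load / k
        total += term
    last = term * offered_load / servers  # a^c / c!
    tail = last / (1.0 - offered_load / servers)
    return tail / (total + tail)


def queue_wait(arrival_rate: float, service_time_s: float,
               servers: int) -> Tuple[float, float]:
    """Return (mean wait, p95 wait) in seconds, excluding service time."""
    load = arrival_rate * service_time_s
    if load >= servers:
        return math.inf, math.inf
    p_wait = erlang_c(servers, load)
    drain_rate = (servers - load) / service_time_s  # c*mu - lambda
    mean_wait = p_wait / drain_rate
    # P(wait > t) = p_wait * exp(-drain_rate * t); solve for the 0.05 tail.
    p95_wait = math.log(p_wait / 0.05) / drain_rate if p_wait > 0.05 else 0.0
    return mean_wait, p95_wait


if __name__ == "__main__":
    # ASSUMED example: one region takes 20k req/s, 2 ms mean service time,
    # and 50 worker slots in the tier being modelled.
    mean_w, p95_w = queue_wait(20_000, 0.002, 50)
    print(f"mean wait {mean_w * 1e3:.3f} ms, p95 wait {p95_w * 1e3:.3f} ms")
    print(f"in flight at 20 ms mean latency: {in_flight(20_000, 0.020):.0f}")
```

Summing a per-tier estimate like this with network round-trip and storage time gives a first end-to-end prediction; the tier whose utilization approaches 1 first is the likely bottleneck, and load tests are what confirm or refute the model.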

Deliverables

API design (CRUD, batch, idempotency, versioning, errors).
Storage schema and indexing; partitioning strategy.
Caching layers and invalidation strategy.
Consistency model (regional and cross-region) and conflict resolution.
Failure handling (zone, region, network partitions, thundering herd) and client resiliency.
Rollout and canary strategies (schema and code).
Capacity planning with numerical estimates: read/write ratio, data growth over 12 months, peak vs average, instance count and size, and network egress.
Performance model with queueing approximations; identify bottlenecks.
Mitigations (e.g., batching, async, indexes, autoscaling, circuit breaking).
SLOs, monitoring, and load-testing plans to validate performance and availability.
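As a starting point for the capacity-planning deliverable, the arithmetic might be laid out like this. Only the 50k peak QPS and the 1–5 KB payload range come from the question; every other number (read fraction, peak-to-average ratio, per-instance throughput, headroom, region count, key count, growth rate, replication factor) is an assumption chosen for illustration and should be replaced with measured values.

```python
import math
from dataclasses import dataclass


@dataclass
class Assumptions:
    peak_qps: float = 50_000            # from the requirements
    payload_bytes: int = 3 * 1024       # ASSUMED midpoint of the 1-5 KB range
    read_fraction: float = 0.95         # ASSUMED read-heavy mix
    peak_to_avg: float = 2.5            # ASSUMED peak vs average ratio
    per_instance_qps: float = 2_500     # ASSUMED from a hypothetical benchmark
    target_utilization: float = 0.5     # ASSUMED headroom for region failover
    regions: int = 3                    # ASSUMED
    initial_keys: float = 100e6         # ASSUMED
    monthly_growth: float = 0.05        # ASSUMED compound growth per month
    replication_factor: int = 3         # ASSUMED copies per region


def plan(a: Assumptions) -> dict:
    read_qps = a.peak_qps * a.read_fraction
    write_qps = a.peak_qps - read_qps
    # Size for peak, running each instance at the target utilization.
    instances = math.ceil(a.peak_qps / (a.per_instance_qps * a.target_utilization))
    # Response bytes leaving the service toward clients at peak.
    read_egress_mbps = read_qps * a.payload_bytes * 8 / 1e6
    # Each write is shipped to every other region.
    replication_mbps = write_qps * a.payload_bytes * 8 * (a.regions - 1) / 1e6
    keys_12m = a.initial_keys * (1 + a.monthly_growth) ** 12
    storage_gb_per_region = keys_12m * a.payload_bytes * a.replication_factor / 1e9
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "avg_qps": a.peak_qps / a.peak_to_avg,
        "instances_global": instances,
        "read_egress_mbps": read_egress_mbps,
        "cross_region_replication_mbps": replication_mbps,
        "storage_gb_per_region_12m": storage_gb_per_region,
    }


if __name__ == "__main__":
    for name, value in plan(Assumptions()).items():
        print(f"{name}: {value:,.1f}")
```

Keeping the assumptions in one dataclass makes it easy to rerun the estimate with a different read/write ratio or headroom target and see which input dominates instance count and egress.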
