Design a scalable and highly reliable system to solve an open-ended, complex problem domain (e.g., content feed, ride matching, or file storage). Specify: (a) functional and non-functional requirements (availability, latency SLOs, throughput, consistency, durability); (b) high-level architecture (clients, API gateway, services, data stores, messaging/streaming); (c) core APIs and data models; (d) partitioning/sharding, replication, and consistency strategy (including transactions, idempotency, and schema evolution); (e) caching strategy across client, edge/CDN, and server tiers; (f) load balancing, request routing, and autoscaling; (g) failure handling (timeouts, retries with backoff, circuit breakers), backpressure, and disaster recovery (RPO/RTO, multi-region); (h) observability (metrics, logs, traces), rate limiting, and security (authn/authz, encryption); (i) capacity planning, cost trade-offs, and a phased scaling roadmap; (j) key bottlenecks, risks, and mitigations. Justify design choices and trade-offs throughout.

The prompt evaluates competence in large-scale distributed system design, covering storage architecture, consistency and replication strategies, caching, APIs and data models, operational reliability, security, and cost/capacity planning.

What difficulty level is this interview question?

This is a hard System Design question, commonly asked during Technical Screen rounds at Anthropic.

What role is this question designed for?

This question is commonly asked of Software Engineer candidates at Anthropic during technical interviews.

Design a scalable, reliable system | Anthropic Interview Question

Context

Design a scalable, highly reliable consumer service where users upload, store, view, and share photos/videos from mobile and web clients. The product supports private storage, shared folders/links, previews (thumbnails, transcodes), search, and versioning. The user base is global with diurnal traffic peaks. Assume a freemium model with both private and publicly shared links.

Task

Specify and justify a production-ready design across the following:

(a) Functional and Non-Functional Requirements

Functional: user onboarding, upload (single and multipart), download/stream, list/browse, share (link- and user-based), versioning, delete/restore, search, thumbnails/transcoding, quotas/billing, audit logs.
Non-Functional: availability, latency SLOs, throughput targets, consistency model (e.g., read-after-write for own uploads), durability, privacy/compliance.

(b) High-Level Architecture

Clients, API gateway, services, data stores, async processing, messaging/streaming, CDN/edge. Include where stateless/stateful boundaries lie.

(c) Core APIs and Data Models

Define key REST/gRPC APIs (upload init, part upload, complete, get/download, list, share, delete, restore, search).
Sketch essential data models (User, Object, ObjectVersion, Folder, ACL/Share, UploadSession, AuditEvent).
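
As a starting point for the data-model sketch, the following Python dataclasses show one possible shape for Object, ObjectVersion, and UploadSession. Every field name here is an assumption made for illustration (the question does not prescribe a schema), and User, Folder, ACL/Share, and AuditEvent are left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class UploadState(Enum):
    INITIATED = "initiated"
    COMPLETED = "completed"
    ABORTED = "aborted"


@dataclass(frozen=True)
class ObjectVersion:
    """Immutable record pointing at one stored blob (hypothetical fields)."""
    object_id: str
    version: int            # increases by 1 for each new version of the object
    blob_key: str           # where the immutable bytes live in blob storage
    size_bytes: int
    checksum_sha256: str


@dataclass
class Object:
    """Mutable metadata row; the blobs themselves are never rewritten."""
    object_id: str
    owner_id: str
    folder_id: str
    name: str
    current_version: int = 0
    deleted: bool = False   # soft delete, so restore stays possible
    versions: List[ObjectVersion] = field(default_factory=list)

    def add_version(self, blob_key: str, size_bytes: int,
                    checksum_sha256: str) -> ObjectVersion:
        self.current_version += 1
        version = ObjectVersion(self.object_id, self.current_version,
                                blob_key, size_bytes, checksum_sha256)
        self.versions.append(version)
        return version


@dataclass
class UploadSession:
    """Tracks a multipart upload between 'init' and 'complete'."""
    upload_id: str
    object_id: str
    state: UploadState = UploadState.INITIATED
    parts: Dict[int, str] = field(default_factory=dict)  # part number -> checksum

    def put_part(self, part_number: int, checksum: str) -> None:
        # Keyed by part number, so a retried part upload overwrites itself.
        self.parts[part_number] = checksum
```

Keeping ObjectVersion immutable and Object mutable is one way to separate the strongly consistent metadata path from the write-once blob path. Other splits are equally defensible.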

(d) Partitioning, Replication, Consistency

How to shard metadata and objects; replication across AZs/regions; consistency choices for metadata vs object blobs.
Transactions across services, idempotency for retries, schema/version evolution.
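
To illustrate idempotency for retries, here is a minimal in-memory sketch in Python. It assumes the client sends a unique idempotency key with each logical request. The class name `IdempotentHandler` and its methods are hypothetical, and a production version would persist keys in a durable store with a TTL and guard against concurrent duplicates.

```python
from typing import Any, Callable, Dict


class IdempotentHandler:
    """Wraps a handler so a retried request with the same key runs only once."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        self._results: Dict[str, Any] = {}  # idempotency key -> stored response

    def handle(self, idempotency_key: str, request: Any) -> Any:
        # A retry with a key we have already seen returns the stored response
        # instead of re-executing the side effect.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._handler(request)
        self._results[idempotency_key] = result
        return result
```

With this wrapper, a client that times out on "complete upload" and retries with the same key receives the original response, and no second version is created.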

(e) Caching Strategy

Client caches (ETag/If-None-Match), edge/CDN (signed URLs, TTLs, invalidation), server-side caches (Redis) for hot metadata.
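
The client-cache item above relies on conditional requests. The Python sketch below shows the server-side check under simplifying assumptions: the ETag is a SHA-256 hash of the body, and If-None-Match carries a single tag rather than a list or `*`. The function names are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Strong ETag derived from content; the quotes are part of the header value.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_get(body: bytes,
                    if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        # Client copy is current: send 304 with no body to save bandwidth.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

In a real service the ETag would more likely come from the stored version checksum than from rehashing the body on every request.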

(f) Load Balancing, Routing, Autoscaling

Global traffic routing, L7 load balancing, service discovery, scaling policies for stateless APIs and workers.

(g) Failure Handling, Backpressure, Disaster Recovery

Timeouts, retries with exponential backoff and jitter, circuit breakers, queue-based backpressure.
DR plan with RPO/RTO and multi-region strategy (active-active or active-passive).
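
Retries with exponential backoff and jitter can be sketched as follows. This Python example uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function name, default values, and the choice of retryable exceptions are assumptions for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 5,
    base: float = 0.1,          # first backoff ceiling, in seconds
    cap: float = 5.0,           # upper bound on any single delay
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call fn, retrying transient failures with capped exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # retry budget exhausted: surface the last error
            ceiling = min(cap, base * (2 ** attempt))
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise ValueError("attempts must be at least 1")
```

Retries like this are only safe for idempotent operations, and they are usually paired with a per-request timeout and a circuit breaker so that a failing dependency is not hammered.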

(h) Observability, Rate Limiting, Security

Metrics/logs/traces, SLO monitoring, alerts.
Rate limiting (per-user/IP), abuse detection.
Security: authn/authz, encryption in transit/at rest, KMS, key rotation, secure sharing.
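
For per-user or per-IP rate limiting, a token bucket is one common choice. The sketch below is a single-process Python version with an injectable clock. The class name and parameters are hypothetical, and a distributed deployment would typically keep the bucket state in a shared store instead.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)   # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

One bucket per user ID or client IP gives the per-user/IP limits named above. A rejected request would normally map to an HTTP 429 response.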

(i) Capacity Planning, Cost, Scaling Roadmap

Estimate storage, QPS, bandwidth. Provide formulas, back-of-envelope numbers, and cost trade-offs (hot vs cold tiers, replication vs erasure coding).
Phased roadmap from MVP to multi-region scale.
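
The capacity formulas can be written down as a small calculator. In the Python sketch below, all inputs in the usage note (daily active users, uploads per user, average object size, storage overhead factor, peak-to-average ratio) are made-up placeholders rather than figures from the question. Only the formulas are the point.

```python
from typing import Dict

SECONDS_PER_DAY = 86_400


def estimate_upload_capacity(
    daily_active_users: int,
    uploads_per_user_per_day: float,
    avg_object_bytes: int,
    storage_overhead: float,   # e.g. 3.0 for 3x replication, ~1.5 for erasure coding
    peak_to_avg: float,        # diurnal peak multiplier
) -> Dict[str, float]:
    """Back-of-envelope upload-side estimates; every input is an assumption."""
    uploads_per_day = daily_active_users * uploads_per_user_per_day
    raw_bytes_per_day = uploads_per_day * avg_object_bytes
    avg_qps = uploads_per_day / SECONDS_PER_DAY
    return {
        "uploads_per_day": uploads_per_day,
        "raw_bytes_per_day": raw_bytes_per_day,
        "stored_bytes_per_day": raw_bytes_per_day * storage_overhead,
        "avg_upload_qps": avg_qps,
        "peak_upload_qps": avg_qps * peak_to_avg,
        "avg_ingress_bits_per_sec": raw_bytes_per_day * 8 / SECONDS_PER_DAY,
    }
```

For example, with hypothetical inputs of 10 million daily users, 2 uploads per user per day, and 3 MB per object, the raw ingest is 60 TB per day. At an overhead factor of 3.0 that becomes 180 TB per day stored, versus 90 TB per day at 1.5, which is the replication versus erasure coding trade-off in numbers.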

(j) Risks and Mitigations

Identify key bottlenecks and failure modes; propose mitigations.
