PracHub

Design a scalable, reliable system

Last updated: Mar 29, 2026

Quick Overview

The prompt evaluates competence in large-scale distributed system design, covering storage architecture, consistency and replication strategies, caching, APIs and data models, operational reliability, security, and cost/capacity planning.

Company: Anthropic

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Design a scalable and highly reliable system to solve an open-ended, complex problem domain (e.g., content feed, ride matching, or file storage). Specify: (a) functional and non-functional requirements (availability, latency SLOs, throughput, consistency, durability); (b) high-level architecture (clients, API gateway, services, data stores, messaging/streaming); (c) core APIs and data models; (d) partitioning/sharding, replication, and consistency strategy (including transactions, idempotency, and schema evolution); (e) caching strategy across client, edge/CDN, and server tiers; (f) load balancing, request routing, and autoscaling; (g) failure handling (timeouts, retries with backoff, circuit breakers), backpressure, and disaster recovery (RPO/RTO, multi-region); (h) observability (metrics, logs, traces), rate limiting, and security (authn/authz, encryption); (i) capacity planning, cost trade-offs, and a phased scaling roadmap; (j) key bottlenecks, risks, and mitigations. Justify design choices and trade-offs throughout.

Sep 6, 2025, 12:00 AM

System Design: Global Photo/Video File Storage and Sharing ("CloudDrive")

Context

Design a scalable, highly reliable consumer service where users upload, store, view, and share photos/videos from mobile and web clients. The product supports private storage, shared folders/links, previews (thumbnails, transcodes), search, and versioning. The user base is global with diurnal traffic peaks. Assume a freemium model with both private and publicly shared links.

Task

Specify and justify a production-ready design across the following:

(a) Functional and Non-Functional Requirements

  • Functional: user onboarding, upload (single and multipart), download/stream, list/browse, share (link- and user-based), versioning, delete/restore, search, thumbnails/transcoding, quotas/billing, audit logs.
  • Non-Functional: availability, latency SLOs, throughput targets, consistency model (e.g., read-after-write for own uploads), durability, privacy/compliance.

(b) High-Level Architecture

  • Clients, API gateway, services, data stores, async processing, messaging/streaming, CDN/edge. Indicate where the stateless/stateful boundaries lie.

(c) Core APIs and Data Models

  • Define key REST/gRPC APIs (upload init, part upload, complete, get/download, list, share, delete, restore, search).
  • Sketch essential data models (User, Object, ObjectVersion, Folder, ACL/Share, UploadSession, AuditEvent).
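
As a starting point for the data-model sketch, the following is one possible shape for three of the listed entities (Object, ObjectVersion, UploadSession). Every field name here is an illustrative assumption, not part of the prompt; a real answer would also cover User, Folder, ACL/Share, and AuditEvent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ObjectVersion:
    version_id: str
    size_bytes: int
    checksum_sha256: str
    blob_key: str  # hypothetical pointer into the blob store


@dataclass
class StoredObject:
    object_id: str
    owner_id: str
    folder_id: Optional[str]
    name: str
    versions: List[ObjectVersion] = field(default_factory=list)
    deleted: bool = False  # soft-delete flag so restore stays possible

    def current_version(self) -> Optional[ObjectVersion]:
        # The newest version is assumed to be appended last.
        return self.versions[-1] if self.versions else None


@dataclass
class UploadSession:
    session_id: str
    object_id: str
    total_parts: int
    received_parts: Set[int] = field(default_factory=set)

    def is_complete(self) -> bool:
        # Parts are assumed to be numbered 1..total_parts.
        return self.received_parts == set(range(1, self.total_parts + 1))
```

Keeping versions as separate immutable records, with the object row pointing at them, is one way to make versioning and restore cheap; whether that fits depends on the chosen metadata store.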

(d) Partitioning, Replication, Consistency

  • How to shard metadata and objects; replication across AZs/regions; consistency choices for metadata vs object blobs.
  • Transactions across services, idempotency for retries, schema/version evolution.
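
To make the idempotency requirement concrete, here is a minimal in-memory sketch of idempotency keys: a retried request carrying the same key gets the stored result back instead of re-running the operation. The class and method names are hypothetical, and the sketch is not concurrency-safe; a production design would more likely rely on a conditional write in a durable store.

```python
from typing import Callable, Dict


class IdempotentExecutor:
    """Remembers the result of each operation by its idempotency key."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}

    def run(self, idempotency_key: str, operation: Callable[[], str]) -> str:
        # A retry with a key we have already seen returns the stored result
        # without executing the operation a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = operation()
        self._results[idempotency_key] = result
        return result
```

A client would generate the key once per logical request (for example, once per "complete upload" call) and resend it unchanged on every retry.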

(e) Caching Strategy

  • Client caches (ETag/If-None-Match), edge/CDN (signed URLs, TTLs, invalidation), server-side caches (Redis) for hot metadata.
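
The client-cache bullet relies on conditional requests. The sketch below shows the basic server-side check under simplifying assumptions: the ETag is a SHA-256 of the content (real systems often use a version ID instead), and If-None-Match carries a single tag rather than a list or `*`. Function names are illustrative.

```python
import hashlib
from typing import Optional, Tuple


def compute_etag(content: bytes) -> str:
    # Strong ETags are quoted strings; the hash choice is an assumption.
    return '"' + hashlib.sha256(content).hexdigest() + '"'


def conditional_get(content: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body): 304 with no body when the client copy is current."""
    etag = compute_etag(content)
    if if_none_match is not None and if_none_match.strip() == etag:
        return 304, b""
    return 200, content
```

A 304 response saves the body transfer, which matters most for large thumbnails and previews fetched repeatedly by the same client.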

(f) Load Balancing, Routing, Autoscaling

  • Global traffic routing, L7 load balancing, service discovery, scaling policies for stateless APIs and workers.

(g) Failure Handling, Backpressure, Disaster Recovery

  • Timeouts, retries with exponential backoff and jitter, circuit breakers, queue-based backpressure.
  • DR plan with RPO/RTO and multi-region strategy (active-active or active-passive).
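
Retries with exponential backoff and jitter can be sketched as follows. This uses the "full jitter" variant, where the delay is drawn uniformly between zero and a capped exponential bound. The base delay, cap, attempt count, and the choice to retry only on `ConnectionError` are assumptions for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0,
                  rng: Callable[[], float] = random.random) -> float:
    # Full jitter: uniform in [0, min(cap, base * 2^attempt)).
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(operation: Callable[[], T], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```

Randomizing the delay spreads out retries from many clients that failed at the same moment, which helps avoid synchronized retry storms against a recovering service. Retries are only safe for operations that are idempotent.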

(h) Observability, Rate Limiting, Security

  • Metrics/logs/traces, SLO monitoring, alerts.
  • Rate limiting (per-user/IP), abuse detection.
  • Security: authn/authz, encryption in transit/at rest, KMS, key rotation, secure sharing.
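
For the per-user/IP rate-limiting bullet, a token bucket is one common choice (the prompt does not mandate a specific algorithm). The sketch below is a single-process version with an injectable clock; a distributed deployment would need shared state, for example in a cache tier. Names and parameters are illustrative.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_sec` tokens/second."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One bucket per user ID or client IP gives the per-principal limits the bullet describes; the `cost` argument lets expensive calls such as upload init consume more than one token.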

(i) Capacity Planning, Cost, Scaling Roadmap

  • Estimate storage, QPS, bandwidth. Provide formulas, back-of-envelope numbers, and cost trade-offs (hot vs cold tiers, replication vs erasure coding).
  • Phased roadmap from MVP to multi-region scale.
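
The back-of-envelope formulas can be captured in a small helper. All parameter names are assumptions, and the inputs are placeholders for the candidate to fill in; none of them come from the prompt. The storage overhead factor is 3.0 for three-way replication, or (k + m) / k for erasure coding with k data and m parity fragments (for example, 1.5 for a 6+3 layout).

```python
from typing import Dict

SECONDS_PER_DAY = 86_400


def estimate_capacity(daily_active_users: int, uploads_per_user_per_day: float,
                      avg_object_bytes: float, reads_per_upload: float,
                      peak_factor: float, storage_overhead: float) -> Dict[str, float]:
    uploads_per_day = daily_active_users * uploads_per_user_per_day
    avg_write_qps = uploads_per_day / SECONDS_PER_DAY
    peak_write_qps = avg_write_qps * peak_factor
    peak_read_qps = peak_write_qps * reads_per_upload
    raw_bytes_per_day = uploads_per_day * avg_object_bytes
    return {
        "avg_write_qps": avg_write_qps,
        "peak_write_qps": peak_write_qps,
        "peak_read_qps": peak_read_qps,
        "raw_bytes_per_day": raw_bytes_per_day,
        # Physical bytes after replication or erasure-coding overhead.
        "stored_bytes_per_day": raw_bytes_per_day * storage_overhead,
        # Upper bound on egress: assumes every read transfers a full object.
        "peak_egress_bytes_per_sec": peak_read_qps * avg_object_bytes,
    }
```

Running the helper twice with different `storage_overhead` values is a quick way to put numbers on the replication versus erasure coding trade-off; CDN hit rates would reduce the origin egress figure further.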

(j) Risks and Mitigations

  • Identify key bottlenecks and failure modes; propose mitigations.
