Scalable Distributed System Architecture
Asked of: Software Engineer

What's being tested
Interviewers are probing whether you can turn an ambiguous, large-scale product requirement into a scalable distributed architecture with clear APIs, data models, traffic estimates, failure handling, and operational tradeoffs. For TikTok-scale systems, a Software Engineer must reason about latency, availability, consistency, capacity, and cost under global traffic, bursty fanout, and partial failures. Strong answers show you can decompose a system into services, choose storage and messaging primitives intentionally, and explain how the design behaves at p99, not just in the happy path. The interviewer is also checking whether you can communicate architecture clearly: requirements first, then data flow, then bottlenecks, then mitigations.
Core knowledge
- Requirements framing should separate functional requirements from non-functional requirements. For example, “send push/email/SMS notifications” is functional; p99 < 500ms enqueue latency, 99.99% availability, regional data residency, deduplication, and retry semantics are non-functional constraints that shape the architecture.
- Back-of-the-envelope capacity planning grounds every design. Estimate requests per second and storage, then apply peak factors like 5x–20x for notifications, ticket launches, and viral traffic.
- Service decomposition should follow ownership and scaling boundaries. Typical components include an API gateway, stateless application services, durable storage, cache, message queue, worker fleet, rate limiter, and observability pipeline. Avoid over-splitting early; introduce separate services when scaling, isolation, or deployment cadence requires it.
- Stateless services behind load balancers scale horizontally and simplify failover. Keep request state in Redis, MySQL, Postgres, DynamoDB, Cassandra, or object storage rather than local memory. Statelessness enables autoscaling but shifts correctness to storage, cache invalidation, and idempotency.
- Data modeling depends on access patterns. Postgres or MySQL works for relational constraints and transactions; DynamoDB or Cassandra fits high-write, key-value workloads; Elasticsearch fits search; Redis fits ephemeral counters, locks, and hot reads. Always state primary keys, secondary indexes, and the most common query paths.
- Consistency tradeoffs are central. Strong consistency is needed for inventory, payments, and ticket ownership; eventual consistency is acceptable for feeds, notifications, counters, and analytics-like status updates. Name the failure mode: stale reads, duplicate sends, lost updates, split-brain leadership, or conflicting writes.
- Idempotency prevents duplicate side effects during retries. Use an idempotency_key, request hash, or natural unique key such as (user_id, campaign_id, channel) and store a processed record with TTL. This is critical for notification sends, order creation, reservation confirmation, and external provider callbacks.
- Message queues such as Kafka, Pulsar, RabbitMQ, or cloud queues decouple producers from consumers. They absorb spikes, support retries, and allow worker autoscaling. Discuss delivery semantics: at-most-once may lose messages, at-least-once may duplicate, exactly-once is expensive and often approximated with idempotent consumers.
- Partitioning and sharding control scale and hotspots. Hash by user_id for even distribution, by event_id for immutable events, or by region for data residency. Beware celebrity users, flash sales, and global announcements; these create hot keys that require fanout batching, token buckets, or precomputed shards.
- Caching improves latency but adds invalidation complexity. Use a CDN for static assets, Redis for hot objects and counters, and local in-process caches for small reference data. Define TTLs, cache-aside versus write-through behavior, and how the system behaves on cache miss or cache outage.
- Reliability patterns include timeouts, retries with exponential backoff and jitter, circuit breakers, bulkheads, dead-letter queues, and graceful degradation. Retries must be bounded; otherwise they amplify outages. A good default is short client timeouts, retry only idempotent operations, and send poison messages to a DLQ.
- Observability must include metrics, logs, and traces. Track QPS, p50/p95/p99 latency, error rate, queue lag, retry count, consumer throughput, saturation, dropped messages, duplicate rate, and provider-specific failures. Tie alerts to user impact and SLOs, not just CPU thresholds.
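The capacity-planning point above can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the `Workload` fields, the replication factor, and every number in the example are assumptions chosen for the exercise, not figures from a real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Workload:
    """Hypothetical inputs; replace with numbers agreed with the interviewer."""
    events_per_day: int      # e.g. notifications accepted per day
    bytes_per_event: int     # average stored size of one record
    peak_factor: float       # peak-to-average ratio (the text suggests 5x-20x)
    retention_days: int      # how long records are kept
    replication_factor: int  # copies kept by the storage layer (assumed)


def average_qps(w: Workload) -> float:
    """Average requests per second over a full day."""
    return w.events_per_day / SECONDS_PER_DAY


def peak_qps(w: Workload) -> float:
    """Average rate scaled by the assumed peak factor."""
    return average_qps(w) * w.peak_factor


def storage_bytes(w: Workload) -> int:
    """Raw bytes retained, including replicas, ignoring indexes and compression."""
    return (w.events_per_day * w.bytes_per_event
            * w.retention_days * w.replication_factor)


if __name__ == "__main__":
    # Assumed example: 2 billion events/day, 1 KB each, 10x peak, 30 days, 3 replicas.
    w = Workload(2_000_000_000, 1_000, 10, 30, 3)
    print(f"average QPS ~ {average_qps(w):,.0f}")
    print(f"peak QPS    ~ {peak_qps(w):,.0f}")
    print(f"storage     ~ {storage_bytes(w) / 1e12:,.0f} TB")
```

With those assumed inputs the average works out to roughly 23 thousand requests per second, the peak to roughly 230 thousand, and retained storage to about 180 TB. In an interview, the rounding matters less than stating each assumption out loud.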
Worked example
For “Design a global notification service,” start by clarifying scope in the first 30 seconds: “Are we supporting push only, or also email and SMS? Is the requirement low-latency transactional notifications, bulk marketing campaigns, or both? Do we need regional data residency and user-level opt-out preferences?” Then declare assumptions: multi-tenant service, billions of notifications per day, at-least-once delivery internally, no duplicate visible sends when possible, and regional active-active deployment.
Organize the answer around four pillars: API surface, data model, delivery pipeline, and reliability/operations. The API might include POST /notifications, POST /campaigns, GET /status/{id}, and preference-management endpoints. The data model should include notification templates, user channel tokens, preferences, tenant quotas, send attempts, and deduplication records keyed by idempotency_key. The delivery path should be: the API validates and persists the request, then publishes it to Kafka; channel-specific workers consume it, apply rate limits and preferences, call external providers such as APNs or FCM, and record delivery status asynchronously.
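Because the pipeline is at-least-once internally, a worker may see the same message twice. The following sketch shows one way a worker could consult a deduplication record keyed by idempotency_key before calling a provider. `DedupStore`, `handle_message`, and the in-memory dictionary are hypothetical stand-ins; in production the claim would more likely be an atomic conditional write in a shared store (for example Redis `SET` with the `NX` and `EX` options).

```python
import time
from typing import Callable, Dict


class DedupStore:
    """In-memory stand-in for a dedup table with TTL (single process, not thread-safe)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def claim(self, key: str) -> bool:
        """Return True only for the first claim of `key` within the TTL window."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[key] = now + self._ttl
        return True

    def release(self, key: str) -> None:
        """Drop the claim so a redelivered message can be processed again."""
        self._expires_at.pop(key, None)


def handle_message(msg: dict, store: DedupStore,
                   send: Callable[[dict], None]) -> str:
    """Send at most once per idempotency_key while the dedup record is live."""
    key = msg["idempotency_key"]
    if not store.claim(key):
        return "duplicate_skipped"
    try:
        send(msg)  # the external side effect, e.g. a provider call
    except Exception:
        store.release(key)  # let the queue's redelivery retry the send
        raise
    return "sent"
```

Claiming before sending trades one failure mode for another: a worker that crashes between the claim and the send suppresses the notification until the TTL expires. Recording the key only after a successful send flips the risk toward duplicates, so the choice should follow from which anomaly the product tolerates.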
A key tradeoff to flag is fanout timing: fanout-on-write sends all user notifications immediately and gives low latency, but creates huge bursts; fanout-on-read or staged fanout smooths load but delays delivery and complicates status tracking. For global delivery, explain regional routing: keep user preference data in-region when required, use local queues and workers, and fail over only traffic that is legally and operationally safe to move. Close by saying: “If I had more time, I’d go deeper on provider-specific throttling, tenant isolation, disaster recovery drills, and exactly how SLOs map to alert thresholds.”
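To illustrate how staged fanout smooths a burst, the sketch below splits a recipient list into batches and assigns each batch a send time using token-bucket accounting. It plans against simulated time instead of sleeping, and `schedule_fanout`, the batch size, the rate, and the burst capacity are all hypothetical choices for the example.

```python
from typing import Iterator, List, Tuple


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def schedule_fanout(recipients: List[str], batch_size: int,
                    rate_per_sec: float, burst: int) -> List[Tuple[float, List[str]]]:
    """Return (send_at_seconds, batch) pairs that respect a token bucket.

    The bucket starts full with `burst` tokens and refills at `rate_per_sec`.
    Each recipient costs one token, so a batch waits until enough tokens exist.
    """
    if batch_size <= 0 or rate_per_sec <= 0:
        raise ValueError("batch_size and rate_per_sec must be positive")
    if batch_size > burst:
        raise ValueError("a batch larger than the bucket could never be sent")

    schedule: List[Tuple[float, List[str]]] = []
    tokens = float(burst)
    now = 0.0
    for batch in chunked(recipients, batch_size):
        cost = len(batch)
        if tokens < cost:
            wait = (cost - tokens) / rate_per_sec  # time needed to refill the deficit
            now += wait
            tokens = min(float(burst), tokens + wait * rate_per_sec)
        tokens -= cost
        schedule.append((now, batch))
    return schedule
```

For ten recipients with a batch size of 2, a rate of 2 per second, and a burst of 4, the first two batches go out immediately and the remaining three are spaced one second apart. That is the latency cost the paragraph describes: the last recipients wait longer so that downstream providers see a steady rate.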
A second angle
For “Design a high-volume ticketing system,” the same architecture vocabulary applies, but the dominant constraint changes from asynchronous fanout to strong consistency under contention. A notification service can usually tolerate duplicate internal processing if the final send is deduped; a ticketing system cannot oversell inventory. The design should emphasize reservation records, short TTL holds, transactional decrement or compare-and-swap, queue-based admission control, and anti-hotspot strategies for popular events. Redis can help with waiting rooms or rate limiting, but the source of truth for ticket ownership needs a durable store with transactional guarantees, such as Postgres, MySQL, or a carefully designed distributed database. The strongest candidates explicitly contrast “smooth bursts with queues” versus “serialize scarce inventory updates.”
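A minimal sketch of the “transactional decrement plus short TTL hold” idea follows, using Python's built-in `sqlite3` as a stand-in for the transactional source of truth. The table names, column names, and the 120-second default hold are assumptions made for illustration; a real deployment would use Postgres, MySQL, or similar, with its own locking and isolation details.

```python
import sqlite3
from typing import Optional


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical inventory and holds tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE inventory (
            event_id  TEXT PRIMARY KEY,
            available INTEGER NOT NULL CHECK (available >= 0)
        );
        CREATE TABLE holds (
            hold_id    INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id   TEXT NOT NULL,
            user_id    TEXT NOT NULL,
            quantity   INTEGER NOT NULL,
            expires_at REAL NOT NULL
        );
        """
    )
    return conn


def reserve(conn: sqlite3.Connection, event_id: str, user_id: str,
            quantity: int, now: float, hold_ttl: float = 120.0) -> Optional[int]:
    """Place a short-lived hold; return its id, or None if inventory is insufficient."""
    with conn:  # one transaction: commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE inventory SET available = available - ? "
            "WHERE event_id = ? AND available >= ?",
            (quantity, event_id, quantity),
        )
        if cur.rowcount != 1:  # conditional decrement matched no row: sold out
            return None
        cur = conn.execute(
            "INSERT INTO holds (event_id, user_id, quantity, expires_at) "
            "VALUES (?, ?, ?, ?)",
            (event_id, user_id, quantity, now + hold_ttl),
        )
        return cur.lastrowid


def release_expired(conn: sqlite3.Connection, now: float) -> int:
    """Return tickets from expired holds to inventory; returns how many were released."""
    with conn:
        rows = conn.execute(
            "SELECT hold_id, event_id, quantity FROM holds WHERE expires_at <= ?",
            (now,),
        ).fetchall()
        for hold_id, event_id, quantity in rows:
            conn.execute(
                "UPDATE inventory SET available = available + ? WHERE event_id = ?",
                (quantity, event_id),
            )
            conn.execute("DELETE FROM holds WHERE hold_id = ?", (hold_id,))
        return sum(quantity for _, _, quantity in rows)
```

The conditional `UPDATE ... WHERE available >= ?` is what prevents overselling: the decrement only happens if enough inventory remains, and the affected-row count tells the caller whether it won. For a very popular event that single row becomes the hot key, which is where admission queues or pre-split inventory shards come in.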
Common pitfalls
Pitfall: Jumping straight into boxes and arrows without requirements.
A tempting answer is “I’ll use Kafka, Redis, and microservices” before defining traffic, latency, consistency, or failure expectations. A stronger answer starts with scope, assumptions, core APIs, and capacity estimates, then chooses technologies because they satisfy those constraints.
Pitfall: Treating all data as equally consistent.
Many candidates say “eventual consistency is fine” for everything, which fails for ticket purchases, quota enforcement, and user preference compliance. Call out which operations need strong consistency, which can be eventually consistent, and what user-visible anomaly might occur if the system returns stale data.
Pitfall: Ignoring operational behavior after the initial design.
A design that works only when dependencies are healthy is incomplete. Discuss what happens when Redis is down, a Kafka partition lags, an external push provider throttles, or a region fails; include retries, dead-letter queues, backpressure, dashboards, and manual replay paths.
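As one concrete example of the failure handling listed above, the sketch below combines bounded retries, exponential backoff with full jitter, and a dead-letter list. `TransientError`, `process_with_retries`, and the default limits are hypothetical; the `sleep` and `rng` parameters exist only so the behavior can be exercised without real delays.

```python
import random
import time
from typing import Any, Callable, List, Optional


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, provider throttling)."""


def backoff_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def process_with_retries(
    msg: Any,
    handler: Callable[[Any], None],
    dead_letters: List[Any],
    max_attempts: int = 4,
    base: float = 0.1,
    cap: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> bool:
    """Run handler(msg) with bounded retries; park the message in the DLQ on failure."""
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            handler(msg)
            return True
        except TransientError:
            if attempt == max_attempts - 1:
                break  # retry budget exhausted
            sleep(backoff_delay(attempt, base, cap, rng))
        except Exception:
            break  # non-retryable (poison) message: do not retry
    dead_letters.append(msg)  # kept for inspection and manual replay
    return False
```

Bounding the attempts and capping the delay keeps a struggling dependency from being hammered indefinitely, and the jitter spreads retries from many workers so they do not arrive in synchronized waves. Retrying is only safe here if `handler` is idempotent.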
Connections
Interviewers may pivot from distributed architecture into API design, database indexing and sharding, message queue semantics, rate limiting, or incident debugging. They may also ask you to deep-dive a project you built, so prepare one real architecture where you can explain the original constraints, the bottleneck you hit, the tradeoff you chose, and what you would redesign now.
Further reading
- Designing Data-Intensive Applications: the best single reference for replication, partitioning, transactions, stream processing, and consistency tradeoffs.
- “Dynamo: Amazon’s Highly Available Key-value Store”: foundational paper for quorum reads/writes, consistent hashing, and availability-oriented storage.
- Google SRE Book: practical treatment of SLOs, error budgets, monitoring, and operating reliable distributed systems.
Practice questions
- Design low-latency large-scale hotel booking system (TikTok · Software Engineer · Technical Screen · medium)
- Design a high-concurrency ticketing system (TikTok · Software Engineer · Technical Screen · hard)
- Design and explain your project architecture (TikTok · Software Engineer · Technical Screen · hard)
- Explain Your System Architecture (TikTok · Software Engineer · Onsite · hard)
- Design a high-volume ticketing system (TikTok · Software Engineer · Technical Screen · hard)
- Design a global notification service (TikTok · Software Engineer · Technical Screen · hard)
- Deep-dive your recent project architecture (TikTok · Software Engineer · Technical Screen · hard)
Related concepts
- Scalable Backend Architecture And Data Modeling (System Design)
- Scalable Service And Distributed System Design (System Design)
- Production System Design Tradeoffs (System Design)
- Storage, Indexing, APIs, And Secure Execution (System Design)
- Distributed Systems Consistency And Low-Latency Design (System Design)
- Distributed Storage, Replication, and Consistency (System Design)