This System Design question evaluates a candidate's ability to design scalable backend systems for a social app news feed, covering data modeling, API design, real-time updates, ranking, privacy controls, and operational concerns such as sharding, caching, and monitoring.
Design the backend for a social app’s news feed. Define functional and non-functional requirements, data model, and APIs (publish, follow, getFeed). Compare fan-out-on-write vs fan-out-on-read; justify your choice under high write and read loads. Cover ranking, real-time updates, deduplication, privacy/blocks, backfilling, cold-start, caching, storage and indexing, sharding and partitions, consistency/ordering guarantees, failure handling, and monitoring. Provide capacity estimates and a high-level architecture.
You are designing the backend that powers a mobile social app's home feed (a Twitter/Instagram-style home timeline) at large scale. Each user follows other users, and their home feed is an aggregation of recent posts from the people they follow — surfaced with ranking, real-time updates, and privacy controls rather than a strict reverse-chronological dump.
The system must absorb high write throughput (users publishing posts) and high read throughput (users opening and scrolling feeds) simultaneously, and the follower graph is heavily skewed: most users have a few hundred followers, but a small number of "celebrity" accounts have tens of millions. Your design must work end-to-end across requirements, data model, APIs, the fan-out strategy, and the operational concerns that follow.
Constraints & Assumptions
State your own numbers explicitly, but design against roughly this scale (you may revise after stating assumptions):
Order of 100M daily active users.
A heavy-tailed follower distribution: mean in the low hundreds, with a long tail of accounts that have 10M+ followers.
Reads dominate writes, but a single publish can require touching a very large number of follower feeds (write amplification).
Posts may contain text and media references; media itself is large and binary.
getFeed must be low-latency on the read hot path; new content should become visible within a few seconds for typical authors.
The feed must never hard-fail: degraded results are acceptable, errors are not.
Clarifying Questions to Ask
A candidate should scope the problem before designing. Good questions include:
Is this the home timeline (posts from people you follow) or a profile/user timeline (a single author's own posts)? They have very different fan-out characteristics.
Is the feed ranked (relevance) or strictly chronological? This changes ordering and pagination guarantees.
What are the target SLOs for getFeed and publish latency, and the acceptable freshness lag for new posts?
What visibility models must we support — public, followers-only, private accounts, custom audiences, blocks, mutes?
Is the ML ranking model itself in scope, or only the serving pipeline around it?
What is out of scope (DMs, stories, ads, search, notifications) so I can focus the design?
Part 1 — Requirements and capacity estimation
Define the functional and non-functional requirements (SLOs), then produce a back-of-the-envelope capacity estimate. State every assumption (DAU, posts/user/day, feed opens/day, follower distribution, page size). Derive publish QPS, feed-read QPS, the fan-out write rate, and hot-tier storage for materialized feeds and post metadata. Identify which dimension — bytes or writes — is the true bottleneck.
What This Part Should Cover
Clear split of functional vs non-functional requirements, with concrete, measurable SLOs (latency percentiles, availability, freshness, ordering/consistency).
Explicitly stated, plausible assumptions feeding the math, covering both average and peak load.
Correct derivation of publish QPS, read QPS, and especially the fan-out write amplification, with storage sized for feed entries and post metadata (media in object storage, not the DB).
A conclusion identifying write amplification (not raw byte volume) as the scaling bottleneck.
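The derivation asked for above can be sketched as a few lines of arithmetic. This is a minimal sketch, not a reference answer: only the ~100M DAU and the 10M-follower celebrity size come from the prompt, and every other input (posts per user, feed opens, mean fan-out, peak factor, entry size, entries kept) is an assumption picked for illustration that a candidate would replace with their own stated numbers.

```python
# Back-of-the-envelope capacity sketch. Only DAU and the celebrity follower
# count come from the problem statement; all other inputs are assumptions.
SECONDS_PER_DAY = 86_400

dau = 100_000_000                 # stated scale: ~100M daily active users
celebrity_followers = 10_000_000  # stated long-tail account size
posts_per_user_per_day = 0.5      # assumption
feed_opens_per_user_per_day = 10  # assumption
avg_push_fanout = 200             # assumption: mean followers per pushed post
peak_factor = 3                   # assumption: peak-to-average ratio
feed_entry_bytes = 32             # assumption: ids + timestamp + flags
feed_entries_kept = 500           # assumption: entries kept per hot feed

publish_qps = dau * posts_per_user_per_day / SECONDS_PER_DAY
read_qps = dau * feed_opens_per_user_per_day / SECONDS_PER_DAY
fanout_writes_per_s = publish_qps * avg_push_fanout
hot_feed_bytes = dau * feed_entries_kept * feed_entry_bytes

# One celebrity post, if it were pushed, expressed as seconds of the
# system's entire average fan-out load.
celebrity_post_in_seconds = celebrity_followers / fanout_writes_per_s

print(f"publish QPS   avg {publish_qps:,.0f}  peak {publish_qps * peak_factor:,.0f}")
print(f"read QPS      avg {read_qps:,.0f}  peak {read_qps * peak_factor:,.0f}")
print(f"fan-out w/s   avg {fanout_writes_per_s:,.0f}  peak {fanout_writes_per_s * peak_factor:,.0f}")
print(f"hot feed size {hot_feed_bytes / 1e12:.1f} TB")
print(f"one pushed celebrity post = {celebrity_post_in_seconds:.1f} s of average fan-out")
```

Under these assumed inputs the hot feed tier is about 1.6 TB, which is modest, while fan-out runs at roughly 116k entry writes per second on average and a single pushed 10M-follower post costs as much as about 86 seconds of the whole system's average fan-out. That asymmetry is the kind of evidence that supports naming write amplification, not byte volume, as the bottleneck.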
Part 2 — Data model
Propose a data model for users, the follow graph, posts, the materialized feed, and engagement. Justify your storage-engine choices per entity (relational vs wide-column vs object store vs search index), your primary/partition keys and clustering order, and how you key media.
What This Part Should Cover
Entities for users, follow edges, blocks/mutes, posts, materialized feed entries, and engagement/counters.
A justified storage choice per entity (high-write feed/post data → wide-column; media → object store + CDN; discovery → search index).
Partition and clustering keys that match the dominant access patterns (feed by owner, posts by author + time, edges by both directions).
An explicit account of the denormalization cost (dual-write the two edge directions consistently) and how it is kept in sync.
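To make the key choices and the denormalization cost concrete, here is a small in-memory sketch. The class and field names (`Post`, `FeedEntry`, `FollowGraph`, `is_consistent`) are hypothetical, the key layout in the comments is one plausible choice rather than the expected answer, and the reconciliation check stands in for whatever repair mechanism a real design would use (for example a periodic job or a change-log consumer, both assumptions here).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author_id: int   # partition key: "posts by author"
    post_id: int     # clustering key, descending; assumed time-sortable
    text: str
    media_key: str   # object-store key only; media bytes never live in the DB


@dataclass(frozen=True)
class FeedEntry:
    owner_id: int    # partition key: "feed by owner"
    post_id: int     # clustering key, descending
    author_id: int   # kept so block/mute filters need no extra lookup


class FollowGraph:
    """Follow edges stored in both directions (the denormalization cost)."""

    def __init__(self):
        self.following = defaultdict(set)  # follower_id -> {followee_id}
        self.followers = defaultdict(set)  # followee_id -> {follower_id}

    def follow(self, follower_id, followee_id):
        # Set adds are idempotent, so a retried or replayed follow is harmless.
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def unfollow(self, follower_id, followee_id):
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)

    def is_consistent(self):
        # Reconciliation check: both directions must describe the same edges.
        forward = {(a, b) for a, bs in self.following.items() for b in bs}
        reverse = {(a, b) for b, as_ in self.followers.items() for a in as_}
        return forward == reverse


graph = FollowGraph()
graph.follow(1, 2)
graph.follow(1, 2)  # duplicate request, no effect
print(graph.following[1], graph.followers[2], graph.is_consistent())
```

In a distributed store the two writes land on different partitions and can diverge on partial failure, which is why the sketch keeps the operations idempotent and exposes a consistency check instead of assuming atomicity.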
Part 3 — Core APIs and real-time updates
Specify the core APIs — publish, follow/unfollow, and getFeed with pagination — plus the real-time update mechanism. Define request/response shapes, the publish acknowledgement semantics, the pagination cursor design, and how new posts reach a foregrounded client.
What This Part Should Cover
Publish that acks after durable enqueue (fan-out async) with an idempotency key.
A getFeed cursor that guarantees monotonic pagination (no dupes/skips within a scroll session), with a unique time-sortable tie-breaker.
A real-time approach (WebSocket/SSE) that coalesces per user and degrades to polling / push notification when the channel is down.
Idempotency on all write APIs and event emission for downstream consumers.
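The cursor requirement can be illustrated with a keyset-style cursor over `(timestamp, post_id)`. This is a simplified sketch: it pages over a time-ordered feed rather than a ranked snapshot, the cursor encoding (base64 over JSON) and the function names are hypothetical, and a production cursor would likely be signed or opaque to the client.

```python
import base64
import json


def encode_cursor(ts_ms, post_id):
    """Pack the last-seen sort key into an opaque string."""
    raw = json.dumps({"ts": ts_ms, "id": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    data = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return data["ts"], data["id"]


def get_feed(entries, limit, cursor=None):
    """Return (page, next_cursor) for entries given as (ts_ms, post_id).

    Ordering is newest first; post_id breaks timestamp ties, so the sort
    key is unique and the boundary comparison is unambiguous.
    """
    ordered = sorted(entries, reverse=True)
    if cursor is not None:
        boundary = decode_cursor(cursor)
        # Keep only items strictly older than the last item already served.
        ordered = [e for e in ordered if e < boundary]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(*page[-1]) if has_more else None
    return page, next_cursor


feed = [(100, 1), (100, 2), (99, 3), (98, 4)]
page1, cur = get_feed(feed, limit=2)
feed.append((101, 5))  # a new post arrives mid-scroll
page2, _ = get_feed(feed, limit=2, cursor=cur)
print(page1, page2)
```

Because the cursor carries the sort key of the last item served instead of an offset, a post that arrives at the head of the feed mid-scroll neither shifts nor duplicates later pages; it is picked up by the next refresh or by the real-time channel.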
Part 4 — Fan-out strategy: on-write vs on-read
Compare fan-out-on-write (push) and fan-out-on-read (pull) across read cost, write cost, hot-key behavior, wasted work, and ranking. Choose an approach that survives both high write and high read load at this scale, and justify it. Address explicitly how celebrities (10M+ followers) are handled and how the strategy adapts under load.
What This Part Should Cover
A correct, balanced comparison table (reads, writes, hot keys, wasted work, ranking) of push vs pull.
A justified hybrid: push for normal authors, pull/read-time-merge for celebrities above a follower threshold.
Explicit handling of the celebrity write storm and a per-author orchestrator that can shift strategy dynamically under load.
Why this bounds both worst-case write cost and read latency for the common case.
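A toy model of the hybrid shows where the load goes. This is a sketch under simplifying assumptions: the `HybridFeed` class is hypothetical, the threshold is a static follower count (the dynamic per-author orchestrator described above is not modelled), and all storage is in memory.

```python
import heapq
from collections import defaultdict


class HybridFeed:
    """Push for normal authors, read-time merge for celebrity authors."""

    def __init__(self, followers, celebrity_threshold):
        self.followers = followers            # author -> set of follower ids
        self.threshold = celebrity_threshold  # assumed static cut-off
        self.following = defaultdict(set)     # follower -> set of authors
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].add(author)
        self.materialized = defaultdict(list)  # owner -> [(ts, post_id)]
        self.author_posts = defaultdict(list)  # author -> [(ts, post_id)]

    def is_celebrity(self, author):
        return len(self.followers.get(author, ())) >= self.threshold

    def publish(self, author, ts, post_id):
        """Store the post; return how many follower feeds were written."""
        entry = (ts, post_id)
        self.author_posts[author].append(entry)
        if self.is_celebrity(author):
            return 0  # no fan-out: followers pull this at read time
        fans = self.followers.get(author, ())
        for fan in fans:
            self.materialized[fan].append(entry)
        return len(fans)

    def get_feed(self, user, limit):
        """Merge the precomputed feed with recent celebrity posts."""
        pulled = [
            entry
            for author in self.following[user]
            if self.is_celebrity(author)
            for entry in self.author_posts[author]
        ]
        return heapq.nlargest(limit, self.materialized[user] + pulled)


feed = HybridFeed(
    {"celeb": {"u1", "u2", "u3"}, "friend": {"u1"}}, celebrity_threshold=3
)
print(feed.publish("friend", 10, 101))  # writes to follower feeds
print(feed.publish("celeb", 20, 102))   # no follower feed writes
print(feed.get_feed("u1", limit=10))
```

The write cost of any single publish is capped by the threshold, and the extra read cost is bounded by the number of celebrities a user follows, which is typically small compared with a celebrity's follower count.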
Part 5 — Ranking, dedup, privacy, backfill, and cold-start
Cover the read-time and event-driven concerns: the candidate-generation → ranking pipeline; deduplication (reshares of the same content, already-seen items); privacy/blocks/mutes enforcement; backfill when a user follows someone new; and cold-start for a brand-new user with zero follows.
What This Part Should Cover
A candidate → filter → score → re-rank → paginate pipeline, with hard filters (block/visibility/mute/deleted/seen) applied before scoring.
A ranking signal sketch (recency decay, affinity, content features, social proof) with a deadline-bounded fallback to recency on timeout.
Canonical-id reshare collapsing plus a per-user seen filter, with awareness of Bloom-filter false-positive risk.
Backfill on new follow (and why you do not bulk-backfill a celebrity) and a sensible cold-start strategy (trending/global quality + interest onboarding).
Part 6 — Caching, sharding, consistency, failure handling, and monitoring
Cover the operational backbone: caching (hot timelines, metadata, graph; stampede protection); sharding/partitioning (keys, hot-key mitigation, rebalancing); consistency and ordering guarantees (read-your-writes, per-author order, monotonic pagination); failure handling (degradation, replay, backpressure); and monitoring (the SLIs that matter for a feed).
What This Part Should Cover
A multi-tier cache plan (Redis hot timeline, KV metadata/counters, graph) with stampede protection (soft TTL, jitter, coalescing).
Partition keys per entity, hot-key mitigation, and rebalancing without mass remapping.
Precise consistency/ordering claims: eventual cross-region, read-your-writes for the author, monotonic-within-snapshot pagination on a ranked feed.
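"Rebalancing without mass remapping" is commonly achieved with consistent hashing, which can be sketched as follows. Using consistent hashing at all is an assumption about how a candidate might answer, and the `HashRing` class, the virtual-node count, and the use of MD5 as the placement hash are illustrative choices only.

```python
import bisect
import hashlib


def _hash(value):
    # 64-bit placement hash; MD5 is used for spread, not for security.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def lookup(self, key):
        # First virtual node clockwise from the key's hash, wrapping around.
        i = bisect.bisect_right(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[i][1]


keys = [f"user:{i}" for i in range(10_000)]
ring = HashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
before = {k: ring.lookup(k) for k in keys}
ring.add("shard-e")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to the new shard")
```

Adding a shard only takes over the arcs its virtual nodes land on, so the keys that move all move to the new shard and the rest stay put, in contrast to `hash(key) % n`, where changing `n` remaps most keys.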
Across all parts, a strong answer keeps the design coherent end-to-end rather than treating each section in isolation:
A consistent narrative where the fan-out / write-amplification decision (Part 4) drives the data model (Part 2), capacity math (Part 1), and operational choices (Part 6): the read SLOs follow from precomputation, the write SLOs from never synchronously fanning out to a celebrity's followers.
Tradeoffs named, not hidden: push wastes work on inactive followers; ranked feeds complicate strict ordering; the hybrid adds orchestration complexity. Each tradeoff is acknowledged with its mitigation.
Correctness invariants that hold across the design: idempotent, replayable writes; privacy enforced at both write and read; monotonic pagination; and a read path that degrades but never hard-fails.
Appropriate altitude — high-level component/data-flow architecture plus enough concrete detail (keys, cursor contents, thresholds, SLOs) to be credible.
Follow-up Questions
A celebrity with 50M followers posts during a traffic peak. Walk through exactly what happens in your system, and where the load goes.
Two users in different regions follow each other and post nearly simultaneously. What ordering and consistency does each see in the other's feed, and why is that acceptable?
A user reports they "missed" a post from someone they follow. Enumerate every place in your pipeline where a post can be legitimately or accidentally dropped, and how you'd debug it.
How would you A/B test a new ranking model safely without degrading the feed for the holdout, and which metrics would gate the rollout?