System Design: News Aggregator
Design a news aggregator (similar to a "Top stories" or Google News-style product) that ingests articles from many publishers and serves ranked feeds to users.
Core requirements
- Ingest articles from thousands of sources (RSS/Atom feeds, publisher APIs, webhooks).
- Normalize & store article content/metadata: title, body/snippet, author, publish time, canonical URL, source, topics/tags.
- De-duplicate near-identical stories across sources (same event reported by many outlets).
- Rank & serve:
  - A homepage feed (global ranking).
  - A topic feed (e.g., Sports, Tech).
  - Optional: personalized feed based on user interests.
- Low latency reads for feed browsing; freshness matters (new stories appear quickly).
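To make the de-duplication requirement concrete, here is a minimal sketch of one possible approach: word shingling plus Jaccard similarity, with greedy clustering. The `Article` fields, the function names, the shingle size, and the 0.5 threshold are all illustrative assumptions, not part of the problem statement. A production system would likely replace the pairwise comparison with something like MinHash/LSH or SimHash to avoid comparing every article against every cluster.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    # Hypothetical record; fields mirror a subset of the metadata listed above.
    canonical_url: str
    source: str
    title: str
    snippet: str


def shingles(text, k=3):
    """Return the set of k-word shingles of the lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short texts yield a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_near_duplicates(articles, threshold=0.5):
    """Greedy clustering: an article joins the first cluster whose
    representative (its first member) is similar enough, otherwise it
    starts a new cluster. The threshold is an assumed tuning knob."""
    clusters = []
    for art in articles:
        sig = shingles(art.title + " " + art.snippet)
        for cluster in clusters:
            if jaccard(sig, cluster["sig"]) >= threshold:
                cluster["members"].append(art)
                break
        else:
            clusters.append({"sig": sig, "members": [art]})
    return [c["members"] for c in clusters]
```

Each resulting cluster can then be treated as one "story" for ranking, with the member articles shown as alternative sources.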
Non-functional requirements (assume typical consumer scale)
- High availability, multi-region read support.
- Handle spikes during breaking news.
- Reasonable content safety (basic spam/malicious source handling).
What to cover
- APIs (read and ingestion-facing).
- Data model and storage choices.
- Ingestion + processing pipeline (parsing, enrichment, dedup).
- Ranking approach (signals, batch vs real-time).
- Caching and feed generation strategy.
- Reliability, backfills, and monitoring.
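As a starting point for the ranking discussion, the sketch below combines two plausible signals: freshness (exponential decay with a half-life) and coverage (how many sources report the same story). The function name, the signals chosen, the 6-hour half-life, and the `source_quality` multiplier are assumptions for illustration; a real answer might add engagement, topic match, or learned weights.

```python
import math
from datetime import datetime, timedelta, timezone


def story_score(published_at, now, source_count,
                source_quality=1.0, half_life_hours=6.0):
    """Hypothetical global ranking score for a story cluster.

    freshness halves every `half_life_hours`; coverage grows
    logarithmically so that very widely covered stories do not
    dominate without bound.
    """
    age_hours = max((now - published_at).total_seconds() / 3600.0, 0.0)
    freshness = 0.5 ** (age_hours / half_life_hours)
    coverage = math.log1p(source_count)
    return freshness * coverage * source_quality


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = story_score(now - timedelta(hours=1), now, source_count=3)
stale = story_score(now - timedelta(hours=24), now, source_count=3)
```

Because the score depends on `now`, it has to be recomputed periodically or at read time, which is one way the batch vs real-time trade-off shows up.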
You may state assumptions (traffic, QPS, data volume) as needed.
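For example, a back-of-envelope estimate might look like the sketch below. Every number in it (10M daily active users, 5 feed loads per user per day, a 10x breaking-news peak factor, 1M articles per day at about 5 KB each) is an assumption chosen only to show the arithmetic, and `estimate_capacity` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(daily_active_users, feed_loads_per_user,
                      peak_factor, articles_per_day, avg_article_bytes):
    """Rough read QPS and daily storage from assumed traffic figures."""
    reads_per_day = daily_active_users * feed_loads_per_user
    avg_read_qps = reads_per_day / SECONDS_PER_DAY
    return {
        "avg_read_qps": avg_read_qps,
        "peak_read_qps": avg_read_qps * peak_factor,
        "storage_gb_per_day": articles_per_day * avg_article_bytes / 1e9,
    }


# All inputs are illustrative assumptions, not given by the problem.
est = estimate_capacity(
    daily_active_users=10_000_000,
    feed_loads_per_user=5,
    peak_factor=10,
    articles_per_day=1_000_000,
    avg_article_bytes=5_000,
)
```

Under these assumptions the average read load is roughly 580 QPS, the peak is roughly 5,800 QPS, and raw article storage grows by about 5 GB per day, which suggests reads rather than writes drive the design.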