Design a news aggregator system
Company: Rippling
Role: Software Engineer
Category: System Design
Difficulty: easy
Interview Round: Technical Screen
## System Design: News Aggregator
Design a **news aggregator** (similar to a Google News-style “Top stories” product) that ingests articles from many publishers and serves ranked feeds to users.
### Core requirements
- **Ingest** articles from thousands of sources (RSS/Atom feeds, publisher APIs, webhooks).
- **Normalize & store** article content/metadata: title, body/snippet, author, publish time, canonical URL, source, topics/tags.
- **De-duplicate** near-identical stories across sources (same event reported by many outlets).
- **Rank & serve**:
  - A **homepage feed** (global ranking).
  - A **topic feed** (e.g., Sports, Tech).
  - Optional: a **personalized** feed based on user interests.
- **Low-latency reads** for feed browsing; **freshness** matters (new stories should appear quickly).
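One way to make the normalize-and-store step concrete is to derive a stable article key from a canonical URL, so the same article fetched via RSS and via a webhook maps to one stored row. The sketch below assumes absolute URLs, a hypothetical `TRACKING_PARAMS` list, and a policy of dropping `www.`, fragments, default ports, and trailing slashes. Some publishers do serve different content on such variants, so treat these rules as assumptions to validate per source.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameters assumed to carry no content meaning.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
                   "utm_content", "fbclid", "gclid"}
DEFAULT_PORTS = {("http", 80), ("https", 443)}


def canonicalize_url(url: str) -> str:
    """Normalize an absolute URL so trivially different variants compare equal."""
    parts = urlsplit(url.strip())
    scheme = (parts.scheme or "https").lower()
    host = (parts.hostname or "").lower()  # hostname excludes port and userinfo
    if host.startswith("www."):  # assumption: www and bare host serve the same article
        host = host[len("www."):]
    port = parts.port
    netloc = host if port is None or (scheme, port) in DEFAULT_PORTS else f"{host}:{port}"
    path = parts.path.rstrip("/") or "/"  # paths stay case-sensitive
    # Drop tracking params and sort the rest so parameter order does not matter.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    query = urlencode(sorted(kept))
    return urlunsplit((scheme, netloc, path, query, ""))  # fragment dropped


def article_key(url: str) -> str:
    """Hypothetical primary key for the articles table, derived from the canonical URL."""
    return hashlib.sha256(canonicalize_url(url).encode("utf-8")).hexdigest()[:16]
```

For example, `canonicalize_url("HTTPS://www.Example.com/news/story/?utm_source=x&b=2&a=1#top")` returns `https://example.com/news/story?a=1&b=2`. This only catches the *same* article under different URLs; the same *event* reported by different outlets still needs content-level deduplication.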
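For the near-duplicate requirement, one widely used technique (chosen here for illustration; the question does not prescribe one) is SimHash. Each article body gets a 64-bit fingerprint, and articles whose fingerprints differ in only a few bits are treated as the same story. The 3-word shingles, the 3-bit threshold, and the `StoryClusterer` class are hypothetical choices. The linear scan is for readability only: a production index would bucket fingerprints (for example, by splitting them into bands) instead of comparing each new article against every stored story.

```python
import hashlib
import re
from typing import Dict, List

BITS = 64  # fingerprint width; matches the 8-byte blake2b digest below


def shingles(text: str, k: int = 3) -> List[str]:
    """Lowercase word k-grams; texts shorter than k words become one shingle."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def simhash(text: str) -> int:
    """64-bit SimHash: per-bit majority vote over hashed shingles."""
    votes = [0] * BITS
    for sh in shingles(text):
        h = int.from_bytes(hashlib.blake2b(sh.encode("utf-8"), digest_size=8).digest(), "big")
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class StoryClusterer:
    """Hypothetical in-memory clusterer: maps each article to a story cluster ID."""

    def __init__(self, max_distance: int = 3):
        self.max_distance = max_distance
        self.fingerprints: Dict[int, int] = {}  # cluster_id -> representative fingerprint

    def assign(self, text: str) -> int:
        fp = simhash(text)
        for cluster_id, rep in self.fingerprints.items():
            if hamming(fp, rep) <= self.max_distance:
                return cluster_id  # close enough: same story
        cluster_id = len(self.fingerprints)  # otherwise start a new cluster
        self.fingerprints[cluster_id] = fp
        return cluster_id
```

The cluster ID, rather than the individual article, is what ranking and feeds would operate on, so a story covered by fifty outlets appears once.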
### Non-functional requirements (assume typical consumer scale)
- High availability, multi-region read support.
- Handle spikes during breaking news.
- Reasonable content safety (basic spam/malicious source handling).
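One way to keep breaking-news spikes and misbehaving sources from overwhelming the ingestion tier is a per-source token bucket: each publisher may burst up to a fixed number of fetches or pushes, then is throttled to a steady rate. The budgets, the `IngestLimiter` name, and the choice of a token bucket over other limiters are assumptions for illustration. The clock is injectable so behavior is deterministic in tests.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new source can burst
        self.last = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class IngestLimiter:
    """Hypothetical per-source limiter; untrusted sources get a smaller budget."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, source_id: str, trusted: bool = True) -> bool:
        now = self.clock()
        bucket = self.buckets.get(source_id)
        if bucket is None:
            # Assumed budgets, fixed when the source is first seen:
            # trusted 2 req/s with burst 20; untrusted 0.1 req/s with burst 2.
            rate, cap = (2.0, 20.0) if trusted else (0.1, 2.0)
            bucket = self.buckets[source_id] = TokenBucket(rate, cap, now)
        return bucket.allow(now)
```

Rejected requests could be queued or dropped depending on the source. A source that keeps hitting its limit is also a useful spam signal for the content-safety requirement.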
### What to cover
1. APIs (read and ingestion-facing).
2. Data model and storage choices.
3. Ingestion + processing pipeline (parsing, enrichment, dedup).
4. Ranking approach (signals, batch vs real-time).
5. Caching and feed generation strategy.
6. Reliability, backfills, and monitoring.
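For the ranking item, a simple baseline combines a few signals and multiplies the result by an exponential freshness decay, so a story loses half its score every `HALF_LIFE_HOURS`. The signals (number of outlets in the dedup cluster, recent clicks, a source-quality prior), the weights, and the 6-hour half-life are all hypothetical. Under this split, a batch job could recompute priors such as source quality while the decay term is evaluated at request or refresh time. Top-k selection uses `heapq.nlargest`; a topic feed is the same computation over a filtered candidate set.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set

HALF_LIFE_HOURS = 6.0  # hypothetical freshness half-life


@dataclass
class StoryCluster:
    cluster_id: str
    num_sources: int       # distinct outlets covering the story (from dedup)
    clicks: int            # recent engagement count
    source_quality: float  # assumed 0..1 prior for the best source in the cluster
    first_seen: float      # epoch seconds when the story first appeared
    topics: Set[str] = field(default_factory=set)


def score(c: StoryCluster, now: float) -> float:
    age_hours = max(0.0, (now - c.first_seen) / 3600.0)
    # Log-damped signals so one viral story does not dominate linearly; weights are made up.
    base = 1.0 * math.log1p(c.num_sources) + 0.5 * math.log1p(c.clicks) + 2.0 * c.source_quality
    return base * 0.5 ** (age_hours / HALF_LIFE_HOURS)


def top_k(clusters: Iterable[StoryCluster], now: float, k: int = 20,
          topic: Optional[str] = None) -> List[StoryCluster]:
    """Homepage feed when topic is None, otherwise a topic feed."""
    candidates = (c for c in clusters if topic is None or topic in c.topics)
    return heapq.nlargest(k, candidates, key=lambda c: score(c, now))
```

A personalized feed could reuse the same candidates and add a per-user term (for example, affinity to a cluster's topics), but that term is outside this sketch.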
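For caching and feed generation, one option is to precompute each shared feed (homepage per region, each topic) as a list of story cluster IDs and serve it from a cache with a short TTL. Reads then hit memory, and staleness is bounded by the TTL. The in-process `FeedCache` below is a hypothetical stand-in for a shared cache such as Redis. The key naming scheme and the 60-second TTL are assumptions. A real deployment would also need protection against many requests recomputing the same expired key at once.

```python
import time
from typing import Callable, Dict, List, Tuple


class FeedCache:
    """Hypothetical TTL cache of precomputed feeds (lists of story cluster IDs)."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}  # key -> (computed_at, feed)

    def get_or_compute(self, key: str, compute: Callable[[], List[str]]) -> List[str]:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1]  # fresh hit: no ranking work on the read path
        feed = compute()     # miss or expired: rebuild from the ranking step
        self._entries[key] = (now, feed)
        return feed

    def invalidate(self, key: str) -> None:
        """Drop a key early, e.g. when a major breaking story is clustered."""
        self._entries.pop(key, None)


# Example keys (assumed naming scheme): "home:us", "topic:sports:us".
```

Because shared feeds are identical for every user in a region, one recomputation serves all readers until the TTL expires. Per-user personalization would typically be applied as a lightweight re-rank on top of these cached candidates.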
You may state assumptions (traffic, QPS, data volume) as needed.
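A back-of-envelope sketch makes such assumptions explicit. Every default below is a hypothetical number chosen for illustration, not a requirement from the question: 10,000 sources polled every 5 minutes, 100,000 new articles per day, 10 million daily users with 20 feed loads each, a 5x peak factor, and about 10 KB stored per article.

```python
SECONDS_PER_DAY = 86_400


def estimate(sources: int = 10_000, poll_interval_s: int = 300,
             articles_per_day: int = 100_000, daily_users: int = 10_000_000,
             feed_loads_per_user: int = 20, peak_factor: float = 5.0,
             bytes_per_article: int = 10_000) -> dict:
    """Rough ingest/read rates and storage growth from hypothetical inputs."""
    read_qps_avg = daily_users * feed_loads_per_user / SECONDS_PER_DAY
    return {
        "fetch_qps": sources / poll_interval_s,                # polling load on publishers
        "articles_per_s": articles_per_day / SECONDS_PER_DAY,  # write rate into storage
        "read_qps_avg": read_qps_avg,
        "read_qps_peak": read_qps_avg * peak_factor,           # breaking-news spikes
        "storage_gb_per_day": articles_per_day * bytes_per_article / 1e9,
    }


for name, value in estimate().items():
    print(f"{name}: {value:,.1f}")
```

With these defaults the read path (roughly 2,300 QPS on average and about 11,600 QPS at peak) dwarfs the write path (about 1.2 new articles per second). That asymmetry is the usual argument for precomputed, cached feeds.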
Quick Answer: This question evaluates system design and distributed-systems skills, specifically scalable ingestion, data modeling, deduplication, ranking, and low-latency feed serving. It is categorized under System Design.