Design a large-scale news aggregation feed similar to a major news app.
The system ingests a large volume of raw news articles from many sources, including publishers, RSS feeds, and web crawlers. Users should see a timeline-style feed of news stories.
The main challenge is news aggregation: many publishers may report on the same real-world event, such as a new phone launch. Instead of showing dozens of nearly identical articles from different outlets as separate feed items, the system should group related articles into a single story cluster. The feed should show one entry per story cluster, and when a user opens that entry, they can see multiple source versions of the same story.
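Below is a minimal sketch of the grouping step. It clusters on headline tokens only. Each incoming article is compared with the seed article of every existing cluster using Jaccard similarity over title words. The article joins the best-matching cluster if the score clears a threshold; otherwise it starts a new cluster. The `Article` and `StoryCluster` types, the stopword list, and the 0.5 threshold are hypothetical.

Token Jaccard is used here only because it keeps the example simple. A production system would more likely use text embeddings or MinHash/LSH over the article body, and would compare each article only against clusters from a recent time window rather than scanning every cluster.

```python
import re
from dataclasses import dataclass, field

# Hypothetical minimal types; a real pipeline would also carry URLs, bodies, timestamps, etc.
@dataclass
class Article:
    article_id: str
    source: str
    title: str

@dataclass
class StoryCluster:
    cluster_id: int
    seed_tokens: set[str]  # tokens of the article that started this cluster
    articles: list[Article] = field(default_factory=list)

# Small illustrative stopword list (an assumption, not a tuned resource).
STOPWORDS = {"a", "an", "the", "with", "of", "to", "in", "on", "and", "for"}

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words of a headline, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign(article: Article, clusters: list[StoryCluster], threshold: float = 0.5) -> StoryCluster:
    """Attach the article to the most similar existing cluster, or open a new one."""
    toks = tokens(article.title)
    best, best_sim = None, 0.0
    for cluster in clusters:
        sim = jaccard(toks, cluster.seed_tokens)
        if sim > best_sim:
            best, best_sim = cluster, sim
    if best is not None and best_sim >= threshold:
        best.articles.append(article)
        return best
    new_cluster = StoryCluster(cluster_id=len(clusters), seed_tokens=toks, articles=[article])
    clusters.append(new_cluster)
    return new_cluster

# Usage: two outlets covering the same phone launch, plus an unrelated story.
incoming = [
    Article("a1", "outlet_a", "Acme unveils Phone X with a bigger battery"),
    Article("a2", "outlet_b", "Acme unveils Phone X: bigger battery, new camera"),
    Article("a3", "outlet_c", "Central bank holds interest rates steady"),
]
clusters: list[StoryCluster] = []
for art in incoming:
    assign(art, clusters)

for c in clusters:
    print(c.cluster_id, [a.source for a in c.articles])
```

The two launch headlines share 6 of 8 distinct tokens (similarity 0.75), so they land in one cluster. That cluster becomes a single feed entry listing both outlets. The rates story opens its own cluster.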
If time permits, also discuss personalization: build a user interest profile, prioritize topics and sources, and rank story clusters using freshness, relevance, and source diversity.
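One possible way to combine the three ranking signals is a weighted linear score per cluster. The sketch below rests on several assumptions:

- Freshness decays exponentially with a 6-hour half-life.
- Relevance is the overlap between the cluster's topic mix and the user's topic weights, plus a small bonus for preferred sources.
- Source diversity is read as the number of distinct publishers covering the story, log-scaled and capped.

All weights, field names, and the half-life are hypothetical placeholders. A real system would likely learn them from engagement data. It could also interpret diversity at the feed level instead, re-ranking to avoid long runs of clusters from the same topic or source.

```python
import math
from dataclasses import dataclass

@dataclass
class ClusterFeatures:
    cluster_id: str
    age_hours: float          # hypothetical: time since the cluster's latest article
    topics: dict[str, float]  # topic -> share of the cluster, roughly summing to 1
    sources: set[str]         # distinct publishers in the cluster

@dataclass
class UserProfile:
    topic_affinity: dict[str, float]   # topic -> interest in [0, 1]
    source_affinity: dict[str, float]  # source -> preference in [0, 1]

def freshness(age_hours: float, half_life_hours: float = 6.0) -> float:
    """Exponential decay: 1.0 for a brand-new cluster, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)

def relevance(c: ClusterFeatures, u: UserProfile) -> float:
    """Topic overlap with the user's interests plus a bonus for their favourite source."""
    topic = sum(w * u.topic_affinity.get(t, 0.0) for t, w in c.topics.items())
    src = max((u.source_affinity.get(s, 0.0) for s in c.sources), default=0.0)
    return 0.8 * topic + 0.2 * src

def diversity(c: ClusterFeatures, cap: int = 10) -> float:
    """Log-scaled count of distinct sources, normalised to [0, 1] at `cap` sources."""
    n = min(len(c.sources), cap)
    return math.log1p(n) / math.log1p(cap)

def score(c: ClusterFeatures, u: UserProfile,
          w_fresh: float = 0.4, w_rel: float = 0.4, w_div: float = 0.2) -> float:
    return w_fresh * freshness(c.age_hours) + w_rel * relevance(c, u) + w_div * diversity(c)

def rank(clusters: list[ClusterFeatures], u: UserProfile) -> list[ClusterFeatures]:
    return sorted(clusters, key=lambda c: score(c, u), reverse=True)

# Usage: a tech-leaning user sees a widely covered phone launch above a fresher finance story.
user = UserProfile(topic_affinity={"tech": 0.9, "finance": 0.2},
                   source_affinity={"outlet_a": 1.0})
phone = ClusterFeatures("phone-launch", age_hours=2.0, topics={"tech": 1.0},
                        sources={"outlet_a", "outlet_b", "outlet_c"})
rates = ClusterFeatures("rate-decision", age_hours=1.0, topics={"finance": 1.0},
                        sources={"outlet_d"})
ranked = rank([rates, phone], user)
print([c.cluster_id for c in ranked])
```

A linear blend keeps each signal's contribution easy to inspect and tune. The trade-off is that it cannot capture interactions between signals, such as freshness mattering more for breaking-news topics. A learned ranker would handle those interactions better.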
Please cover requirements, APIs, data model, ingestion pipeline, clustering approach, ranking and personalization, scalability, reliability, and monitoring.