Design personalized news aggregation service

The question evaluates the ability to design a large-scale personalized content delivery system, testing competencies in distributed systems architecture, scalable ingestion and crawling, metadata normalization, storage and indexing, personalization and ranking, caching, and availability.

System Design: Personalized News Aggregation Service

Design a large-scale news aggregation system similar to Google News or other news aggregator products.

The key functional requirements are:

The system should collect news articles from many different news providers (e.g., CNN, BBC, local newspapers) using:
- Web crawlers (for sites without APIs).
- RSS feeds or publisher APIs when available.
The system should normalize and store collected articles with consistent metadata:
- Title, body, URL, publish time, source, author.
- Category (e.g., politics, sports, tech) and language.
The system should support logged-in users who have:
- Subscriptions to specific publishers/sources.
- Category/topic preferences (e.g., more sports, less politics).
For each logged-in user, the system should display a personalized news feed, taking into account:
- User’s subscriptions.
- User’s category/topic preferences.
- Freshness and popularity of articles.
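
To make the normalization requirement concrete, here is a minimal sketch of an article record and a deduplication key. The field names follow the metadata listed above. The `Article`, `canonical_url`, and `dedup_key` names are hypothetical, and dropping the query string and fragment is an assumption that would not suit publishers that encode the article ID in the query string.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from hashlib import sha256
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Article:
    """Normalized article record (hypothetical schema for illustration)."""
    title: str
    body: str
    url: str
    publish_time: datetime  # assumed to be stored in UTC
    source: str
    author: Optional[str]   # not every publisher exposes an author
    category: str           # e.g. "politics", "sports", "tech"
    language: str           # e.g. "en"


def canonical_url(url: str) -> str:
    """Lowercase scheme and host, drop query, fragment, and trailing slash.

    Assumption: query parameters are tracking noise and can be discarded.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedup_key(article: Article) -> str:
    """Stable key so the same article fetched via crawler and RSS collapses to one row."""
    return sha256(canonical_url(article.url).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    item = Article(
        title="Example headline",
        body="Example body",
        url="https://Example.com/news/1/?utm_source=rss",
        publish_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
        source="Example",
        author=None,
        category="tech",
        language="en",
    )
    print(canonical_url(item.url), dedup_key(item)[:12])
```

URL-based keys only catch exact re-fetches of the same page; syndicated copies of one story under different URLs would need a content-based fingerprint on top of this.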

Non-functional requirements and constraints (you may make reasonable assumptions, but be explicit):

Large scale: potentially tens of millions of daily active users.
High read throughput: most requests are for reading the news feed.
Reasonable freshness: new articles should appear in user feeds within a few minutes of being published.
High availability and low latency for feed retrieval (e.g., p95 < 200–300 ms).
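
Since the prompt asks for explicit assumptions, a quick back-of-the-envelope read estimate helps size the feed tier. The sketch below is only arithmetic. The 30 million DAU figure, 10 feed requests per user per day, and a 3x peak factor are all assumed values chosen for illustration, not numbers given in the question.

```python
SECONDS_PER_DAY = 86_400


def feed_read_qps(dau: int, requests_per_user_per_day: float, peak_factor: float):
    """Return (average_qps, peak_qps) for feed reads under the stated assumptions."""
    average = dau * requests_per_user_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


if __name__ == "__main__":
    # All three inputs are assumptions; swap in whatever the interviewer agrees to.
    avg_qps, peak_qps = feed_read_qps(
        dau=30_000_000, requests_per_user_per_day=10, peak_factor=3.0
    )
    print(f"average ~{avg_qps:,.0f} QPS, peak ~{peak_qps:,.0f} QPS")
```

With these inputs the average comes out to roughly 3,500 feed reads per second and the peak to roughly 10,000, which is the kind of load that motivates serving most requests from a cache rather than computing every feed on demand.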

In your design, cover at least the following aspects:

Requirements and APIs
- Clarify functional and non-functional requirements.
- Define main APIs or endpoints for clients (web/mobile) to fetch the news feed and manage preferences.
High-level Architecture
- Major components and services (e.g., crawler, content ingestion pipeline, storage, feed/personalization service).
- How data flows from publishers to the end-user feed.
Data Storage and Indexing
- How you will store articles, metadata, and user preferences.
- How to support efficient querying (by category, recency, popularity, user interests).
Crawling & Ingestion Pipeline
- How crawlers/RSS/API consumers are scheduled and scaled.
- How content is parsed, deduplicated, categorized, and filtered.
Personalization & Ranking
- How to build a personalized feed based on user subscriptions and category preferences.
- Basic ranking logic (you can assume heuristic or ML-based ranking, but describe the approach conceptually).
Scalability, Caching, and Availability
- Strategies to handle high read traffic and keep latency low.
- Use of caching, CDNs, sharding, and replication.
Freshness, Consistency, and Trade-offs
- How to balance freshness of news with system load and cache efficiency.
- Any relevant consistency or CAP-theorem trade-offs you would make.
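
One way to reason about the freshness versus cache-efficiency trade-off is a per-user feed cache with a time-to-live (TTL): a longer TTL means fewer feed rebuilds but staler feeds. The following is a minimal in-memory sketch, not a production design. `TTLFeedCache` and `build_feed` are hypothetical names, and a real deployment would more likely use a shared cache such as Redis or Memcached.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLFeedCache:
    """Caches each user's feed for ttl_seconds, then rebuilds it on the next read."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, user_id: str, build_feed: Callable[[str], List[str]]) -> List[str]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: may be up to ttl_seconds stale
        feed = build_feed(user_id)  # cache miss or expired: recompute
        self._entries[user_id] = (now, feed)
        return feed


if __name__ == "__main__":
    cache = TTLFeedCache(ttl_seconds=60)
    print(cache.get("user-1", lambda uid: ["article-a", "article-b"]))
```

With a TTL of about a minute, a newly published article can be delayed by at most that long on top of ingestion time, which stays within the "few minutes" freshness target while absorbing repeated reads from the same user.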

Explain your design step-by-step and justify key trade-offs.
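
As a starting point for the ranking discussion, here is one possible heuristic score that combines the four signals named in the requirements: subscriptions, category preference, freshness, and popularity. The formula, the six-hour half-life, and the 2x subscription boost are assumptions chosen for illustration; an ML-based ranker could replace this function entirely.

```python
import math


def feed_score(
    age_hours: float,
    popularity: int,
    subscribed: bool,
    category_weight: float,
    half_life_hours: float = 6.0,    # assumed: the score halves every 6 hours
    subscription_boost: float = 2.0,  # assumed multiplier for subscribed sources
) -> float:
    """Heuristic score: higher means the article ranks earlier in the feed."""
    freshness = 0.5 ** (age_hours / half_life_hours)   # exponential time decay
    popularity_term = 1.0 + math.log1p(popularity)      # dampen viral outliers
    boost = subscription_boost if subscribed else 1.0
    return boost * category_weight * freshness * popularity_term


if __name__ == "__main__":
    # Hypothetical candidates: (id, age_hours, popularity, subscribed, category_weight)
    candidates = [
        ("old-viral", 24.0, 50_000, False, 1.0),
        ("fresh-subscribed", 0.5, 100, True, 1.5),
        ("fresh-disliked-topic", 0.5, 100, False, 0.2),
    ]
    ranked = sorted(candidates, key=lambda c: feed_score(*c[1:]), reverse=True)
    print([c[0] for c in ranked])
```

The logarithm keeps a very popular but old story from permanently outranking fresh articles from sources the user subscribes to, and a category weight below 1.0 expresses "less of this topic" without removing it from the feed.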
