
Design personalized news aggregation service

Last updated: Mar 29, 2026

Quick Overview

The question evaluates the ability to design a large-scale personalized content delivery system, testing competencies in distributed systems architecture, scalable ingestion and crawling, metadata normalization, storage and indexing, personalization and ranking, caching, and availability.


Company: Rippling

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Technical Screen


Oct 4, 2025, 12:00 AM

System Design: Personalized News Aggregation Service

Design a large-scale news aggregation system similar to Google News or other news aggregator products.

The key functional requirements are:

  • The system should collect news articles from many different news providers (e.g., CNN, BBC, local newspapers) using:
    • Web crawlers (for sites without APIs).
    • RSS feeds or publisher APIs when available.
  • The system should normalize and store collected articles with consistent metadata:
    • Title, body, URL, publish time, source, author.
    • Category (e.g., politics, sports, tech) and language.
  • The system should support logged-in users who have:
    • Subscriptions to specific publishers/sources.
    • Category/topic preferences (e.g., more sports, less politics).
  • For each logged-in user, the system should display a personalized news feed, taking into account:
    • User’s subscriptions.
    • User’s category/topic preferences.
    • Freshness and popularity of articles.
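The normalization requirement above can be made concrete with a small record type. The following is a minimal sketch, not a prescribed schema: the field names mirror the metadata listed in the question, while `canonical_url`, `dedup_key`, `normalize`, and the raw-input keys (`published`, `category`, and so on) are hypothetical names chosen for illustration. It also assumes that dropping the query string is safe for deduplication, which will not hold for publishers that identify articles by query parameter.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from hashlib import sha256
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Article:
    """Normalized article with the metadata fields named in the question."""
    title: str
    body: str
    url: str
    publish_time: datetime
    source: str
    author: str
    category: str
    language: str


def canonical_url(url: str) -> str:
    """Lowercase scheme and host, drop query and fragment, trim a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedup_key(article: Article) -> str:
    """Hypothetical dedup key: SHA-256 of the canonical URL."""
    return sha256(canonical_url(article.url).encode("utf-8")).hexdigest()


def normalize(raw: dict, source: str) -> Article:
    """Map one raw crawler/RSS/API item (assumed dict shape) to an Article."""
    published = datetime.fromisoformat(raw["published"])
    return Article(
        title=raw["title"].strip(),
        body=raw.get("body", "").strip(),
        url=canonical_url(raw["url"]),
        publish_time=published.astimezone(timezone.utc),  # store times in UTC
        source=source,
        author=raw.get("author", "").strip(),
        category=raw.get("category", "uncategorized").lower(),
        language=raw.get("language", "en").lower(),
    )
```

A URL-based key only catches exact re-fetches of the same page; syndicated copies of one story under different URLs would need a content-based signature, which this sketch does not attempt.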

Non-functional requirements and constraints (you may make reasonable assumptions, but be explicit):

  • Large scale: potentially tens of millions of daily active users.
  • High read throughput: most requests are for reading the news feed.
  • Reasonable freshness: new articles should appear in user feeds within a few minutes of being published.
  • High availability and low latency for feed retrieval (e.g., p95 < 200–300 ms).
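Since the question asks for explicit assumptions, a back-of-envelope read-load estimate is a natural starting point. The numbers below are illustrative assumptions only, not figures from the question: 30 million daily active users (one reading of "tens of millions"), 10 feed requests per user per day, and a peak-to-average ratio of 3.

```python
# All three inputs are assumptions made for this estimate.
DAU = 30_000_000
FEED_REQUESTS_PER_USER_PER_DAY = 10
PEAK_TO_AVERAGE = 3
SECONDS_PER_DAY = 86_400


def feed_qps(dau: int, requests_per_user: int, peak_factor: float) -> tuple[float, float]:
    """Return (average, peak) feed requests per second."""
    average = dau * requests_per_user / SECONDS_PER_DAY
    return average, average * peak_factor


average_qps, peak_qps = feed_qps(DAU, FEED_REQUESTS_PER_USER_PER_DAY, PEAK_TO_AVERAGE)
print(f"average ~ {average_qps:,.0f} QPS, peak ~ {peak_qps:,.0f} QPS")
```

Under these assumptions the feed endpoint sees roughly 3,500 requests per second on average and about 10,400 at peak, which is the load the caching and replication choices need to absorb within the stated p95 latency target.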

In your design, cover at least the following aspects:

  1. Requirements and APIs
    • Clarify functional and non-functional requirements.
    • Define main APIs or endpoints for clients (web/mobile) to fetch the news feed and manage preferences.
  2. High-level Architecture
    • Major components and services (e.g., crawler, content ingestion pipeline, storage, feed/personalization service).
    • How data flows from publishers to the end-user feed.
  3. Data Storage and Indexing
    • How you will store articles, metadata, and user preferences.
    • How to support efficient querying (by category, recency, popularity, user interests).
  4. Crawling & Ingestion Pipeline
    • How crawlers/RSS/API consumers are scheduled and scaled.
    • How content is parsed, deduplicated, categorized, and filtered.
  5. Personalization & Ranking
    • How to build a personalized feed based on user subscriptions and category preferences.
    • Basic ranking logic (you can assume heuristic or ML-based ranking, but describe the approach conceptually).
  6. Scalability, Caching, and Availability
    • Strategies to handle high read traffic and keep latency low.
    • Use of caching, CDNs, sharding, and replication.
  7. Freshness, Consistency, and Trade-offs
    • How to balance freshness of news with system load and cache efficiency.
    • Any relevant consistency or CAP-theorem trade-offs you would make.
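For item 5, one possible heuristic ranking combines the four signals the question names: subscriptions, category preferences, freshness, and popularity. This is a sketch under stated assumptions, not a reference answer. The six-hour half-life, the 1.5x subscription boost, the `views` popularity field, and the dict shape of each article are all hypothetical choices.

```python
import math
from datetime import datetime, timedelta, timezone

HALF_LIFE_HOURS = 6.0      # assumed: freshness halves every 6 hours
SUBSCRIPTION_BOOST = 1.5   # assumed multiplier for subscribed sources


def score(article: dict, subscriptions: set, category_weights: dict, now: datetime) -> float:
    """Multiply subscription boost, category preference, freshness decay, and popularity."""
    age_hours = max((now - article["publish_time"]).total_seconds() / 3600, 0.0)
    freshness = 0.5 ** (age_hours / HALF_LIFE_HOURS)       # exponential time decay
    popularity = 1.0 + math.log1p(article["views"])        # dampen very large view counts
    preference = category_weights.get(article["category"], 1.0)
    boost = SUBSCRIPTION_BOOST if article["source"] in subscriptions else 1.0
    return boost * preference * freshness * popularity


def rank(articles: list, subscriptions: set, category_weights: dict,
         now: datetime, limit: int = 20) -> list:
    """Return the top `limit` articles by descending score."""
    ordered = sorted(
        articles,
        key=lambda a: score(a, subscriptions, category_weights, now),
        reverse=True,
    )
    return ordered[:limit]
```

A user who wants "more sports, less politics" could be represented as `{"sports": 1.5, "politics": 0.5}`, with unlisted categories defaulting to a weight of 1.0. An ML-based ranker would replace `score` while leaving candidate selection and the `rank` interface unchanged.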

Explain your design step-by-step and justify key trade-offs.
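The freshness versus cache-efficiency trade-off in item 7 can be illustrated with a per-user feed cache that expires entries after a short time-to-live. This is a minimal in-process sketch, assuming a hypothetical `build_feed` callable that does the expensive ranking work; a real deployment would more likely use a shared cache tier, but the trade-off is the same: a longer TTL means fewer rebuilds and staler feeds.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLFeedCache:
    """Cache each user's feed for `ttl_seconds`, rebuilding on expiry."""

    def __init__(self, ttl_seconds: float,
                 build_feed: Callable[[str], List[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._build_feed = build_feed   # hypothetical expensive feed builder
        self._clock = clock             # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, user_id: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]             # still fresh enough: serve cached feed
        feed = self._build_feed(user_id)
        self._entries[user_id] = (now, feed)
        return feed
```

With a TTL of, say, 60 seconds, a new article can take up to a minute longer to appear for a user with a cached feed, which stays within a "few minutes" freshness target while removing most repeated ranking work.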
