System Design Take‑Home: Large-Scale News Aggregator and Personalized Feed
Context
Design a production-ready news aggregation and personalized feed platform that ingests articles from thousands of publishers, updates users’ feeds in near real time, and supports search, topics, and experimentation at scale.
Functional Requirements
- Ingest articles from thousands of publishers via RSS, webhooks, and partner APIs.
- Handle publisher rate limits, retries, transient failures, and deduplication.
- Ensure idempotent ingestion and content normalization.
- Support media handling (images/video) and spam/NSFW detection.
- Near-real-time updates: end-to-end latency under 5 seconds from publisher to user feed.
- Provide backfill for content missed during outages.
- Personalized, ranked feed per user using signals such as followed sources, topics, recency, and user engagement.
- Support experimentation (A/B tests, holdouts) for ranking variants and UI.
- Implement keyword search and topic/tag pages, including geo and language filters.
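One way to approach the idempotent-ingestion and deduplication requirements is to derive a deterministic key from the publisher and a canonicalized article URL, so that retries and repeated deliveries over RSS, webhooks, or partner APIs collapse to a single stored record. The sketch below is illustrative only: `canonical_url`, `dedup_key`, and `ArticleStore` are hypothetical names, the in-memory dict stands in for a durable store with a unique-key constraint, and dropping the query string is an assumption that would not hold for publishers that identify articles by query parameter.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Lowercase scheme and host, drop query and fragment, trim trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Assumption: query strings carry only tracking parameters.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedup_key(publisher_id: str, url: str) -> str:
    """Deterministic key: the same publisher and canonical URL always hash alike."""
    raw = f"{publisher_id}|{canonical_url(url)}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class ArticleStore:
    """In-memory stand-in (hypothetical) for a store with a unique-key constraint."""

    def __init__(self) -> None:
        self._by_key: dict[str, dict] = {}

    def ingest(self, publisher_id: str, url: str, title: str) -> bool:
        """Return True if newly stored, False if this delivery was a duplicate."""
        key = dedup_key(publisher_id, url)
        if key in self._by_key:
            return False
        self._by_key[key] = {
            "publisher_id": publisher_id,
            "url": canonical_url(url),
            "title": " ".join(title.split()),  # normalize whitespace
        }
        return True

    def __len__(self) -> int:
        return len(self._by_key)
```

Because `ingest` is keyed rather than append-only, a retried webhook or a re-polled RSS item is a no-op, which also makes backfill after an outage safe to re-run.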
Data, Storage, and Indexing
- Design hot vs. cold storage, indexing, and caching layers.
- Provide core data models and schemas for publishers, articles, users, follows, topics, engagements, experiments, and feeds.
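As a starting point for the core data models, a submission might express a few of the entities as typed records before mapping them to a concrete database. The sketch below covers only publishers, articles, follows, and engagements; every field name, default, and enum value is an assumption made for illustration, not a required schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class EngagementType(Enum):
    # Assumed set of engagement signals; a real design may track more.
    IMPRESSION = "impression"
    CLICK = "click"
    SHARE = "share"


@dataclass(frozen=True)
class Publisher:
    publisher_id: str
    name: str
    feed_url: str
    language: str = "en"


@dataclass(frozen=True)
class Article:
    article_id: str
    publisher_id: str
    url: str
    title: str
    published_at: datetime
    language: str = "en"
    topic_ids: tuple[str, ...] = ()


@dataclass(frozen=True)
class Follow:
    user_id: str
    target_type: str  # assumed values: "publisher" or "topic"
    target_id: str


@dataclass(frozen=True)
class Engagement:
    user_id: str
    article_id: str
    kind: EngagementType
    occurred_at: datetime


article = Article(
    article_id="a1",
    publisher_id="p1",
    url="https://example.com/a",
    title="Example headline",
    published_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    topic_ids=("t1",),
)
```

Frozen records keep ingested articles immutable in application code; edits from publishers would then be modeled as new versions rather than in-place mutation, which is one possible design choice among several.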
Feed Generation Strategy
- Describe pull vs. push and fan-out-on-write vs. fan-out-on-read trade-offs.
- Choose a strategy (or hybrid) and justify it for scale and latency targets.
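To make the trade-off concrete, a hybrid can push articles into follower inboxes at write time for publishers with few followers and merge in articles from heavily followed publishers at read time. The sketch below is a toy in-memory model under stated assumptions: `HybridFeed`, the `threshold` cutoff, and the inbox/outbox layout are hypothetical, and a real system would bound inbox size and persist both structures.

```python
from collections import defaultdict

FANOUT_THRESHOLD = 10_000  # assumed follower count above which we stop pushing


class HybridFeed:
    def __init__(self, threshold: int = FANOUT_THRESHOLD) -> None:
        self.threshold = threshold
        self.followers = defaultdict(set)  # publisher_id -> user_ids
        self.following = defaultdict(set)  # user_id -> publisher_ids
        self.inbox = defaultdict(list)     # user_id -> [(ts, article_id)], pushed
        self.outbox = defaultdict(list)    # publisher_id -> [(ts, article_id)], pulled

    def follow(self, user_id: str, publisher_id: str) -> None:
        self.followers[publisher_id].add(user_id)
        self.following[user_id].add(publisher_id)

    def publish(self, publisher_id: str, article_id: str, ts: float) -> None:
        fans = self.followers.get(publisher_id, set())
        if len(fans) >= self.threshold:
            # Fan-out on read: store once, merge when followers load their feed.
            self.outbox[publisher_id].append((ts, article_id))
        else:
            # Fan-out on write: copy the reference into each follower's inbox.
            for user_id in fans:
                self.inbox[user_id].append((ts, article_id))

    def read(self, user_id: str, limit: int = 20) -> list[str]:
        items = list(self.inbox.get(user_id, []))
        for publisher_id in self.following.get(user_id, set()):
            items.extend(self.outbox.get(publisher_id, []))
        items.sort(reverse=True)  # newest first by timestamp
        return [article_id for _, article_id in items[:limit]]


feed = HybridFeed(threshold=2)
feed.follow("u1", "small")
feed.follow("u1", "big")
feed.follow("u2", "big")
feed.publish("big", "b1", ts=2.0)
feed.publish("small", "s1", ts=1.0)
```

In this toy run, `big` meets the threshold and is pulled at read time, while `small` is pushed to its single follower. The write cost for a popular publisher stays constant, at the price of a merge step on every read.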
Reliability and Availability
- Multi-region availability with eventual consistency.
- Disaster recovery plan and RPO/RTO targets.
Capacity and Cost
- Estimate capacity: QPS, throughput, data volumes.
- Propose scaling strategies and cost controls.
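A capacity estimate usually starts as back-of-envelope arithmetic from a handful of stated inputs. The sketch below shows the shape of such a calculation; every number in `Assumptions` is a placeholder chosen for illustration, not a figure from this brief, and should be replaced with the candidate's own justified inputs.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Assumptions:
    # All values below are illustrative placeholders, not given requirements.
    daily_active_users: int = 10_000_000
    feed_loads_per_user_per_day: int = 10
    articles_per_day: int = 1_000_000
    avg_article_bytes: int = 5_000
    peak_to_average: float = 3.0


def estimate(a: Assumptions) -> dict[str, float]:
    """Derive average/peak read QPS, ingest rate, and yearly article storage."""
    avg_read_qps = a.daily_active_users * a.feed_loads_per_user_per_day / SECONDS_PER_DAY
    return {
        "avg_feed_read_qps": avg_read_qps,
        "peak_feed_read_qps": avg_read_qps * a.peak_to_average,
        "avg_ingest_per_sec": a.articles_per_day / SECONDS_PER_DAY,
        "article_storage_gb_per_year": a.articles_per_day * a.avg_article_bytes * 365 / 1e9,
    }


result = estimate(Assumptions())
```

With these placeholder inputs the read path is roughly two orders of magnitude busier than the ingest path, which is the kind of asymmetry that typically drives caching and fan-out decisions.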
APIs, Security, and Privacy
- Define external and internal APIs, rate limiting, authentication/authorization.
- Address user privacy and compliance (GDPR/CCPA): consent, data deletion/export.
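For the rate-limiting requirement, a token bucket is one common choice because it permits short bursts while enforcing a long-run rate. The sketch below is a single-process illustration, assuming one bucket per API key; `TokenBucket` and its parameters are hypothetical names, and a multi-node deployment would need shared state, which is not shown.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would keep one bucket per client identity and return HTTP 429 when `allow()` is false; the same structure can be applied in reverse to respect publishers' rate limits during ingestion.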
Operations
- Monitoring, alerting, logging, tracing, and SLOs.
- Rollback and incident mitigation strategies.
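SLOs and rollback decisions can be tied together through an error budget: the share of failed requests the SLO tolerates over a window. The sketch below shows that arithmetic; the function names, the example 99.9% target, and the `min_remaining` threshold are assumptions for illustration rather than prescribed values.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative once it is exceeded)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    return 1.0 - failed / allowed_failures


def should_roll_back(
    slo_target: float, total: int, failed: int, min_remaining: float = 0.0
) -> bool:
    """Signal a rollback when the remaining budget drops below `min_remaining`."""
    return error_budget_remaining(slo_target, total, failed) < min_remaining
```

For example, with an assumed 99.9% availability target over 1,000,000 requests, about 1,000 failures are tolerated, so 250 failures leave roughly three quarters of the budget.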
Rollout and Testing
- Phased rollout plan and end-to-end testing strategy (including load, chaos, and experiment guardrails).
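Both phased rollouts and experiment guardrails rely on assigning each user to a cohort deterministically, so that a user sees the same variant on every request and a holdout group stays untouched. The sketch below hashes the user and experiment name into a stable bucket; `bucket`, `assign`, `in_rollout`, the 10,000-bucket granularity, and the 5% default holdout are all hypothetical choices made for illustration.

```python
import hashlib

BUCKETS = 10_000  # assumed granularity: 0.01% per bucket


def bucket(user_id: str, experiment: str) -> int:
    """Stable bucket in [0, BUCKETS) derived from a SHA-256 of experiment and user."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def assign(
    user_id: str, experiment: str, variants: list[str], holdout_fraction: float = 0.05
) -> str:
    """Return "holdout" for the reserved slice, else split the rest evenly."""
    b = bucket(user_id, experiment)
    cutoff = int(holdout_fraction * BUCKETS)
    if b < cutoff:
        return "holdout"
    index = (b - cutoff) * len(variants) // (BUCKETS - cutoff)
    return variants[index]


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """True if the user falls inside the first `percent` (0 to 100) of buckets."""
    return bucket(user_id, feature) < int(percent * 100)
```

Salting the hash with the experiment name keeps cohorts independent across experiments, and raising `percent` in `in_rollout` only ever adds users, so a phased rollout never flips someone back to the old behavior unless the percentage is lowered.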