Topic-Based News Subscription Service — Design and Implementation
Context
Build an in-memory service that manages subscribers and delivers relevant news items on demand. Timestamps are integers in the same unit across all operations (assume milliseconds for sliding-window rate limiting; seconds also works if consistent). The service must enforce relevance, freshness, deduplication, ordering, and per-subscriber sliding-window rate limits.
API to Implement (class NewsProvider)
-
AddSubscription(subscriberId, minInterest, maxNewsPerSecond, topics) -> bool
-
Create or update a subscriber’s configuration.
-
Return true on success; false on invalid input or violated constraints.
-
RemoveSubscription(subscriberId) -> bool
-
Remove a subscriber; return false if subscriberId does not exist.
-
NewsReceived(newsId, timestamp, interest, topics) -> bool
-
Ingest a news item with its timestamp, interest score, and topics.
-
Return false if newsId already exists; otherwise true.
-
Publish(nowTimestamp, maxAge) -> dict[int, list[int]]
-
For each subscriber, return the list of newsIds to deliver at nowTimestamp, subject to the rules below.
Publish Rules
-
Eligibility: deliver if news.interest ≥ subscriber.minInterest.
-
Topic filter: deliver if news.topics ∩ subscriber.topics ≠ ∅.
-
Freshness: nowTimestamp − maxAge ≤ news.timestamp ≤ nowTimestamp.
-
Rate limiting: per-subscriber sliding window allowing at most maxNewsPerSecond items in any rolling 1-second window.
-
De-duplication: a given newsId must not be delivered to the same subscriber more than once.
-
Ordering: higher interest first; if ties, older timestamp first; if still tied, smaller newsId first.
Deliverables
-
Specify data structures for: subscriptions, news storage, delivered-tracking per subscriber, and per-subscriber sliding-window counters.
-
Describe algorithms for filtering, rate limiting (sliding window), de-duplication, and multi-key sorting under high throughput.
-
Analyze time and space complexity of each operation and Publish.
-
Discuss concurrency, idempotency, and persistence considerations.
-
Explain how to scale to millions of users and a large, continuous news stream (partitioning, sharding, backpressure, batching, fault tolerance).