## Scenario
You ingest a real-time external stream of social-media posts and news articles. Each item contains raw text and metadata (timestamp, source, author/site, etc.). The product tracks companies/stocks ("entities") and shows:
- **Mention analytics**: how many times each entity was mentioned over time (similar to impressions/mentions).
- **Charts by time window**: users can choose time spans from 30 minutes to multiple days (tumbling or sliding windows are both acceptable).
- **Latency**: charts may be delayed by 10–30 minutes, but data must be aggregated before display.
- **Subscriptions & notifications**: users can follow a set of entities, filter analytics to followed entities, and configure alerts (e.g., spike in mentions).
- **Search**:
  - Users can search across hundreds of thousands of entities (by company/stock name).
  - Users can also search for the underlying documents (posts/articles) that mention entities.
  - Search supports any number of keywords and filtering (e.g., entity, time range, source).
  - Search load can be very high (e.g., ~100k RPS).
- **Spiky traffic**: the system must handle extreme bursts (breaking news, meme-stock events).
- **Storage choices**: decide how to store both processed/aggregated data and raw documents.
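To make the windowed-analytics requirement concrete, here is a minimal sketch of counting mentions per entity in tumbling windows. The function names, the `(timestamp, entities)` item shape, and the 30-minute window size are illustrative assumptions, not part of the problem statement:

```python
from collections import defaultdict

# Assumed window size: the smallest chart span mentioned above (30 minutes).
WINDOW_SECONDS = 30 * 60


def window_start(ts: int, window: int = WINDOW_SECONDS) -> int:
    """Align a Unix timestamp to the start of its tumbling window."""
    return ts - (ts % window)


def aggregate_mentions(items):
    """items: iterable of (timestamp, [entity_ids]) pairs, e.g. drained
    from the stream. Returns {(window_start, entity_id): mention_count}."""
    counts = defaultdict(int)
    for ts, entities in items:
        w = window_start(ts)
        for entity in entities:
            counts[(w, entity)] += 1
    return dict(counts)


# Example: three posts, the first two landing in the same 30-minute window.
stream = [
    (1_700_000_000, ["AAPL", "TSLA"]),
    (1_700_000_100, ["AAPL"]),
    (1_700_002_000, ["TSLA"]),
]
agg = aggregate_mentions(stream)
```

In a real design this aggregation would run inside a stream processor rather than in memory, but the key-by-(window, entity) shape of the result is the same.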
## Task
Design a high-level system (APIs, data flow, storage, and scaling strategy) that satisfies the above requirements. Clearly explain:
- How raw streaming data is ingested, processed, and aggregated.
- How time-windowed analytics are computed and served.
- How document/entity search works at high QPS.
- How subscriptions and alerting are implemented.
- How the system remains reliable and cost-effective under spikes.
State assumptions and key trade-offs (e.g., consistency, latency, storage format, retention).
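As one example of the kind of assumption worth stating explicitly: "spike in mentions" needs a definition. A hedged sketch of one possible rule, comparing the current window to a trailing baseline (the factor, floor, and function name are illustrative choices, not given by the problem):

```python
def is_spike(current_count: int,
             trailing_counts: list[int],
             factor: float = 3.0,
             min_count: int = 10) -> bool:
    """Flag a spike when the current window's mention count exceeds both
    `factor` times the trailing-window average and an absolute floor
    (the floor avoids alerting on noise for rarely mentioned entities).
    All thresholds here are assumptions to be tuned, not requirements."""
    if not trailing_counts:
        # No history yet: fall back to the absolute floor alone.
        return current_count >= min_count
    baseline = sum(trailing_counts) / len(trailing_counts)
    return current_count >= min_count and current_count >= factor * baseline


# Example: a baseline of ~5 mentions/window vs. a current window of 40.
spiked = is_spike(40, [4, 5, 6])   # well above 3x the baseline
quiet = is_spike(8, [4, 5, 6])     # below the absolute floor
```

A design answer should state which definition it uses, since the choice drives both the alerting pipeline (how much trailing state to keep per entity) and the false-positive rate.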