Design a Twitter hashtag metrics aggregator
Company: Google
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
## System Design: Hashtag Metrics Aggregator
Design a service that ingests a high-volume stream of social posts (like tweets) and produces **metrics/aggregations for hashtags** (similar to an ads click aggregator).
### Core use cases
- Ingest events: each post has `post_id`, `user_id`, `timestamp`, `text`, and derived `hashtags[]`.
- Query:
  - Get the **count of a given hashtag** over a time range (e.g., last 5 minutes, last 24 hours).
  - Get **top N hashtags** over a sliding window (e.g., top 100 in the last 1 minute).
  - (Optional) Break down by dimension such as `country/region`, `language`, or `platform`.
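
To make the use cases concrete, here is a minimal single-process sketch of the event record and the two queries. It assumes epoch-second timestamps, 60-second tumbling buckets, lowercase hashtag normalization, and that a hashtag counts once per post; the names `PostEvent`, `MinuteBuckets`, and `BUCKET_SECONDS` are illustrative, not part of the question. A sliding window is approximated by summing the buckets it covers.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import List

HASHTAG_RE = re.compile(r"#(\w+)")
BUCKET_SECONDS = 60  # assumed bucket granularity


@dataclass
class PostEvent:
    post_id: str
    user_id: str
    timestamp: int  # epoch seconds (assumed unit)
    text: str
    hashtags: List[str] = field(init=False)

    def __post_init__(self):
        # Derive hashtags from the text; lowercasing is an assumed normalization.
        self.hashtags = [t.lower() for t in HASHTAG_RE.findall(self.text)]


class MinuteBuckets:
    """In-memory stand-in for the aggregate store: bucket_start -> Counter."""

    def __init__(self):
        self.buckets = defaultdict(Counter)

    def ingest(self, event):
        bucket = event.timestamp - event.timestamp % BUCKET_SECONDS
        # Count each hashtag at most once per post (assumption).
        for tag in set(event.hashtags):
            self.buckets[bucket][tag] += 1

    def count(self, tag, start, end):
        """Count of `tag` over buckets whose start lies in [start, end)."""
        return sum(c[tag] for b, c in self.buckets.items() if start <= b < end)

    def top_n(self, n, start, end):
        """Top `n` hashtags over buckets whose start lies in [start, end)."""
        total = Counter()
        for b, c in self.buckets.items():
            if start <= b < end:
                total.update(c)
        return total.most_common(n)
```

For example, after ingesting a few posts, `agg.count("ai", 0, 60)` sums one bucket, while a 24-hour query sums 1,440 of them; in a real design, coarser roll-ups (hourly, daily) would likely be precomputed to keep long ranges cheap.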
### Requirements to clarify
- Latency SLOs for queries (e.g., p95 < 200 ms).
- Freshness (e.g., results within 5–10 seconds of real-time).
- Scale assumptions (events/sec, number of unique hashtags, query QPS).
- Accuracy expectations:
  - Exactly-once vs at-least-once ingestion.
  - Handling duplicates, late events, out-of-order timestamps.
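
The accuracy questions above can be illustrated with a small sketch that deduplicates on `post_id` and uses an event-time watermark to decide when an event is too late to count. This is one possible policy, not a prescribed one: the 60-second window, the 120-second allowed lateness, and the names `WindowedCounter`, `WINDOW_SECONDS`, and `ALLOWED_LATENESS` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60      # assumed tumbling window size
ALLOWED_LATENESS = 120   # assumed: how far behind the newest event we still accept


class WindowedCounter:
    def __init__(self):
        self.windows = defaultdict(Counter)  # window_start -> Counter of hashtags
        # Unbounded here for simplicity; a real system would bound this
        # (e.g., TTL-based state or a probabilistic filter).
        self.seen_post_ids = set()
        self.max_event_time = 0
        self.dropped_late = 0

    def watermark(self):
        # Event time before which windows are considered closed.
        return self.max_event_time - ALLOWED_LATENESS

    def process(self, post_id, timestamp, hashtags):
        """Return True if the event was counted, False if skipped."""
        if post_id in self.seen_post_ids:
            return False  # duplicate delivery (at-least-once upstream)
        self.max_event_time = max(self.max_event_time, timestamp)
        window_start = timestamp - timestamp % WINDOW_SECONDS
        window_end = window_start + WINDOW_SECONDS
        if window_end <= self.watermark():
            # Window already closed: drop here; a backfill job could repair it.
            self.dropped_late += 1
            return False
        self.seen_post_ids.add(post_id)
        for tag in set(hashtags):
            self.windows[window_start][tag] += 1
        return True
```

With at-least-once delivery, idempotent processing like this gives effectively-once counts as long as the dedup state survives restarts. Out-of-order events inside the lateness bound are still counted in their original window.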
### Deliverables
Describe:
- High-level architecture.
- Data model and storage choices (you may start with a simple SQL placeholder, then refine).
- Stream processing and windowing strategy.
- APIs.
- How you handle backfill/reprocessing, failures, and hot hashtags.
- Caching and concurrency considerations.
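
For the hot-hashtag deliverable, one common technique is key salting: events for a known-hot hashtag are spread across several sub-keys so that no single partition absorbs all of the traffic, and the partial counts are merged at read or roll-up time. The sketch below is a minimal illustration under assumptions; `NUM_SHARDS`, the static `HOT_TAGS` set, and both function names are hypothetical, and a real system would more likely detect hot keys dynamically.

```python
import random
from collections import Counter

NUM_SHARDS = 8            # assumed number of sub-keys per hot hashtag
HOT_TAGS = {"worldcup"}   # hypothetical; in practice detected from traffic


def partition_key(tag, rng=random):
    """Return the key used to route an event to a partition."""
    if tag in HOT_TAGS:
        # Salt hot tags so their events spread over NUM_SHARDS partitions.
        return f"{tag}#{rng.randrange(NUM_SHARDS)}"
    return tag


def merge_partials(partials):
    """Strip the salt and sum partial counts back into one count per hashtag."""
    merged = Counter()
    for key, count in partials.items():
        # Safe because normalized hashtags are assumed not to contain '#'.
        tag = key.split("#", 1)[0]
        merged[tag] += count
    return merged
```

The trade-off is an extra merge step on the read path (or a second aggregation stage), which is usually acceptable because only a small number of hashtags are hot at any given time.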
Quick Answer: This question evaluates competency in designing scalable, low-latency real-time stream processing and aggregation systems, covering concepts such as windowing and stateful processing, data modeling, fault tolerance, hot-key handling, and API/query performance.