System Design: Hashtag Metrics Aggregator
Design a service that ingests a high-volume stream of social posts (like tweets) and produces metrics/aggregations for hashtags (similar to an ads click aggregator).
Core use cases
- Ingest events: each post has post_id, user_id, timestamp, text, and derived hashtags[].
- Query:
  - Get the count of a given hashtag over a time range (e.g., last 5 minutes, last 24 hours).
  - Get top N hashtags over a sliding window (e.g., top 100 in the last 1 minute).
  - (Optional) Breakdown by dimension such as country/region, language, or platform.
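To make the two core queries concrete, here is a minimal single-process sketch in Python. It assumes counts are pre-aggregated into fixed 60-second buckets keyed by event time, so a range count or a sliding-window top N becomes a sum over the buckets in that range. The class name HashtagCounter, the bucket width, and the choice to count a hashtag at most once per post are illustrative assumptions, not part of the problem statement.

```python
from collections import Counter, defaultdict
import heapq

BUCKET_SECONDS = 60  # assumed bucket width; finer buckets give finer time ranges


class HashtagCounter:
    """In-memory, single-node sketch of bucketed hashtag counts."""

    def __init__(self):
        # bucket start (epoch seconds) -> Counter mapping hashtag -> count
        self.buckets = defaultdict(Counter)

    def ingest(self, timestamp, hashtags):
        bucket = timestamp - timestamp % BUCKET_SECONDS
        # set() so a tag repeated inside one post counts once (an assumption)
        self.buckets[bucket].update(set(hashtags))

    def count(self, hashtag, start, end):
        """Total for one hashtag over buckets whose start lies in [start, end)."""
        return sum(c[hashtag] for b, c in self.buckets.items() if start <= b < end)

    def top_n(self, n, start, end):
        """Top n (hashtag, count) pairs over buckets whose start lies in [start, end)."""
        total = Counter()
        for b, c in self.buckets.items():
            if start <= b < end:
                total.update(c)
        # Ties are broken by hashtag so the result is deterministic.
        return heapq.nlargest(n, total.items(), key=lambda kv: (kv[1], kv[0]))


agg = HashtagCounter()
agg.ingest(0, ["a", "b"])
agg.ingest(30, ["a"])
agg.ingest(61, ["a", "c", "c"])
print(agg.count("a", 0, 120))  # 3
print(agg.top_n(1, 0, 120))    # [('a', 3)]
```

A real deployment would partition this state across workers and expire old buckets; the sketch only shows why bucketed pre-aggregation keeps both queries cheap.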
Requirements to clarify
- Latency SLOs for queries (e.g., p95 < 200 ms).
- Freshness (e.g., results within 5–10 seconds of real-time).
- Scale assumptions (events/sec, number of unique hashtags, query QPS).
- Accuracy expectations:
  - Exactly-once vs at-least-once ingestion.
  - Handling duplicates, late events, out-of-order timestamps.
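One possible way to reason about the accuracy questions is sketched below: at-least-once delivery is made effectively idempotent by deduplicating on post_id, and out-of-order events are accepted as long as they fall within an allowed-lateness bound behind the newest event time seen. The names DedupingAggregator and ALLOWED_LATENESS, the 120-second bound, and the policy of dropping events that are too late are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

BUCKET_SECONDS = 60     # assumed window width
ALLOWED_LATENESS = 120  # assumed: accept events up to 2 minutes behind the newest seen


class DedupingAggregator:
    def __init__(self):
        # Unbounded in this sketch; a real system would expire old ids
        # or use a probabilistic structure.
        self.seen_post_ids = set()
        self.buckets = defaultdict(Counter)
        self.max_event_time = 0
        self.dropped_late = 0

    def ingest(self, post_id, timestamp, hashtags):
        """Return True if the event was counted, False if it was skipped."""
        if post_id in self.seen_post_ids:
            return False  # duplicate delivery: counting it again would inflate totals
        watermark = self.max_event_time - ALLOWED_LATENESS
        if timestamp < watermark:
            # Too late for the streaming path; could instead be routed to backfill.
            self.dropped_late += 1
            return False
        self.seen_post_ids.add(post_id)
        self.max_event_time = max(self.max_event_time, timestamp)
        # Bucket by event time, not arrival time, so out-of-order events
        # land in the window they belong to.
        bucket = timestamp - timestamp % BUCKET_SECONDS
        self.buckets[bucket].update(set(hashtags))
        return True
```

The lateness bound trades freshness against completeness: a larger bound counts more stragglers but delays the point at which a window can be treated as final.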
Deliverables
Describe:
- High-level architecture.
- Data model and storage choices (you may start with a simple SQL placeholder, then refine).
- Stream processing and windowing strategy.
- APIs.
- How you handle backfill/reprocessing, failures, and hot hashtags.
- Caching and concurrency considerations.
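As a starting point for the "simple SQL placeholder" mentioned above, the following Python sketch uses the standard-library sqlite3 module with one row per (hashtag, bucket). The table name hashtag_counts, its columns, and the helper functions are hypothetical; the upsert syntax assumes SQLite 3.24 or newer. It is meant as a baseline to refine, not a recommended production store.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hashtag_counts (
    hashtag      TEXT    NOT NULL,
    bucket_start INTEGER NOT NULL,   -- epoch seconds, aligned to the bucket width
    cnt          INTEGER NOT NULL,
    PRIMARY KEY (hashtag, bucket_start)
);
CREATE INDEX idx_bucket ON hashtag_counts (bucket_start);
"""


def add_counts(conn, bucket_start, counts):
    """Merge a batch of per-bucket partial counts into the table."""
    conn.executemany(
        """INSERT INTO hashtag_counts (hashtag, bucket_start, cnt)
           VALUES (?, ?, ?)
           ON CONFLICT (hashtag, bucket_start)
           DO UPDATE SET cnt = cnt + excluded.cnt""",
        [(tag, bucket_start, n) for tag, n in counts.items()],
    )
    conn.commit()


def count_in_range(conn, hashtag, start, end):
    """Count for one hashtag over buckets with start in [start, end)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(cnt), 0) FROM hashtag_counts "
        "WHERE hashtag = ? AND bucket_start >= ? AND bucket_start < ?",
        (hashtag, start, end),
    ).fetchone()
    return row[0]


def top_n(conn, n, start, end):
    """Top n (hashtag, total) rows over buckets with start in [start, end)."""
    return conn.execute(
        "SELECT hashtag, SUM(cnt) AS total FROM hashtag_counts "
        "WHERE bucket_start >= ? AND bucket_start < ? "
        "GROUP BY hashtag ORDER BY total DESC, hashtag LIMIT ?",
        (start, end, n),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_counts(conn, 0, {"a": 2, "b": 1})
add_counts(conn, 0, {"a": 1})  # a second partial batch for the same bucket
print(count_in_range(conn, "a", 0, 60))  # 3
```

Because writes are additive merges keyed on (hashtag, bucket_start), replaying a batch double-counts it; a backfill or reprocessing path would more likely overwrite whole buckets than increment them.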