Design a Real-Time Trending Hashtags System
Company: Meta
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
Design the backend system that **detects and serves trending hashtags** from posts on a large social network (Facebook-scale). The service continuously ingests posts, each of which may contain zero or more hashtags, and must surface the hashtags that are *trending right now*.
A hashtag should be considered a good trend along three dimensions:
- **Timeliness** — the trend should surface *while the real event is still happening*. End-to-end latency from a surge in usage to the hashtag appearing in the trending list should be about **1 minute**. The system considers posts from the **last 24 hours**, and more recent posts matter more than older ones.
- **Popularity** — the trend should be of genuine interest to *many people* in the community, not just a handful of very active accounts.
- **Novelty** — the trend should be about something *new*: people were not posting about it before, or at least not with the same intensity.
Your design should cover how posts flow into the system, how a trend score is computed from these three dimensions, and how the resulting trending list is stored and served to readers at scale.
### Constraints & Assumptions
These are working assumptions to size the design; in a real interview you would confirm them up front.
- Write volume: on the order of a few billion posts/day → roughly **50k–80k posts/sec average**, with peaks of **3–5×** during large events. Assume ~15% of posts carry at least one hashtag and ~2 hashtags each → on the order of **15k–40k hashtag-events/sec**, peaking higher.
- Active distinct hashtags in any 24h window: tens of millions, but the vast majority are rare; only a few thousand are realistic trending candidates at any moment.
- Freshness SLA: a surging hashtag should appear in the trending list within **~1 minute** (p99) of the surge.
- Read path: the trending list (global, and optionally per-region/locale) is read at very high QPS with a target of **< 100 ms** latency.
- The trending list can be **eventually consistent** and **approximate** — exact counts are not required.
### Clarifying Questions to Ask
- Is the trending list **global only**, or do we also need **per-region / per-language** lists (which multiplies the aggregation fan-out)?
- How is "popularity" defined — distinct **authors**, distinct **viewers/engagers**, or raw post count? This drives whether we need distinct-counting infrastructure.
- Do we need **exact** counts, or is **approximate** (sketch-based) acceptable for both speed and memory?
- What is the read/write ratio and the expected QPS on the *serving* side versus the *ingestion* side?
- Is **spam / coordinated-manipulation** resistance in scope (it strongly affects the novelty and popularity definitions)?
- How should hashtags be **normalized** (case, Unicode, near-duplicates like `#WorldCup` vs `#World_Cup`) — in scope now or later?
### Part 1 — Ingestion pipeline and high-level architecture
Sketch the end-to-end flow from a post being created to a queryable trending list. Identify the major components, the streaming infrastructure, the partitioning/sharding key, and a back-of-the-envelope sizing of the event rate and the per-hashtag state you must hold.
```hint Shape of the system
This is a streaming aggregation problem, not a batch one. Think of an append-only event log feeding stateful stream processors that maintain rolling per-hashtag counts — a kappa-style architecture.
```
```hint Partitioning
Choose a partition key that keeps all activity for one hashtag on one processor so counts can be maintained locally. Partitioning by `hashtag` (hash of the normalized tag) is the natural choice; very hot tags may need extra handling.
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### Part 2 — Trend scoring: timeliness, popularity, and novelty
Define a concrete, computable **trend score** per hashtag that is refreshed at least once per minute over the rolling 24-hour window. Show how each of the three required properties is operationalized and how they combine into a single rankable number.
```hint Recency weighting
Don't hold raw events for 24h. Bucket counts per minute (1440 buckets) and apply a **time decay** — either an exponentially weighted sum over the buckets, or a small set of nested sliding windows (e.g. 1-min, 5-min, 1-hour) — so recent activity dominates.
```
```hint Popularity without counting everything twice
"Many people" means **distinct authors**, not raw post count — otherwise one spammer dominates. Counting exact distinct authors per hashtag at this scale is expensive; a cardinality sketch such as **HyperLogLog** per (hashtag, window) gives ~2% error in ~1.5 KB.
```
```hint Novelty as anomaly detection
Novelty = "now is unusually higher than the historical baseline." Compare the current per-minute rate against a slow baseline (e.g. an **EWMA** of recent rates, or the mean/σ over the past hours/days) and score by a **z-score** or ratio. A brand-new tag with a low baseline scores high; a chronically popular tag with steady volume scores low.
```
#### Clarifying Questions for this Part
- What **baseline window** defines "before" for novelty — the previous few hours, the same time on previous days, or a long-running EWMA? This changes how a tag that trends *every* weekend is treated.
- Should the score be **per-region** so that a locally surging tag isn't drowned out by global volume?
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### Part 3 — Storage, serving, scaling, and fault tolerance
Describe how the computed scores become a top-K trending list served at high QPS within the latency budget, and how the system stays correct and available under failures, late/out-of-order events, and abuse.
```hint Read path
Readers should never run the aggregation. Each minute, compute the **top-K** per shard, merge into a global (and per-region) top-K, and publish a small materialized list to a fast store (Redis) and/or a CDN. Reads then hit a precomputed, cached list.
```
```hint Correctness under failure
Use a stream processor with **checkpointing** and idempotent/at-least-once processing so a worker restart resumes from the last checkpoint. Use **watermarks** to bound how long to wait for late events before sealing a minute bucket.
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### What a Strong Answer Covers
```premium-lock What a Strong Answer Covers
```
### Follow-up Questions
- How would you extend the design to **personalized** or **per-region/per-language** trending without re-aggregating the full firehose for every segment?
- How do you **normalize and cluster near-duplicate** hashtags (case, Unicode, `#WorldCup` vs `#world_cup`, common misspellings) so they count as one trend?
- A coordinated group floods a brand-new hashtag from many fake accounts to fake "novelty + popularity." What signals and defenses keep it out of the trending list?
- How would you **measure trend quality** offline and online (precision of surfaced trends, time-to-surface, A/B metrics) and tune the scoring weights?
Quick Answer: This question evaluates a candidate's ability to design a large-scale, real-time data processing system that continuously scores and ranks trending content under strict freshness constraints. It tests system design skills including stream processing, time-decayed scoring, and handling high write throughput at scale. This type of question is commonly used to assess architectural thinking for systems balancing timeliness, popularity, and novelty signals.