PracHub
QuestionsCoachesLearningGuidesInterview Prep
|Home/System Design/Meta

Design a Real-Time Trending Hashtags System

Last updated: Jul 1, 2026

Quick Overview

This question evaluates a candidate's ability to design a large-scale, real-time data processing system that continuously scores and ranks trending content under strict freshness constraints. It tests system design skills including stream processing, time-decayed scoring, and handling high write throughput at scale. This type of question is commonly used to assess architectural thinking for systems balancing timeliness, popularity, and novelty signals.

  • medium
  • Meta
  • System Design
  • Software Engineer

Design a Real-Time Trending Hashtags System

Company: Meta

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

Design the backend system that **detects and serves trending hashtags** from posts on a large social network (Facebook-scale). The service continuously ingests posts, each of which may contain zero or more hashtags, and must surface the hashtags that are *trending right now*. A hashtag should be considered a good trend along three dimensions: - **Timeliness** — the trend should surface *while the real event is still happening*. End-to-end latency from a surge in usage to the hashtag appearing in the trending list should be about **1 minute**. The system considers posts from the **last 24 hours**, and more recent posts matter more than older ones. - **Popularity** — the trend should be of genuine interest to *many people* in the community, not just a handful of very active accounts. - **Novelty** — the trend should be about something *new*: people were not posting about it before, or at least not with the same intensity. Your design should cover how posts flow into the system, how a trend score is computed from these three dimensions, and how the resulting trending list is stored and served to readers at scale. ### Constraints & Assumptions These are working assumptions to size the design; in a real interview you would confirm them up front. - Write volume: on the order of a few billion posts/day → roughly **50k–80k posts/sec average**, with peaks of **3–5×** during large events. Assume ~15% of posts carry at least one hashtag and ~2 hashtags each → on the order of **15k–40k hashtag-events/sec**, peaking higher. - Active distinct hashtags in any 24h window: tens of millions, but the vast majority are rare; only a few thousand are realistic trending candidates at any moment. - Freshness SLA: a surging hashtag should appear in the trending list within **~1 minute** (p99) of the surge. - Read path: the trending list (global, and optionally per-region/locale) is read at very high QPS with a target of **< 100 ms** latency. - The trending list can be **eventually consistent** and **approximate** — exact counts are not required. ### Clarifying Questions to Ask - Is the trending list **global only**, or do we also need **per-region / per-language** lists (which multiplies the aggregation fan-out)? - How is "popularity" defined — distinct **authors**, distinct **viewers/engagers**, or raw post count? This drives whether we need distinct-counting infrastructure. - Do we need **exact** counts, or is **approximate** (sketch-based) acceptable for both speed and memory? - What is the read/write ratio and the expected QPS on the *serving* side versus the *ingestion* side? - Is **spam / coordinated-manipulation** resistance in scope (it strongly affects the novelty and popularity definitions)? - How should hashtags be **normalized** (case, Unicode, near-duplicates like `#WorldCup` vs `#World_Cup`) — in scope now or later? ### Part 1 — Ingestion pipeline and high-level architecture Sketch the end-to-end flow from a post being created to a queryable trending list. Identify the major components, the streaming infrastructure, the partitioning/sharding key, and a back-of-the-envelope sizing of the event rate and the per-hashtag state you must hold. ```hint Shape of the system This is a streaming aggregation problem, not a batch one. Think of an append-only event log feeding stateful stream processors that maintain rolling per-hashtag counts — a kappa-style architecture. ``` ```hint Partitioning Choose a partition key that keeps all activity for one hashtag on one processor so counts can be maintained locally. Partitioning by `hashtag` (hash of the normalized tag) is the natural choice; very hot tags may need extra handling. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### Part 2 — Trend scoring: timeliness, popularity, and novelty Define a concrete, computable **trend score** per hashtag that is refreshed at least once per minute over the rolling 24-hour window. Show how each of the three required properties is operationalized and how they combine into a single rankable number. ```hint Recency weighting Don't hold raw events for 24h. Bucket counts per minute (1440 buckets) and apply a **time decay** — either an exponentially weighted sum over the buckets, or a small set of nested sliding windows (e.g. 1-min, 5-min, 1-hour) — so recent activity dominates. ``` ```hint Popularity without counting everything twice "Many people" means **distinct authors**, not raw post count — otherwise one spammer dominates. Counting exact distinct authors per hashtag at this scale is expensive; a cardinality sketch such as **HyperLogLog** per (hashtag, window) gives ~2% error in ~1.5 KB. ``` ```hint Novelty as anomaly detection Novelty = "now is unusually higher than the historical baseline." Compare the current per-minute rate against a slow baseline (e.g. an **EWMA** of recent rates, or the mean/σ over the past hours/days) and score by a **z-score** or ratio. A brand-new tag with a low baseline scores high; a chronically popular tag with steady volume scores low. ``` #### Clarifying Questions for this Part - What **baseline window** defines "before" for novelty — the previous few hours, the same time on previous days, or a long-running EWMA? This changes how a tag that trends *every* weekend is treated. - Should the score be **per-region** so that a locally surging tag isn't drowned out by global volume? #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### Part 3 — Storage, serving, scaling, and fault tolerance Describe how the computed scores become a top-K trending list served at high QPS within the latency budget, and how the system stays correct and available under failures, late/out-of-order events, and abuse. ```hint Read path Readers should never run the aggregation. Each minute, compute the **top-K** per shard, merge into a global (and per-region) top-K, and publish a small materialized list to a fast store (Redis) and/or a CDN. Reads then hit a precomputed, cached list. ``` ```hint Correctness under failure Use a stream processor with **checkpointing** and idempotent/at-least-once processing so a worker restart resumes from the last checkpoint. Use **watermarks** to bound how long to wait for late events before sealing a minute bucket. ``` #### What This Part Should Cover ```premium-lock What This Part Should Cover ``` ### What a Strong Answer Covers ```premium-lock What a Strong Answer Covers ``` ### Follow-up Questions - How would you extend the design to **personalized** or **per-region/per-language** trending without re-aggregating the full firehose for every segment? - How do you **normalize and cluster near-duplicate** hashtags (case, Unicode, `#WorldCup` vs `#world_cup`, common misspellings) so they count as one trend? - A coordinated group floods a brand-new hashtag from many fake accounts to fake "novelty + popularity." What signals and defenses keep it out of the trending list? - How would you **measure trend quality** offline and online (precision of surfaced trends, time-to-surface, A/B metrics) and tune the scoring weights?

Quick Answer: This question evaluates a candidate's ability to design a large-scale, real-time data processing system that continuously scores and ranks trending content under strict freshness constraints. It tests system design skills including stream processing, time-decayed scoring, and handling high write throughput at scale. This type of question is commonly used to assess architectural thinking for systems balancing timeliness, popularity, and novelty signals.

Related Interview Questions

  • Design Top-K, Crawler, and Chess Systems - Meta (hard)
  • Design Search And Web Crawling Systems - Meta (medium)
  • Design an Instagram-Style Social Feed - Meta (medium)
  • Design an Online Game Leaderboard - Meta (hard)
  • Design an On-Demand Delivery Platform - Meta (medium)
|Home/System Design/Meta

Design a Real-Time Trending Hashtags System

Meta logo
Meta
Mar 6, 2026, 12:00 AM
mediumSoftware EngineerOnsiteSystem Design
0
0

Design the backend system that detects and serves trending hashtags from posts on a large social network (Facebook-scale). The service continuously ingests posts, each of which may contain zero or more hashtags, and must surface the hashtags that are trending right now.

A hashtag should be considered a good trend along three dimensions:

  • Timeliness — the trend should surface while the real event is still happening . End-to-end latency from a surge in usage to the hashtag appearing in the trending list should be about 1 minute . The system considers posts from the last 24 hours , and more recent posts matter more than older ones.
  • Popularity — the trend should be of genuine interest to many people in the community, not just a handful of very active accounts.
  • Novelty — the trend should be about something new : people were not posting about it before, or at least not with the same intensity.

Your design should cover how posts flow into the system, how a trend score is computed from these three dimensions, and how the resulting trending list is stored and served to readers at scale.

Constraints & Assumptions

These are working assumptions to size the design; in a real interview you would confirm them up front.

  • Write volume: on the order of a few billion posts/day → roughly 50k–80k posts/sec average , with peaks of 3–5× during large events. Assume ~15% of posts carry at least one hashtag and ~2 hashtags each → on the order of 15k–40k hashtag-events/sec , peaking higher.
  • Active distinct hashtags in any 24h window: tens of millions, but the vast majority are rare; only a few thousand are realistic trending candidates at any moment.
  • Freshness SLA: a surging hashtag should appear in the trending list within ~1 minute (p99) of the surge.
  • Read path: the trending list (global, and optionally per-region/locale) is read at very high QPS with a target of < 100 ms latency.
  • The trending list can be eventually consistent and approximate — exact counts are not required.

Clarifying Questions to Ask

  • Is the trending list global only , or do we also need per-region / per-language lists (which multiplies the aggregation fan-out)?
  • How is "popularity" defined — distinct authors , distinct viewers/engagers , or raw post count? This drives whether we need distinct-counting infrastructure.
  • Do we need exact counts, or is approximate (sketch-based) acceptable for both speed and memory?
  • What is the read/write ratio and the expected QPS on the serving side versus the ingestion side?
  • Is spam / coordinated-manipulation resistance in scope (it strongly affects the novelty and popularity definitions)?
  • How should hashtags be normalized (case, Unicode, near-duplicates like #WorldCup vs #World_Cup ) — in scope now or later?

Part 1 — Ingestion pipeline and high-level architecture

Sketch the end-to-end flow from a post being created to a queryable trending list. Identify the major components, the streaming infrastructure, the partitioning/sharding key, and a back-of-the-envelope sizing of the event rate and the per-hashtag state you must hold.

What This Part Should Cover Premium

Part 2 — Trend scoring: timeliness, popularity, and novelty

Define a concrete, computable trend score per hashtag that is refreshed at least once per minute over the rolling 24-hour window. Show how each of the three required properties is operationalized and how they combine into a single rankable number.

Clarifying Questions for this Part

  • What baseline window defines "before" for novelty — the previous few hours, the same time on previous days, or a long-running EWMA? This changes how a tag that trends every weekend is treated.
  • Should the score be per-region so that a locally surging tag isn't drowned out by global volume?

What This Part Should Cover Premium

Part 3 — Storage, serving, scaling, and fault tolerance

Describe how the computed scores become a top-K trending list served at high QPS within the latency budget, and how the system stays correct and available under failures, late/out-of-order events, and abuse.

What This Part Should Cover Premium

What a Strong Answer Covers Premium

Follow-up Questions

  • How would you extend the design to personalized or per-region/per-language trending without re-aggregating the full firehose for every segment?
  • How do you normalize and cluster near-duplicate hashtags (case, Unicode, #WorldCup vs #world_cup , common misspellings) so they count as one trend?
  • A coordinated group floods a brand-new hashtag from many fake accounts to fake "novelty + popularity." What signals and defenses keep it out of the trending list?
  • How would you measure trend quality offline and online (precision of surfaced trends, time-to-surface, A/B metrics) and tune the scoring weights?

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Meta•More Software Engineer•Meta Software Engineer•Meta System Design•Software Engineer System Design

Your design canvas — auto-saved

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.