Design: Real-Time Favorites/Unfavorites Service at High Scale
Context
Design a backend service that lets users favorite/unfavorite items (e.g., posts, products) and exposes each item's favorite count in near real time. The system must support very high traffic with low latency and strong reliability. Assume clients are web/mobile, traffic is global, and items can be extremely skewed in popularity.
Functional Requirements
- Users can favorite or unfavorite an item; each user-item pair holds at most one favorite.
- Show favorite counts on item pages in near real time.
- Provide bulk count lookups for feeds/lists.
- APIs are idempotent and resilient to retries and network issues.
- Support item deletion and recount/backfill.
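The per-pair and idempotency requirements above can be illustrated with a minimal in-memory sketch. It is not a proposed implementation: `FavoriteStore` and its method names are hypothetical, and a real service would back this with a durable store keyed by (user_id, item_id).

```python
from dataclasses import dataclass, field


@dataclass
class FavoriteStore:
    """Hypothetical in-memory stand-in for a store keyed by (user_id, item_id)."""

    pairs: set = field(default_factory=set)    # favorited (user_id, item_id) pairs
    counts: dict = field(default_factory=dict)  # item_id -> favorite count

    def favorite(self, user_id: str, item_id: str) -> bool:
        """Record a favorite. Returns True only if state changed."""
        key = (user_id, item_id)
        if key in self.pairs:
            return False  # retry or duplicate: no-op, count unchanged
        self.pairs.add(key)
        self.counts[item_id] = self.counts.get(item_id, 0) + 1
        return True

    def unfavorite(self, user_id: str, item_id: str) -> bool:
        """Remove a favorite. Returns True only if state changed."""
        key = (user_id, item_id)
        if key not in self.pairs:
            return False  # not favorited: no-op
        self.pairs.remove(key)
        self.counts[item_id] -= 1
        return True

    def bulk_counts(self, item_ids: list) -> dict:
        """Return counts for many items at once; unknown items count as 0."""
        return {item_id: self.counts.get(item_id, 0) for item_id in item_ids}


store = FavoriteStore()
store.favorite("u1", "i1")
store.favorite("u1", "i1")  # retried request, ignored
store.favorite("u2", "i1")
store.unfavorite("u3", "i1")  # never favorited, ignored
```

Because each call reports whether state changed, a caller can safely retry after a timeout without inflating or deflating the count.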
Non-Functional Requirements
- Scale: ~1,000,000 QPS reads (count lookups), ~100,000 QPS writes (favorite/unfavorite).
- Latency targets: reads p50 ≤ 10 ms, p95 ≤ 20 ms, p99 ≤ 50 ms (from edge); writes p95 ≤ 50 ms to accept, with the change reflected in counts globally within 1–2 s.
- Availability: ≥ 99.99% for reads; ≥ 99.9% for writes.
- Consistency: event-driven eventual consistency for counts (within 1–2 s). Strong per-user semantics (cannot favorite twice; unfavorite is a no-op if not favorited).
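To make the scale and availability targets above concrete, the following sketch turns them into a downtime budget and daily request volumes. It assumes a 30-day window for the budget and treats the stated QPS figures as sustained rates; both are assumptions for illustration, and the function names are hypothetical.

```python
READ_QPS = 1_000_000
WRITE_QPS = 100_000
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_30_DAYS = 30 * 24 * 60  # assumed measurement window


def downtime_budget_minutes(availability: float,
                            window_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Minutes of allowed unavailability within the window."""
    return (1.0 - availability) * window_minutes


def requests_per_day(qps: int) -> int:
    """Daily request volume if the given rate were sustained all day."""
    return qps * SECONDS_PER_DAY


read_budget = downtime_budget_minutes(0.9999)   # reads: 99.99%
write_budget = downtime_budget_minutes(0.999)   # writes: 99.9%

print(f"read downtime budget:  {read_budget:.2f} min / 30 days")
print(f"write downtime budget: {write_budget:.2f} min / 30 days")
print(f"reads per day:  {requests_per_day(READ_QPS):,}")
print(f"writes per day: {requests_per_day(WRITE_QPS):,}")
```

Under those assumptions the budget is roughly 4.3 minutes per 30 days for reads and 43 minutes for writes, with about 8.64 billion writes and 86.4 billion reads per day.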
Specify
- APIs and request/response contracts.
- Data model and indexing.
- Consistency model and latency SLOs.
- Idempotency/dedup and exactly-once semantics.
- Counter design (sharded/aggregated) and hot-key mitigation.
- Caching and invalidation.
- Storage choices and partitioning.
- Streaming/batch aggregation for near-real-time counts.
- Multi-region deployment, replication, and failover.
- Handling unfavorite, deletes, recount/backfill.
- Rate limiting and abuse prevention.
- Security and authorization.
- Observability (metrics, alerts), capacity planning, and cost trade-offs.
- Test strategy and load testing plan.
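As one illustration of what the "sharded counter" item in the list above refers to, here is a minimal sketch of the general technique: writes for a hot item are spread across several sub-counters, and reads sum them. The class name, the shard count of 16, and the random shard choice are assumptions for illustration only, not a prescribed design.

```python
import random
from collections import defaultdict


class ShardedCounter:
    """Hypothetical sharded counter: each item's count is split over N shards."""

    def __init__(self, num_shards: int = 16, rng: random.Random = None):
        self.num_shards = num_shards
        self._rng = rng or random.Random()
        # (item_id, shard_index) -> partial count
        self._shards = defaultdict(int)

    def add(self, item_id: str, delta: int) -> None:
        """Apply +1 (favorite) or -1 (unfavorite) to a randomly chosen shard."""
        shard = self._rng.randrange(self.num_shards)
        self._shards[(item_id, shard)] += delta

    def total(self, item_id: str) -> int:
        """Sum all shards for the item; untouched shards contribute 0."""
        return sum(
            self._shards.get((item_id, shard), 0)
            for shard in range(self.num_shards)
        )


counter = ShardedCounter(num_shards=8)
for _ in range(1000):
    counter.add("hot-item", +1)
for _ in range(10):
    counter.add("hot-item", -1)
print(counter.total("hot-item"))
```

The trade-off in this sketch is that writes to a popular item no longer contend on a single key, while each read touches every shard, which is why designs of this kind usually pair it with a cached or pre-aggregated total.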