Design a real-time favorites service at scale
Company: Roblox
Role: Software Engineer
Category: System Design
Difficulty: Hard
Interview Round: Technical Screen
Design a favorite/unfavorite service for items (e.g., posts, products) that shows each item's favorite count in real time. The service must sustain 1,000,000 QPS of reads (count lookups) and 100,000 QPS of writes (favorite/unfavorite actions). Specify: APIs and request/response contracts; data model and indexing; consistency model (exact vs. eventual) and target latencies; idempotency/deduplication and exactly-once semantics; counter design (e.g., sharded or aggregated counters) and mitigation of hot keys/skew; caching strategy and cache invalidation; storage choices and partitioning; streaming/batch aggregation for near-real-time counts; multi-region deployment, replication, and failover; handling of unfavorites, deletes, and recount/backfill; rate limiting and abuse prevention; security/authorization; observability (metrics, alerts), capacity planning, and cost trade-offs; test strategy and load testing plan.
Quick Answer: This question evaluates a candidate's competency in large-scale distributed system design: scalability, low-latency real-time counter aggregation, consistency models, idempotency and deduplication, data partitioning and indexing, caching and hot-key mitigation, streaming and batch aggregation, multi-region replication, security, and observability. It is commonly asked to probe architectural trade-offs for a high-throughput favorites/counts service under heavy read/write skew and strict SLOs. It falls under the System Design domain and primarily tests practical application of architectural patterns, backed by a conceptual understanding of distributed systems and operational concerns.
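
Two of the topics the prompt asks for, idempotent favorite/unfavorite actions and sharded counters for hot keys, can be sketched together in a few lines. The following is a minimal in-memory illustration, not a reference design: the class name `FavoritesStore`, the default of 16 shards, and the random shard selection are all assumptions made for the example, and a real service would back the set and the shards with a partitioned datastore and a cache.

```python
import random
from collections import defaultdict

NUM_SHARDS = 16  # assumed default; hot items may need more shards


class FavoritesStore:
    """Hypothetical in-memory stand-in for a partitioned favorites store."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.num_shards = num_shards
        # (user_id, item_id) pairs that are currently favorited.
        # This set is what makes repeated requests idempotent.
        self.favorites = set()
        # item_id -> list of per-shard partial counts.
        self.shards = defaultdict(lambda: [0] * num_shards)

    def favorite(self, user_id, item_id):
        """Record a favorite. Returns False if it was already recorded."""
        key = (user_id, item_id)
        if key in self.favorites:
            return False  # duplicate or retried request: no count change
        self.favorites.add(key)
        self._bump(item_id, +1)
        return True

    def unfavorite(self, user_id, item_id):
        """Remove a favorite. Returns False if there was nothing to remove."""
        key = (user_id, item_id)
        if key not in self.favorites:
            return False
        self.favorites.remove(key)
        self._bump(item_id, -1)
        return True

    def _bump(self, item_id, delta):
        # Spread writes for one item across shards so that a single hot
        # item does not serialize on a single counter row.
        shard = random.randrange(self.num_shards)
        self.shards[item_id][shard] += delta

    def count(self, item_id):
        # The visible count is the sum of all partial counts. Individual
        # shards may go negative; only the sum is meaningful.
        return sum(self.shards.get(item_id, ()))


store = FavoritesStore(num_shards=4)
store.favorite("u1", "item-42")
store.favorite("u1", "item-42")  # retried request, ignored
store.favorite("u2", "item-42")
store.unfavorite("u1", "item-42")
print(store.count("item-42"))
```

In this sketch a read has to sum every shard, so a design at the stated read volume would likely serve counts from a cached or pre-aggregated value and treat the shards as the write path only.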