Design a News Feed with APIs
Company: Yelp
Role: Data Scientist
Category: Machine Learning
Difficulty: Hard
Interview Round: Onsite
Design a personalized news feed system that pushes items to users and also supports pull-based consumption. Requirements: 100M MAU, 1M publishers, 200k writes/s, 2M reads/s, p99 latency < 200 ms; support follow/mute/block, deduplication, diversity and freshness constraints, daily notification caps, and content retractions. Provide external API designs with request/response schemas and idempotency for: publish, subscribe/unsubscribe, get_feed (with pagination and consistent cursors), ack_consume, retract, and feedback logging. Choose between fan-out-on-write and fan-out-on-read and justify the choice; describe storage (hot cache, cold store), event streaming, the ranking feature pipeline, and online inference. Address abuse/spam controls, multi-region replication, backfill and replay, GDPR deletion, rate limiting, SLOs/observability, and failure modes (e.g., partial outages). Finally, outline how you would run online experiments on ranking while ensuring user-level traffic consistency and safe rollouts.
Quick Answer: This question evaluates system design, scalable API design, and machine-learning ranking competencies for personalized content delivery, focusing on ingestion, candidate generation, ranking, and operational guarantees.
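As a starting point for the API portion of the question, here is a minimal in-memory sketch of three of the requested operations: an idempotent publish, an idempotent retract, and a get_feed with a consistent cursor. Everything in it is an assumption made for illustration, not part of the original question: the FeedStore class, the idempotency_key parameter, the field names, and the choice to encode the cursor as the (timestamp, item id) of the last item returned. A real design would back this with a durable store and add ranking, which this sketch omits.

```python
import base64
import json
from dataclasses import dataclass, field


def _encode_cursor(ts_ms, item_id):
    # Opaque cursor: position of the last item returned, not a page offset.
    raw = json.dumps([ts_ms, item_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def _decode_cursor(cursor):
    ts_ms, item_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return (ts_ms, item_id)


@dataclass
class FeedStore:
    """Hypothetical in-memory stand-in for a single user's feed store."""
    items: list = field(default_factory=list)      # (ts_ms, item_id, payload), newest first
    seen_keys: dict = field(default_factory=dict)  # idempotency_key -> item_id
    retracted: set = field(default_factory=set)

    def publish(self, idempotency_key, item_id, ts_ms, payload):
        # A retried request with the same key returns the original result
        # instead of inserting the item a second time.
        if idempotency_key in self.seen_keys:
            return {"item_id": self.seen_keys[idempotency_key], "duplicate": True}
        self.seen_keys[idempotency_key] = item_id
        self.items.append((ts_ms, item_id, payload))
        self.items.sort(key=lambda it: (it[0], it[1]), reverse=True)
        return {"item_id": item_id, "duplicate": False}

    def retract(self, item_id):
        # Adding to a set is naturally idempotent: repeating it changes nothing.
        self.retracted.add(item_id)
        return {"item_id": item_id, "retracted": True}

    def get_feed(self, limit, cursor=None):
        # Return items strictly older than the cursor position, so items
        # published between two page requests cannot shift or repeat results.
        after = _decode_cursor(cursor) if cursor else None
        page = []
        for ts_ms, item_id, payload in self.items:
            if item_id in self.retracted:
                continue
            if after is not None and (ts_ms, item_id) >= after:
                continue
            page.append({"item_id": item_id, "ts_ms": ts_ms, "payload": payload})
            if len(page) == limit:
                break
        next_cursor = None
        if len(page) == limit:
            next_cursor = _encode_cursor(page[-1]["ts_ms"], page[-1]["item_id"])
        return {"items": page, "next_cursor": next_cursor}
```

The cursor is keyed on item position rather than on a numeric offset, which is one common way to keep pagination stable while new items keep arriving at the head of the feed. In this sketch a full page always returns a cursor, so a client may receive one final empty page with next_cursor set to None.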