System Design: Real-Time Chat (1:1 and Groups)
Context
Design a mobile-first, globally available real-time chat system that supports both 1:1 and group messaging at large scale (target: 100M daily active users). Assume a multi-region deployment and clients that are primarily mobile apps with intermittent connectivity. The system must prioritize low latency and high availability.
Requirements
Cover the following areas explicitly:
- API design
  - Send, receive/sync, acknowledgments (acks)
  - Message ordering and idempotency semantics
- Features
  - Read receipts and typing indicators
  - Online presence
- Architecture
  - Fan-out strategy (write vs. read; hybrid if applicable)
  - Storage tiers (hot vs. cold) and media/attachments handling
  - Indexing and data model
  - Replication and sharding strategy
  - Consistency vs. availability trade-offs
- Delivery
  - Offline delivery and retries via push notifications
- Safety
  - Rate limiting and spam/abuse controls
  - End-to-end encryption considerations (1:1 and groups)
- Scale
  - Back-of-the-envelope capacity estimates for 100M DAU
- Operations
  - Monitoring, observability, and disaster recovery
Make reasonable assumptions where needed and call them out explicitly.
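As an illustration of what an explicitly stated assumption set might look like for the capacity estimate, here is a minimal sketch. Only the 100M DAU figure comes from the prompt. Every other number (messages per user per day, average message size, peak factor, concurrent-connection share, connections per gateway) and every name (`Assumptions`, `estimate`) is a hypothetical placeholder chosen for illustration, not a recommended value.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Assumptions:
    dau: int = 100_000_000            # given in the prompt
    msgs_per_user_per_day: int = 40   # ASSUMPTION: illustrative only
    avg_msg_bytes: int = 200          # ASSUMPTION: text payload plus metadata
    peak_to_avg: int = 3              # ASSUMPTION: peak traffic multiplier
    concurrent_percent: int = 10      # ASSUMPTION: share of DAU connected at once
    conns_per_gateway: int = 100_000  # ASSUMPTION: connections per gateway node


def estimate(a: Assumptions) -> dict:
    """Derive rough daily volume, send rate, storage, and gateway count."""
    messages_per_day = a.dau * a.msgs_per_user_per_day
    avg_send_per_sec = messages_per_day / SECONDS_PER_DAY
    peak_send_per_sec = avg_send_per_sec * a.peak_to_avg
    # Raw message bytes only: no replication, indexes, or media attachments.
    storage_bytes_per_day = messages_per_day * a.avg_msg_bytes
    concurrent_conns = a.dau * a.concurrent_percent // 100
    # Ceiling division so a partially filled gateway still counts as one node.
    gateways = -(-concurrent_conns // a.conns_per_gateway)
    return {
        "messages_per_day": messages_per_day,
        "avg_send_per_sec": avg_send_per_sec,
        "peak_send_per_sec": peak_send_per_sec,
        "storage_bytes_per_day": storage_bytes_per_day,
        "concurrent_conns": concurrent_conns,
        "gateways": gateways,
    }


if __name__ == "__main__":
    for name, value in estimate(Assumptions()).items():
        print(f"{name}: {value:,.0f}")
```

Changing a single field, for example `Assumptions(msgs_per_user_per_day=100)`, shows how sensitive the derived numbers are to each stated assumption, which is the reason for calling them out one by one.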