Design a Real-Time Messaging System (1:1 and Group)
Context
You are designing a consumer-scale messaging platform supporting both direct and group conversations across multiple user devices. The system must provide low-latency real-time delivery, offline sync, and strong privacy controls while operating at large scale.
- Scale targets: 100M MAU, 10M DAU
- Latency SLO: p99 send-to-receive under 150 ms (same-region)
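A back-of-envelope load estimate is a common first step when working from scale targets like these. The sketch below is only illustrative: apart from the 10M DAU figure, every input (messages per user per day, peak-to-average ratio, fanout, online fraction, devices per user) is an assumption chosen for the example, not a number given in this prompt.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class LoadAssumptions:
    dau: int = 10_000_000                 # from the stated scale target
    msgs_per_user_per_day: int = 40       # assumption, not from the prompt
    peak_to_avg: float = 3.0              # assumption: peak traffic vs. daily average
    avg_fanout: float = 5.0               # assumption: device deliveries per sent message
    peak_online_fraction: float = 0.3     # assumption: share of DAU connected at peak
    devices_per_online_user: float = 1.5  # assumption: active devices per online user


def estimate(a: LoadAssumptions) -> dict[str, float]:
    """Derive rough send, delivery, and connection numbers from the assumptions."""
    avg_send_qps = a.dau * a.msgs_per_user_per_day / SECONDS_PER_DAY
    peak_send_qps = avg_send_qps * a.peak_to_avg
    return {
        "avg_send_qps": avg_send_qps,
        "peak_send_qps": peak_send_qps,
        "peak_delivery_qps": peak_send_qps * a.avg_fanout,
        "peak_connections": a.dau * a.peak_online_fraction * a.devices_per_online_user,
    }


if __name__ == "__main__":
    for name, value in estimate(LoadAssumptions()).items():
        print(f"{name}: {value:,.0f}")
```

Changing any assumption (for example, a higher group fanout) shifts the delivery rate linearly, which is why it helps to state the inputs explicitly before sizing gateways or storage.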
Requirements
- Features
  - 1:1 and group chats
  - Multi-device sync (multiple active devices per user)
  - Presence and typing indicators
  - Read receipts and delivery receipts
  - Media attachments (images/video/docs) with previews
  - Message search
- Delivery Semantics and Ordering
  - At-least-once delivery with deduplication
  - Message IDs and ordering guarantees per conversation
- Offline and Sync
  - Offline storage and background sync for clients
  - Server-side backfill and out-of-order handling
- APIs (define schemas and behavior)
  - SendMessage
  - Ack (delivery/read/typing)
  - Sync (incremental state)
  - FetchHistory (backfill)
  - SubscribePresence (presence/typing)
- Architecture
  - Edge gateways and real-time fanout (e.g., WebSockets)
  - Message persistence and indexing for search
  - Cold storage tiering
  - Partitioning by user/conversation
- Non-Functional Concerns
  - Exactly-once UX despite at-least-once transport
  - End-to-end encryption options
  - Abuse/spam mitigation, rate limits/quotas
  - Data retention/TTL and GDPR deletion
  - Observability (metrics, logs, tracing, SLOs)
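As one possible starting point for the API item in the list above, the five calls could be given request and response shapes like the following. This is a minimal sketch, not a prescribed schema: every field name (`client_msg_id`, `up_to_seq`, `cursor`, and so on) and the JSON wire format are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional, Tuple


class AckKind(str, Enum):
    DELIVERED = "delivered"
    READ = "read"
    TYPING = "typing"


@dataclass(frozen=True)
class SendMessageRequest:
    conversation_id: str
    client_msg_id: str                    # hypothetical client-generated idempotency key
    body: str
    attachment_ids: Tuple[str, ...] = ()  # references to already-uploaded media


@dataclass(frozen=True)
class SendMessageResponse:
    conversation_id: str
    client_msg_id: str
    seq: int                              # hypothetical server-assigned per-conversation order
    server_ts_ms: int


@dataclass(frozen=True)
class AckRequest:
    conversation_id: str
    kind: AckKind
    up_to_seq: Optional[int] = None       # required for delivered/read, unused for typing

    def __post_init__(self) -> None:
        if self.kind is not AckKind.TYPING and self.up_to_seq is None:
            raise ValueError("delivered/read acks need up_to_seq")


@dataclass(frozen=True)
class SyncRequest:
    device_id: str
    cursor: Optional[str] = None          # opaque resume token; None means first sync


@dataclass(frozen=True)
class FetchHistoryRequest:
    conversation_id: str
    before_seq: int                       # page backwards from this position
    limit: int = 50


@dataclass(frozen=True)
class SubscribePresenceRequest:
    user_ids: Tuple[str, ...]


def to_wire(message) -> str:
    """Serialize any of the dataclasses above to JSON (assumed wire format)."""
    return json.dumps(asdict(message), sort_keys=True)


if __name__ == "__main__":
    print(to_wire(AckRequest("c1", AckKind.READ, up_to_seq=42)))
```

Acknowledging "up to" a sequence number rather than individual messages keeps receipt traffic small in busy group chats, at the cost of assuming receipts are cumulative.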
Describe the end-to-end design and trade-offs. Include key data flows and how you meet latency and scale targets.
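One of the key data flows, the send and receive path, could be sketched as follows to show how at-least-once transport can still yield an exactly-once user experience. This is an in-memory illustration under stated assumptions: the server deduplicates retries by a hypothetical `(sender_id, client_msg_id)` key and assigns a dense per-conversation `seq`, and the client drops duplicates and buffers out-of-order arrivals. The class names are invented for the example; a real design would back the log with a partitioned store.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredMessage:
    conversation_id: str
    seq: int            # server-assigned, dense, increasing per conversation
    client_msg_id: str  # hypothetical client-generated idempotency key
    sender_id: str
    body: str


@dataclass
class ConversationLog:
    """Server side: one ordered log per conversation (in memory for the sketch)."""
    conversation_id: str
    _messages: list[StoredMessage] = field(default_factory=list)
    _by_key: dict[tuple[str, str], StoredMessage] = field(default_factory=dict)

    def append(self, sender_id: str, client_msg_id: str, body: str) -> StoredMessage:
        key = (sender_id, client_msg_id)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing  # client retry: return the original, assign no new seq
        msg = StoredMessage(self.conversation_id, len(self._messages) + 1,
                            client_msg_id, sender_id, body)
        self._messages.append(msg)
        self._by_key[key] = msg
        return msg

    def sync(self, after_seq: int, limit: int = 100) -> list[StoredMessage]:
        # seq is 1-based and dense, so messages with seq > after_seq start at index after_seq.
        return self._messages[after_seq:after_seq + limit]


@dataclass
class ClientInbox:
    """Client side: apply each seq once, in order, buffering any gaps."""
    last_applied_seq: int = 0
    rendered: list[str] = field(default_factory=list)
    _pending: dict[int, StoredMessage] = field(default_factory=dict)

    def receive(self, msg: StoredMessage) -> None:
        if msg.seq <= self.last_applied_seq or msg.seq in self._pending:
            return  # duplicate delivery: ignore
        self._pending[msg.seq] = msg
        while self.last_applied_seq + 1 in self._pending:
            nxt = self._pending.pop(self.last_applied_seq + 1)
            self.rendered.append(nxt.body)
            self.last_applied_seq = nxt.seq


if __name__ == "__main__":
    log = ConversationLog("c1")
    first = log.append("alice", "k1", "hi")
    log.append("alice", "k1", "hi")          # retried send, deduplicated
    second = log.append("bob", "k1", "yo")
    inbox = ClientInbox()
    for delivered in (second, first, first):  # out of order, with a duplicate
        inbox.receive(delivered)
    print(inbox.rendered)
```

A reconnecting device can call `sync(after_seq=inbox.last_applied_seq)` to backfill whatever it missed, which is the same cursor idea the Sync and FetchHistory APIs would expose.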