System Design: Real-Time Messenger (1:1 and Group Chats)
Context
Design a production-grade real-time messaging system that supports both one-to-one and group conversations. Assume mobile and web clients, multi-device usage, and global scale. Optimize for low latency, reliability, and cost efficiency.
Functional Requirements
- Conversations
  - 1:1 chats and group chats (create, invite, remove, leave, mute, block).
  - Message types: text, emoji, media (images, video, files), and metadata (typing, presence).
  - Read receipts (per-recipient), delivery status, message edits/deletes (soft delete acceptable).
- Real-time & Offline
  - Real-time messaging with message ordering per conversation.
  - Offline store-and-forward, sync across devices, and resend on reconnect.
- Presence & Notifications
  - User presence (online/last seen), typing indicators.
  - Push notifications for offline users.
- Search
  - Search messages and users. Scope results to user’s accessible conversations.
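One possible way to satisfy the per-conversation ordering and resend-on-reconnect requirements is to have the server assign a dense sequence number to each message in a conversation, and to let a reconnecting device ask for everything after the last sequence number it saw. The sketch below assumes that approach; `Message`, `ConversationLog`, `append`, and `fetch_after` are hypothetical names, and the in-memory dictionary stands in for a durable log.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    conversation_id: str
    seq: int        # assumed: server-assigned, starts at 1, dense per conversation
    sender_id: str
    body: str


class ConversationLog:
    """In-memory stand-in for a durable per-conversation message log."""

    def __init__(self) -> None:
        self._messages: dict[str, list[Message]] = defaultdict(list)

    def append(self, conversation_id: str, sender_id: str, body: str) -> Message:
        # The next sequence number is one past the current log length.
        log = self._messages[conversation_id]
        msg = Message(conversation_id, len(log) + 1, sender_id, body)
        log.append(msg)
        return msg

    def fetch_after(self, conversation_id: str, last_seen_seq: int) -> list[Message]:
        # Because seq is dense and starts at 1, the message with seq n sits at
        # index n - 1, so slicing from last_seen_seq returns only unseen messages.
        return self._messages[conversation_id][last_seen_seq:]


log = ConversationLog()
log.append("c1", "alice", "hi")
log.append("c1", "bob", "hello")
log.append("c1", "alice", "how are you?")

# A device that last saw seq 1 reconnects and catches up in order.
missed = log.fetch_after("c1", last_seen_seq=1)
```

A single writer per conversation (for example, one shard owning the conversation) is what would keep the sequence dense in practice; that ownership model is an assumption of this sketch, not a requirement stated above.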
Non-Functional Requirements
- Scale and Performance
  - Target: 100M MAU, 10M DAU, peak 2M concurrent connections/region.
  - P50 send-to-deliver < 200 ms within a region; P99 < 1 s.
  - Availability: 99.95%.
- Consistency & Reliability
  - Per-conversation message ordering.
  - Delivery guarantees and idempotency.
  - Consistent state across devices.
- Security
  - Authentication, authorization, transport encryption; discuss E2E encryption option.
  - Abuse/spam controls.
- Operations
  - Observability (metrics, logs, traces), rate limiting, backpressure.
  - Cost-aware design and capacity planning.
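The idempotency requirement is often paired with at-least-once delivery: the client retries a send until it is acknowledged, and the server deduplicates the retries. The sketch below illustrates one way this could work, assuming the client attaches its own unique ID to every message. `IdempotentInbox`, `SendResult`, and `client_message_id` are hypothetical names, and the dictionary stands in for a persistent store.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SendResult:
    server_message_id: int
    duplicate: bool


class IdempotentInbox:
    """In-memory stand-in for a store keyed by (sender_id, client_message_id)."""

    def __init__(self) -> None:
        self._seen: dict[tuple[str, str], int] = {}
        self._next_id = 1

    def send(self, sender_id: str, client_message_id: str, body: str) -> SendResult:
        key = (sender_id, client_message_id)
        if key in self._seen:
            # A retry of an already-accepted message: return the original ID
            # instead of storing the message a second time.
            return SendResult(self._seen[key], duplicate=True)
        server_id = self._next_id
        self._next_id += 1
        # Assumption: a real system would persist the body and the dedupe key
        # in one atomic write; here only the key is recorded.
        self._seen[key] = server_id
        return SendResult(server_id, duplicate=False)


inbox = IdempotentInbox()
first = inbox.send("alice", "m-1", "hi")
retry = inbox.send("alice", "m-1", "hi")      # e.g. resend after a lost ack
other = inbox.send("alice", "m-2", "again")
```

With this shape, a client can safely resend after a timeout or reconnect, since the retry resolves to the same server-side message.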
Deliverables
- Clarify assumptions and requirements.
- High-level architecture: storage choices and messaging protocols.
- APIs and data models.
- Message ordering and delivery guarantees; read receipts.
- Presence, push notifications, media handling, and search.
- Scalability: sharding, caching, queues; consistency across devices.
- Multi-region deployment, failover, offline/slow network scenarios.
- Security, observability, and cost trade-offs.
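As one illustration of the data-model and read-receipt deliverables, per-recipient read state could be stored as a single "last read" watermark per user and conversation rather than one record per message. The sketch below assumes each message carries a per-conversation sequence number; `ReadReceipts`, `mark_read`, and `readers_of` are hypothetical names.

```python
from __future__ import annotations


class ReadReceipts:
    """Tracks the highest sequence number each user has read per conversation."""

    def __init__(self) -> None:
        # (conversation_id, user_id) -> highest seq read; absent means nothing read
        self._last_read: dict[tuple[str, str], int] = {}

    def mark_read(self, conversation_id: str, user_id: str, seq: int) -> int:
        # The watermark only moves forward, so a late or out-of-order update
        # from a second device cannot roll the read state back.
        key = (conversation_id, user_id)
        self._last_read[key] = max(self._last_read.get(key, 0), seq)
        return self._last_read[key]

    def readers_of(self, conversation_id: str, seq: int, members: list[str]) -> list[str]:
        # A member has read message `seq` if their watermark has reached it.
        return [
            user for user in members
            if self._last_read.get((conversation_id, user), 0) >= seq
        ]


receipts = ReadReceipts()
receipts.mark_read("c1", "bob", 5)                  # phone reads up to seq 5
watermark = receipts.mark_read("c1", "bob", 3)      # stale update from a laptop
readers = receipts.readers_of("c1", 4, ["bob", "carol"])
```

The trade-off in this sketch is that storage grows with members per conversation instead of messages times members, at the cost of not recording the exact time each individual message was read.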