Real-time collaborative editing is one of those features that feels magical when it works — and absolutely nightmarish to build from scratch. Google Docs, Notion, Figma — they all solve variations of the same fundamental problem: how do you let multiple people edit the same thing at the same time without destroying each other's work?
This question is a favorite in system design interviews because it touches distributed systems, conflict resolution algorithms, real-time communication, and storage design all in one problem.
Estimated time: 40 minutes
The most critical non-functional requirement is strong eventual consistency. Two users can temporarily see different states, but they must always converge to the same final document.
Peak operations: 2 million sessions x 40 ops/min = 80 million operations per minute, or roughly 1.3 million per second. Each operation is small (50-200 bytes), so the workload is operation-heavy but bandwidth-light. Total document storage: 500M documents x 50 KB = 25 TB.
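The arithmetic behind these estimates can be checked with a few lines of Python. This is only a back-of-envelope sketch: the constant names are made up for illustration, decimal units (1 KB = 1,000 bytes) are assumed, and the bandwidth figure counts inbound operation payloads only, ignoring protocol overhead and fan-out to other collaborators.

```python
# Back-of-envelope capacity estimate from the figures quoted in the text.
PEAK_SESSIONS = 2_000_000
OPS_PER_SESSION_PER_MIN = 40
OP_SIZE_BYTES = (50, 200)          # smallest and largest typical operation
DOCUMENTS = 500_000_000
AVG_DOC_SIZE_BYTES = 50_000        # 50 KB, decimal units assumed

ops_per_min = PEAK_SESSIONS * OPS_PER_SESSION_PER_MIN
ops_per_sec = ops_per_min / 60

# Inbound payload bandwidth only (no headers, no broadcast fan-out).
ingress_mb_per_sec = tuple(ops_per_sec * size / 1e6 for size in OP_SIZE_BYTES)

storage_tb = DOCUMENTS * AVG_DOC_SIZE_BYTES / 1e12

print(f"{ops_per_min:,} ops/min, about {ops_per_sec:,.0f} ops/s")
print(f"ingress: {ingress_mb_per_sec[0]:.0f}-{ingress_mb_per_sec[1]:.0f} MB/s")
print(f"document storage: {storage_tb:.0f} TB")
```

Even at the upper end of the operation size range, aggregate inbound traffic stays in the low hundreds of MB/s, which is why the hard part is the operation rate rather than raw bandwidth.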
Core components: API Gateway, Document Service, Collaboration Service (WebSocket-based OT engine), Presence Service (Redis pub/sub), Version History Service (snapshots + operation log), and Storage Layer (PostgreSQL, Kafka, S3, Redis).
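OT relies on a central server to order and transform concurrent operations. CRDTs use data structures designed so that replicas converge without coordination. OT is proven in production at Google scale and has lower memory overhead, since sequence CRDTs typically keep an identifier per element plus tombstones for deletions. CRDTs are the better fit for offline-first and peer-to-peer scenarios.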
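To make the idea of transforming concurrent operations concrete, here is a minimal sketch of an OT transform for the insert-versus-insert case only. The `Insert` type, the `site` tie-breaker field, and the function names are illustrative assumptions, not the API of any particular OT library; a real engine also has to handle deletes, formatting, and server-side ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where the text is inserted
    text: str
    site: int   # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied.

    Both operations must have been created against the same document state.
    """
    lands_before = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if lands_before:
        # `against` inserted text at or before our position: shift right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


base = "helo"
a = Insert(3, "l", site=1)   # alone would give "hello"
b = Insert(4, "!", site=2)   # alone would give "helo!"

# Two replicas apply the operations in opposite orders.
via_a = apply(apply(base, a), transform(b, a))
via_b = apply(apply(base, b), transform(a, b))
print(via_a, via_b)  # hello! hello!
```

Both orders end in the same document, which is the convergence property that strong eventual consistency requires. The `site` comparison matters when two inserts target the same position: without a deterministic tie-breaker the replicas could order the inserted text differently.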
Clients apply edits optimistically for zero-latency feel, then reconcile with the server. The Jupiter protocol handles client-server OT synchronization.
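Storage uses a dual model: periodic snapshots (every 100 operations) plus an append-only operation log. A document loads from its latest snapshot with only the newer operations replayed, and the log also provides version history and an audit trail.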
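A minimal in-memory sketch of the snapshot-plus-log model is shown below. To keep it short, every operation is assumed to be a plain "append this text" edit, and the `DocumentStore` class and its method names are hypothetical; in the architecture described here the log and snapshots would live in durable storage rather than Python lists and dicts.

```python
from dataclasses import dataclass, field
from typing import Optional

SNAPSHOT_EVERY = 100  # snapshot interval quoted in the text


@dataclass
class DocumentStore:
    # Append-only operation log; each op is simplified to "append this text".
    log: list = field(default_factory=list)
    # Version number -> full document content at that version.
    snapshots: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.snapshots.setdefault(0, "")  # the empty document is version 0

    def append(self, text: str) -> int:
        """Record one operation and return the new version number."""
        self.log.append(text)
        version = len(self.log)
        if version % SNAPSHOT_EVERY == 0:
            self.snapshots[version] = self.load(version)
        return version

    def load(self, version: Optional[int] = None) -> str:
        """Rebuild the document at `version` (latest if omitted)."""
        if version is None:
            version = len(self.log)
        # Start from the newest snapshot at or before the requested version,
        # then replay only the operations recorded after it.
        base = max(v for v in self.snapshots if v <= version)
        content = self.snapshots[base]
        for op in self.log[base:version]:
            content += op
        return content


store = DocumentStore()
for i in range(250):
    store.append(str(i % 10))

print(sorted(store.snapshots))   # [0, 100, 200]
print(len(store.load()))         # 250
```

Loading the latest version replays at most 99 operations on top of a snapshot, while any historical version can still be reconstructed from the log. A shorter snapshot interval speeds up loads at the cost of more snapshot storage.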
Cursor positions are broadcast every 100-200 ms via Redis pub/sub. This presence data is ephemeral and expires after a 30-second TTL.
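The TTL behaviour can be illustrated without a running Redis instance. The sketch below is an in-memory stand-in, not Redis client code: `PresenceRegistry` and its methods are hypothetical names, one registry is assumed per document, and the clock is injectable so expiry can be demonstrated without waiting 30 seconds.

```python
import time
from typing import Callable, Dict, Tuple

PRESENCE_TTL_SECONDS = 30  # TTL quoted in the text


class PresenceRegistry:
    """In-memory stand-in for ephemeral presence keys (one per document)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # user id -> (cursor position, absolute expiry time)
        self._cursors: Dict[str, Tuple[int, float]] = {}

    def update(self, user_id: str, position: int) -> None:
        """Record a cursor broadcast; each update refreshes the TTL."""
        expiry = self._clock() + PRESENCE_TTL_SECONDS
        self._cursors[user_id] = (position, expiry)

    def active(self) -> Dict[str, int]:
        """Return cursors that have not expired, dropping stale entries."""
        now = self._clock()
        self._cursors = {
            user: entry for user, entry in self._cursors.items() if entry[1] > now
        }
        return {user: pos for user, (pos, _) in self._cursors.items()}


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
registry = PresenceRegistry(clock=lambda: fake_now[0])

registry.update("alice", 5)
fake_now[0] = 20.0
registry.update("bob", 9)
fake_now[0] = 31.0          # alice's entry expired at t=30
print(registry.active())    # {'bob': 9}
```

Because every cursor broadcast refreshes the expiry, an active user never times out, while a user whose connection drops silently disappears from the list within 30 seconds with no explicit cleanup step.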
While a client is offline, its operations are queued locally in IndexedDB. On reconnect, the queued operations are transformed against the server operations the client missed before being sent.
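The reconnect step can be sketched as a rebase of the local queue over the missed server operations. This is a simplified illustration under several assumptions: operations are inserts only, a Python list stands in for the IndexedDB queue, and `Insert`, `transform`, and `rebase` are hypothetical names rather than part of any real sync library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it applies after `against` (same starting state)."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def rebase(pending: List[Insert], missed: List[Insert]) -> Tuple[List[Insert], List[Insert]]:
    """Return (queued ops rewritten for the server, missed ops rewritten for the client)."""
    for_client = []
    for server_op in missed:
        rebased = []
        for local_op in pending:
            # Transform each side against the other so both stay in step.
            rebased.append(transform(local_op, server_op))
            server_op = transform(server_op, local_op)
        pending = rebased
        for_client.append(server_op)
    return pending, for_client


base = "abc"
queued = [Insert(0, "X", site=2), Insert(4, "Y", site=2)]   # made while offline
missed = [Insert(1, "-", site=1), Insert(4, "!", site=1)]   # applied on the server

to_server, to_client = rebase(queued, missed)

server_doc = base
for op in missed + to_server:
    server_doc = apply(server_doc, op)

client_doc = base
for op in queued + to_client:
    client_doc = apply(client_doc, op)

print(server_doc, client_doc)  # Xa-bc!Y Xa-bc!Y
```

The client keeps its optimistic local state, applies the rewritten server operations on top of it, and sends the rewritten queue to the server, so both sides end up with the same document. The cost of the rebase grows with the product of the two list lengths, which is one reason long offline periods are harder for OT than for CRDTs.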
Key trade-offs: OT vs CRDT, snapshot frequency, consistency vs latency, operation log retention. Focus on the core conflict resolution mechanism, nail the sync protocol, and you will be in great shape.