Real-time collaborative editing is one of those features that feels magical when it works — and absolutely nightmarish to build from scratch. Google Docs, Notion, Figma — they all solve variations of the same fundamental problem: how do you let multiple people edit the same thing at the same time without destroying each other's work?
This question is a favorite in system design interviews because it touches distributed systems, conflict resolution algorithms, real-time communication, and storage design all in one problem.
Estimated time: 40 minutes
Step 1: Understand the Requirements (5 minutes)
Functional Requirements
- Multiple users can simultaneously edit the same document in real time
- Users see each other's changes within 1-2 seconds
- Conflict resolution: when two users edit the same section, the system resolves it gracefully
- Version history: users can browse and restore previous versions
- Presence awareness: see who else is in the document, their cursor positions, and text selections
- Offline editing: users can make changes without internet and sync when reconnected
Non-Functional Requirements
- Support up to 100 concurrent editors per document
- Low latency: edits should appear on other users' screens in under 500ms
- Strong eventual consistency: all users must converge to the same document state
- No data loss: every keystroke must be durably stored
The most critical non-functional requirement is strong eventual consistency. Two users can temporarily see different states, but they must always converge to the same final document.
Step 2: Estimation (5 minutes)
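Peak operations: assuming 2 million concurrent editing sessions, each generating about 40 operations per minute, the system handles 2 million x 40 = 80 million operations per minute, or roughly 1.3 million per second. Each operation is small (50-200 bytes), so the workload is operation-heavy but bandwidth-light. Total document storage: 500M documents x 50 KB = 25 TB.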
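The arithmetic above is easy to double-check in a few lines. This is only a back-of-envelope sketch: the constants come from the estimate above, 1 KB is assumed to mean 1,000 bytes, and the ingest figure counts raw operation payloads only, before any fan-out to other editors.

```python
# Back-of-envelope check of the estimation figures.
PEAK_SESSIONS = 2_000_000          # concurrent editing sessions at peak
OPS_PER_SESSION_PER_MIN = 40       # operations per session per minute
OP_SIZE_BYTES = (50, 200)          # smallest and largest operation size
DOCUMENTS = 500_000_000
AVG_DOC_BYTES = 50 * 1000          # 50 KB, assuming decimal kilobytes

ops_per_min = PEAK_SESSIONS * OPS_PER_SESSION_PER_MIN
ops_per_sec = ops_per_min / 60

# Inbound payload bandwidth at the low and high end of the op size range.
ingest_mb_per_sec = tuple(ops_per_sec * size / 1e6 for size in OP_SIZE_BYTES)

storage_tb = DOCUMENTS * AVG_DOC_BYTES / 1e12

print(f"{ops_per_min:,} ops/min, about {ops_per_sec:,.0f} ops/sec")
print(f"ingest: {ingest_mb_per_sec[0]:.0f}-{ingest_mb_per_sec[1]:.0f} MB/s")
print(f"storage: {storage_tb:.0f} TB")
```

Even at the top of the size range the inbound payload stays in the low hundreds of MB/s, which is why the per-operation processing cost matters more than raw bandwidth.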
Step 3: High-Level Design (10 minutes)
Core components: API Gateway, Document Service, Collaboration Service (WebSocket-based OT engine), Presence Service (Redis pub/sub), Version History Service (snapshots + operation log), and Storage Layer (PostgreSQL, Kafka, S3, Redis).
Step 4: Deep Dive (15 minutes)
Conflict Resolution — OT vs CRDT
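Operational Transformation (OT) relies on a central server to order concurrent operations and transform them against each other, so every client ends up applying an equivalent sequence. Conflict-free Replicated Data Types (CRDTs) use specially designed data structures whose merges converge without central coordination. OT is proven in production (Google Docs is built on it) and generally carries less per-character metadata, so its memory overhead is lower. CRDTs are the better fit for offline-first and peer-to-peer scenarios.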
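To make the OT half concrete, here is a minimal sketch of a transform function for plain-text operations. The `Insert` and `Delete` types, the `transform` signature, and the `op_has_priority` tie-breaking flag are all illustrative assumptions rather than the API of any real OT library, and `Delete` is limited to a single character to keep the cases short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int  # removes exactly one character at pos


def apply(doc, op):
    """Apply an operation to a string; None means 'nothing left to do'."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]


def transform(op, other, op_has_priority):
    """Rewrite `op` so it can be applied after the concurrent `other`."""
    if op is None or other is None:
        return op
    if isinstance(other, Insert):
        shift = len(other.text)
        if isinstance(op, Insert):
            # Ties at the same position are broken by priority.
            if other.pos < op.pos or (other.pos == op.pos and not op_has_priority):
                return Insert(op.pos + shift, op.text)
            return op
        # op is a Delete: its target character moved right if text was
        # inserted at or before it.
        return Delete(op.pos + shift) if other.pos <= op.pos else op
    # other is a Delete
    if isinstance(op, Insert):
        return Insert(op.pos - 1, op.text) if other.pos < op.pos else op
    if other.pos < op.pos:
        return Delete(op.pos - 1)
    if other.pos == op.pos:
        return None  # both users deleted the same character
    return op


# Two users insert at the same position concurrently.
doc = "abc"
a = Insert(1, "X")  # user A
b = Insert(1, "Y")  # user B
site_a = apply(apply(doc, a), transform(b, a, op_has_priority=False))
site_b = apply(apply(doc, b), transform(a, b, op_has_priority=True))
assert site_a == site_b == "aXYbc"
```

The important property is that both sites converge no matter which operation they applied first. A CRDT would reach the same goal by giving every character a unique, ordered identifier instead of transforming positions.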
Real-Time Synchronization
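Clients apply their own edits locally right away so typing feels instant, then send them to the server and reconcile against any concurrent operations the server ordered first. The Jupiter protocol is the classic design for this client-server OT synchronization.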
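The optimistic-apply-then-reconcile loop can be sketched as a small simulation. This is a simplified illustration in the spirit of Jupiter-style synchronization, not the protocol itself: it assumes insert-only operations, in-order delivery, and at most one unacknowledged operation in flight per client. The `Server` and `Client` classes and their method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other, op_has_priority):
    """Rewrite `op` so it can be applied after the concurrent `other`."""
    if other.pos < op.pos or (other.pos == op.pos and not op_has_priority):
        return Insert(op.pos + len(other.text), op.text)
    return op


@dataclass
class Server:
    doc: str
    log: list = field(default_factory=list)  # committed ops; index = revision

    def receive(self, base_revision, op):
        # Catch the op up with everything committed since the sender's revision.
        for committed in self.log[base_revision:]:
            op = transform(op, committed, op_has_priority=False)
        self.doc = apply(self.doc, op)
        self.log.append(op)
        return op  # broadcast this to the other clients


@dataclass
class Client:
    doc: str
    revision: int = 0                            # last server revision seen
    pending: list = field(default_factory=list)  # pending[0] is in flight

    def local_edit(self, op):
        self.doc = apply(self.doc, op)  # optimistic: no round trip needed
        self.pending.append(op)
        # Only one op is in flight at a time; later ops wait for the ack.
        return (self.revision, op) if len(self.pending) == 1 else None

    def on_ack(self):
        self.pending.pop(0)
        self.revision += 1
        return (self.revision, self.pending[0]) if self.pending else None

    def on_remote(self, op):
        # The server ordered `op` before our pending ops, so rebase both sides.
        rebased = []
        for local in self.pending:
            rebased.append(transform(local, op, op_has_priority=False))
            op = transform(op, local, op_has_priority=True)
        self.pending = rebased
        self.doc = apply(self.doc, op)
        self.revision += 1


server, alice, bob = Server("abc"), Client("abc"), Client("abc")
msg_a = alice.local_edit(Insert(1, "X"))  # Alice sees "aXbc" immediately
msg_b = bob.local_edit(Insert(1, "Y"))    # Bob sees "aYbc" immediately
sent_a = server.receive(*msg_a)           # the server orders Alice first
sent_b = server.receive(*msg_b)           # Bob's op becomes Insert(2, "Y")
alice.on_ack()
alice.on_remote(sent_b)
bob.on_remote(sent_a)
bob.on_ack()
assert alice.doc == bob.doc == server.doc == "aXYbc"
```

Limiting each client to one in-flight operation keeps the server logic to a single transform loop over the log, at the cost of batching a fast typist's edits behind each round trip.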
Document Storage
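The document is stored in a dual model: a full snapshot taken every 100 operations, plus an append-only operation log. Loading a document means reading the latest snapshot and replaying at most 99 logged operations on top of it, and the same log provides version history and an audit trail.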
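A minimal in-memory sketch of the snapshot-plus-log model follows. The `DocumentStore` class is hypothetical, operations are reduced to `(pos, text)` inserts, and plain Python containers stand in for the real storage backends; only the snapshot interval of 100 comes from the design above.

```python
from dataclasses import dataclass, field

SNAPSHOT_EVERY = 100  # snapshot interval from the design above


@dataclass
class DocumentStore:
    # Append-only list of (pos, text) inserts; revision N = first N entries.
    log: list = field(default_factory=list)
    # revision -> full document text; revision 0 is the empty document.
    snapshots: dict = field(default_factory=lambda: {0: ""})

    def append(self, pos, text):
        self.log.append((pos, text))
        revision = len(self.log)
        if revision % SNAPSHOT_EVERY == 0:
            self.snapshots[revision] = self.load(revision)
        return revision

    def load(self, revision=None):
        """Rebuild the document at `revision` (default: latest)."""
        revision = len(self.log) if revision is None else revision
        # Start from the newest snapshot at or before the requested revision.
        base = max(r for r in self.snapshots if r <= revision)
        doc = self.snapshots[base]
        for pos, text in self.log[base:revision]:
            doc = doc[:pos] + text + doc[pos:]
        return doc


store = DocumentStore()
text = "collaborative editing " * 10  # 220 characters, typed one at a time
for i, ch in enumerate(text):
    store.append(i, ch)

assert store.load() == text                      # latest: snapshot 200 + 20 ops
assert store.load(13) == "collaborative"         # version history at revision 13
assert sorted(store.snapshots) == [0, 100, 200]
```

Snapshot frequency is the tuning knob here: a shorter interval means faster loads but more snapshot storage, while a longer one shifts the cost to replay time.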
Presence Awareness
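Each client broadcasts its cursor position every 100-200ms via Redis pub/sub. Presence data is ephemeral and carries a 30-second TTL, so a user who disconnects without notice drops out of the presence list once the entry expires.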
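The TTL behaviour can be illustrated with an in-memory stand-in for the presence store. The `PresenceStore` class and its method names are hypothetical. In a real deployment the expiry would more likely be delegated to Redis key expiration than reimplemented by hand, and the clock is injectable here only so expiry can be exercised without waiting.

```python
import time

PRESENCE_TTL_SECONDS = 30  # TTL from the design above


class PresenceStore:
    """In-memory sketch of ephemeral presence data with a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (doc_id, user_id) -> (cursor, expires_at)

    def heartbeat(self, doc_id, user_id, cursor):
        # Every cursor broadcast refreshes the entry's expiry time.
        expires_at = self._clock() + PRESENCE_TTL_SECONDS
        self._entries[(doc_id, user_id)] = (cursor, expires_at)

    def who_is_here(self, doc_id):
        now = self._clock()
        # Drop entries whose TTL has lapsed, then report the survivors.
        self._entries = {k: v for k, v in self._entries.items() if v[1] > now}
        return {
            user: cursor
            for (doc, user), (cursor, _) in self._entries.items()
            if doc == doc_id
        }


# Usage with a fake clock: Alice stops sending heartbeats and ages out.
now = [0.0]
store = PresenceStore(clock=lambda: now[0])
store.heartbeat("doc-1", "alice", cursor=12)
now[0] = 10.0
store.heartbeat("doc-1", "bob", cursor=40)
now[0] = 31.0
assert store.who_is_here("doc-1") == {"bob": 40}
```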
Offline Editing
While offline, the client queues its operations locally in IndexedDB. On reconnect, the queued operations are transformed against the server operations the client missed, then sent to the server.
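The reconnect step amounts to rebasing one list of operations over another. The sketch below assumes insert-only operations and a hypothetical `rebase` helper. It returns both the queued operations rewritten for the server and the missed server operations rewritten for the client's local copy. Local persistence such as IndexedDB is out of scope here and is modelled as a plain list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply_all(doc, ops):
    for op in ops:
        doc = doc[:op.pos] + op.text + doc[op.pos:]
    return doc


def transform(op, other, op_has_priority):
    """Rewrite `op` so it can be applied after the concurrent `other`."""
    if other.pos < op.pos or (other.pos == op.pos and not op_has_priority):
        return Insert(op.pos + len(other.text), op.text)
    return op


def rebase(queued, missed):
    """Rebase offline ops over missed server ops.

    Returns (queued ops to send to the server,
             missed ops to apply to the client's local document).
    """
    queued = list(queued)
    missed_for_client = []
    for server_op in missed:
        rebased = []
        for local_op in queued:
            # Server ops were committed first, so they win position ties.
            rebased.append(transform(local_op, server_op, op_has_priority=False))
            server_op = transform(server_op, local_op, op_has_priority=True)
        queued = rebased
        missed_for_client.append(server_op)
    return queued, missed_for_client


base = "hello"
offline_queue = [Insert(5, "!"), Insert(0, ">")]        # client has ">hello!"
missed = [Insert(0, "Oh, "), Insert(9, " world")]       # server has "Oh, hello world"

to_server, to_client = rebase(offline_queue, missed)
server_doc = apply_all(apply_all(base, missed), to_server)
client_doc = apply_all(apply_all(base, offline_queue), to_client)
assert server_doc == client_doc == "Oh, >hello world!"
```

The cost of this step grows with the product of the two list lengths, which is one reason long offline sessions are often cited as a case where CRDTs are more comfortable than OT.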
Wrap Up
Key trade-offs: OT vs CRDT, snapshot frequency, consistency vs latency, operation log retention. Focus on the core conflict resolution mechanism, nail the sync protocol, and you will be in great shape.