4.2 Design a Collaborative Document Editor

Question

Real-time collaborative editing is one of those features that feels magical when it works — and absolutely nightmarish to build from scratch. Google Docs, Notion, Figma — they all solve variations of the same fundamental problem: how do you let multiple people edit the same thing at the same time without destroying each other's work?

This question is a favorite in system design interviews because it touches distributed systems, conflict resolution algorithms, real-time communication, and storage design all in one problem.

Estimated time: 40 minutes

Step 1: Understand the Requirements (5 minutes)

Functional Requirements

Multiple users can simultaneously edit the same document in real time
Users see each other's changes within 1-2 seconds
Conflict resolution — when two users edit the same section, the system resolves it gracefully
Version history — users can browse and restore previous versions
Presence awareness — see who else is in the document, their cursor positions, and text selections
Offline editing — users can make changes without internet and sync when reconnected

Non-Functional Requirements

Support up to 100 concurrent editors per document
Low latency: edits should typically appear on other users' screens in under 500ms, and never later than the 1-2 seconds allowed by the functional requirements
Strong eventual consistency — all users must converge to the same document state
No data loss — every keystroke must be durably stored

The most critical non-functional requirement is strong eventual consistency. Two users can temporarily see different states, but they must always converge to the same final document.

Step 2: Estimation (5 minutes)

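Peak operations: assuming 2 million active editing sessions at peak, each producing about 40 operations per minute, the load is 2 million x 40 = 80 million operations per minute, or roughly 1.3 million per second. Each operation is small (50-200 bytes), so the workload is operation-heavy but bandwidth-light. Total document storage: 500M documents x 50 KB average = 25 TB.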
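The arithmetic is easy to double-check in a few lines. The sketch below only restates the figures quoted above; the constant names are illustrative, KB and TB are taken as decimal units, and the bandwidth figure covers inbound operations only (fan-out to other editors is not modeled).

```python
# Back-of-envelope check of the estimate above.
PEAK_SESSIONS = 2_000_000
OPS_PER_MIN_PER_SESSION = 40
OP_SIZE_BYTES = (50, 200)          # smallest and largest operation size
DOCUMENTS = 500_000_000
AVG_DOC_BYTES = 50 * 1000          # 50 KB, decimal units

ops_per_min = PEAK_SESSIONS * OPS_PER_MIN_PER_SESSION
ops_per_sec = ops_per_min / 60
inbound_mb_per_sec = tuple(ops_per_sec * size / 1e6 for size in OP_SIZE_BYTES)
storage_tb = DOCUMENTS * AVG_DOC_BYTES / 1e12

print(f"{ops_per_min:,} ops/min")
print(f"{ops_per_sec:,.0f} ops/sec")
print(f"{inbound_mb_per_sec[0]:.0f}-{inbound_mb_per_sec[1]:.0f} MB/s inbound")
print(f"{storage_tb:.0f} TB of document storage")
```

Even at the upper end, inbound traffic stays in the low hundreds of MB/s across the whole fleet, which is why the operation count, not the byte count, drives the design.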

Step 3: High-Level Design (10 minutes)

Core components: API Gateway, Document Service, Collaboration Service (WebSocket-based OT engine), Presence Service (Redis pub/sub), Version History Service (snapshots + operation log), and Storage Layer (PostgreSQL, Kafka, S3, Redis).

Step 4: Deep Dive (15 minutes)

Conflict Resolution — OT vs CRDT

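Operational transformation (OT) relies on a central server that orders concurrent operations and transforms them against each other so that every client ends up with the same result. Conflict-free replicated data types (CRDTs) are data structures designed so that replicas converge without central coordination. OT is proven in production at Google scale and carries less per-character metadata, so its memory overhead is lower. CRDTs are a better fit for offline-first and peer-to-peer scenarios.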
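To make the idea of transforming concurrent operations concrete, here is a minimal sketch that handles inserts only. The `Insert` type, the `site` tie-breaker, and the function names are assumptions made for illustration; a real editor also needs deletes, formatting operations, and a server that fixes the order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where the text goes
    text: str
    site: int   # hypothetical client id, used only to break position ties

def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `other` has already been applied."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op

doc = "hello world"
a = Insert(0, ">> ", site=1)   # user A prepends a marker
b = Insert(5, ",", site=2)     # user B adds a comma after "hello"

# Each replica applies its own edit first, then the transformed remote edit.
replica_a = apply(apply(doc, a), transform(b, a))
replica_b = apply(apply(doc, b), transform(a, b))
assert replica_a == replica_b == ">> hello, world"
```

Without the transform, replica A would place the comma at index 5 of the already shifted text and the two copies would diverge.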

Real-Time Synchronization

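Clients apply their own edits locally right away, so typing feels instantaneous, then reconcile with the order the server decides on. A Jupiter-style protocol handles the client-server OT synchronization: each client tracks which of its operations the server has acknowledged and transforms incoming remote operations against the ones still in flight.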
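The round trip can be sketched as follows. This is a simplified, revision-based variant in the spirit of Jupiter, not the full algorithm: it assumes insert-only operations and at most one unacknowledged operation per client, and the `Server` class and its `submit` method are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties

def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op: Insert, other: Insert) -> Insert:
    """Shift `op` so it keeps its meaning after `other` has been applied."""
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op

@dataclass
class Server:
    doc: str
    log: list = field(default_factory=list)  # committed ops; len(log) is the revision

    def submit(self, op: Insert, base_rev: int) -> Insert:
        # Catch the op up with everything committed since the client's revision.
        for committed in self.log[base_rev:]:
            op = transform(op, committed)
        self.doc = apply(self.doc, op)
        self.log.append(op)
        return op  # in a real system this is broadcast to the other clients

server = Server(doc="abc")

# Both clients start at revision 0 and apply their own edit optimistically.
op_a, op_b = Insert(0, "X", site=1), Insert(3, "Y", site=2)
client_a, client_b = apply("abc", op_a), apply("abc", op_b)

committed_a = server.submit(op_a, base_rev=0)  # arrives first, stored unchanged
committed_b = server.submit(op_b, base_rev=0)  # transformed against op_a

client_a = apply(client_a, committed_b)                   # op_a was already acknowledged
client_b = apply(client_b, transform(committed_a, op_b))  # op_b was still in flight
assert client_a == client_b == server.doc == "XabcY"
```

The server is the single place where order is decided, which is what keeps the transform logic tractable compared with a fully peer-to-peer design.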

Document Storage

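Dual model: a full snapshot every 100 operations plus an append-only operation log. Loading a document means reading the latest snapshot and replaying at most 99 later operations. The same log provides version history and an audit trail.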
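A small in-memory sketch shows how the two pieces fit together. `DocumentStore` and its methods are hypothetical, operations are reduced to `(pos, text)` inserts, and plain Python containers stand in for whatever database and object store hold the log and snapshots.

```python
from dataclasses import dataclass, field
from typing import Optional

SNAPSHOT_EVERY = 100  # from the text: one snapshot per 100 operations

@dataclass
class DocumentStore:
    head: str = ""                                   # current document text
    log: list = field(default_factory=list)          # append-only (pos, text) inserts
    snapshots: dict = field(default_factory=lambda: {0: ""})  # revision -> full text

    def append(self, pos: int, text: str) -> int:
        self.head = self.head[:pos] + text + self.head[pos:]
        self.log.append((pos, text))
        rev = len(self.log)
        if rev % SNAPSHOT_EVERY == 0:
            self.snapshots[rev] = self.head
        return rev

    def load(self, rev: Optional[int] = None) -> str:
        """Rebuild the document at `rev` (default: latest) from the nearest snapshot."""
        rev = len(self.log) if rev is None else rev
        base = max(r for r in self.snapshots if r <= rev)
        doc = self.snapshots[base]
        for pos, text in self.log[base:rev]:
            doc = doc[:pos] + text + doc[pos:]
        return doc

store = DocumentStore()
for i in range(250):
    store.append(i, "x")

print(sorted(store.snapshots))   # revisions that have a snapshot
print(len(store.load(150)))      # snapshot at 100 plus 50 replayed operations
```

Restoring an old version is the same `load` call with an explicit revision, so version history needs no separate storage format.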

Presence Awareness

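Cursor positions are broadcast every 100-200ms via Redis pub/sub. Presence data is ephemeral and expires after a 30-second TTL, so a user who disconnects silently drops out of the document without explicit cleanup.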
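The expiry behavior can be illustrated without Redis. The sketch below is an in-memory stand-in: `PresenceRoom`, `heartbeat`, and `active` are hypothetical names, and the injectable clock exists only so the TTL can be exercised without waiting 30 seconds.

```python
import time
from dataclasses import dataclass

PRESENCE_TTL_S = 30.0  # from the text: presence entries live for 30 seconds

@dataclass
class Presence:
    user: str
    cursor: int
    selection: tuple   # (start, end) offsets of the selected text
    seen_at: float

class PresenceRoom:
    """Tracks who is in one document; entries vanish if not refreshed in time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._users = {}

    def heartbeat(self, user: str, cursor: int, selection=None) -> None:
        # Called on every cursor broadcast (every 100-200ms while the user is active).
        sel = selection or (cursor, cursor)
        self._users[user] = Presence(user, cursor, sel, self._clock())

    def active(self) -> list:
        cutoff = self._clock() - PRESENCE_TTL_S
        self._users = {u: p for u, p in self._users.items() if p.seen_at >= cutoff}
        return sorted(self._users)

now = [0.0]                                  # fake clock for the demo
room = PresenceRoom(clock=lambda: now[0])
room.heartbeat("alice", cursor=12)
now[0] = 20.0
room.heartbeat("bob", cursor=40, selection=(40, 48))
now[0] = 31.0
print(room.active())                         # alice's entry is older than the TTL
```

Because nothing here is durable, losing the presence store costs at most a few seconds of cursor data, which is why it can live outside the main storage path.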

Offline Editing

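Operations are queued locally in IndexedDB while the client is offline. On reconnect, the client fetches the server operations it missed, transforms its queued operations against them, and then sends the transformed queue to the server.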
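The reconnect step amounts to transforming one list of operations against another. The sketch below assumes insert-only operations and uses a plain Python list in place of the IndexedDB queue; `rebase` and the `Insert` type are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties

def apply_all(doc: str, ops) -> str:
    for op in ops:
        doc = doc[:op.pos] + op.text + doc[op.pos:]
    return doc

def transform(op: Insert, other: Insert) -> Insert:
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op

def rebase(pending, missed):
    """Return (pending2, missed2) such that applying missed then pending2
    gives the same text as applying pending then missed2."""
    missed_out = []
    for s in missed:
        next_pending = []
        for p in pending:
            next_pending.append(transform(p, s))  # local op, now valid after s
            s = transform(s, p)                   # server op, now valid after p
        pending = next_pending
        missed_out.append(s)
    return pending, missed_out

base = "The cat"
queued = [Insert(7, " sat", site=1), Insert(11, " down", site=1)]   # typed offline
missed = [Insert(0, "Note: ", site=2), Insert(9, " big", site=2)]   # committed meanwhile

to_send, to_apply_locally = rebase(queued, missed)
server_doc = apply_all(apply_all(base, missed), to_send)
local_doc = apply_all(apply_all(base, queued), to_apply_locally)
assert server_doc == local_doc == "Note: The big cat sat down"
```

The longer a client stays offline, the longer both lists get, and the cost of this step grows with their product. That cost is one reason CRDTs are often preferred for offline-first products.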

Wrap Up

Key trade-offs: OT vs CRDT, snapshot frequency, consistency vs latency, operation log retention. Focus on the core conflict resolution mechanism, nail the sync protocol, and you will be in great shape.