Adobe Real-Time Collaboration WebSockets
Asked of: Software Engineer

What's being tested
Interviewers probe your ability to design and reason about low-latency, highly concurrent real-time collaboration systems built on WebSocket connections. Expect to demonstrate tradeoffs in state synchronization algorithms, message ordering, scaling architectures, failure/reconnect behavior, and pragmatic engineering (bandwidth, CPU, GC) that a Software Engineer at Adobe would own. You should show both a correct design and the reasoning that balances correctness, latency, and operational cost.
Core knowledge
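- WebSocket basics: full-duplex, TCP-based channels established via an HTTP Upgrade handshake. Liveness is checked with ping/pong control frames. For proxies that block WebSockets, fallbacks include Server-Sent Events (SSE) for server-to-client traffic paired with ordinary HTTP requests for client-to-server traffic, or long-polling.
- Connection budget math: if each open document gets its own socket, required sockets = clients * average concurrent docs per client. If documents are multiplexed over one socket per client, required sockets = clients. Outbound bandwidth ≈ clients * (messages/s delivered to each client) * average message size, and delivered messages grow with room size because each op fans out to every room member. Also estimate CPU for JSON encode/decode and TLS encryption, and use these numbers to size TLS offload and message brokers.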
- Presence & routing (rooms/topics): map each document to a logical room and keep lightweight membership so servers broadcast only to that room's members; avoid naive all-to-all broadcasts.
- Sticky sessions vs stateless routing: sticky sessions (session affinity) keep a client on one owning server, which simplifies sequencing. Stateless routing instead requires either global sequencing or a single authoritative owner per document (for example, chosen by consistent hashing). Tradeoff: failover complexity vs routing simplicity.
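- State sync algorithms: CRDTs (commutative/convergent ops, eventual convergence; clients apply ops locally and any replica can merge) vs Operational Transformation (OT) (concurrent ops are transformed against each other, usually by a central server, with the goal of preserving each user's intention). CRDTs scale horizontally more easily; OT can carry less per-op metadata but typically needs a trusted central transformer to order and transform ops.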
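- Causality & ordering: use server-assigned, monotonically increasing per-document sequence numbers for a total order. Lamport timestamps give an order consistent with causality but cannot distinguish concurrent ops from causally related ones. Vector clocks can, and they support causal delivery, but their size grows O(N) with the number of writers, which is impractical for large audiences.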
- Durability & persistence: persist authoritative operations to a durable log (Kafka or Redis Streams) and take periodic snapshots so CRDT tombstones can be compacted. Design garbage collection to reclaim metadata only at safe checkpoints, for example once every active replica has acknowledged the ops being compacted.
- Inter-server messaging: use a low-latency pub/sub (Redis, Kafka, or dedicated RPC) to forward ops between front-end WebSocket servers. Pair at-least-once delivery with idempotent ops or deduplication keys.
- Backpressure & batching: aggregate small ops into frames and apply rate limits so one busy client cannot cause head-of-line blocking for others; use adaptive batching driven by the latency budget and current activity rates.
- Reconnection & catch-up: on reconnect, the client sends its last-known sequence ID. The server replays ops since that ID, or sends a snapshot plus the ops after it if the log has already been compacted past that ID. Clients should reconnect with exponential backoff and jitter to avoid a thundering herd.
- Security & auth: authenticate the socket during the handshake (for example, with a short-lived JWT) and enforce per-document ACLs. Validate and sanity-check every incoming op before applying it so malformed input cannot corrupt state or crash the server.
- Performance hotspots: JSON parsing, large object diffs, GC pressure from long-lived objects, and TLS handshake overhead. Consider binary framing (CBOR/protobuf) and connection pooling or TLS session resumption to lower cost.
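The connection budget math above can be turned into a quick back-of-envelope calculator. This is a minimal sketch that assumes every room member is an active editor, counts only application payload (no TLS/TCP overhead, no server-to-server traffic), and uses hypothetical field names and sample numbers.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    clients: int                   # concurrent connected clients
    docs_per_client: float         # average documents each client has open
    room_size: int                 # average members per document room (assumed all active editors)
    ops_per_sec_per_editor: float  # average ops each editor emits per second
    avg_msg_bytes: int             # average encoded message size


def sockets_needed(w: Workload, multiplexed: bool) -> int:
    # One socket per client when documents are multiplexed, else one per open document.
    return w.clients if multiplexed else round(w.clients * w.docs_per_client)


def outbound_bytes_per_sec(w: Workload) -> float:
    # Fan-out: each client receives the ops of every other member of each room it is in.
    msgs_per_client = w.docs_per_client * (w.room_size - 1) * w.ops_per_sec_per_editor
    return w.clients * msgs_per_client * w.avg_msg_bytes


# Hypothetical workload: 100k clients, one open doc each, 5 editors per room,
# 2 ops/s per editor, 200-byte messages.
w = Workload(clients=100_000, docs_per_client=1.0, room_size=5,
             ops_per_sec_per_editor=2.0, avg_msg_bytes=200)
print(sockets_needed(w, multiplexed=True))
print(outbound_bytes_per_sec(w) * 8 / 1e9, "Gbit/s")
```

With these assumed numbers, fan-out alone produces about 1.28 Gbit/s of outbound payload, which is why room size matters more than client count for bandwidth sizing.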
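To make the "central transform" side of the CRDT vs OT comparison concrete, here is a sketch of the classic inclusion transformation for two concurrent text inserts. It is insert-only (real OT must also transform deletes), and the `site` tie-breaker for equal positions is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # index in the document where text is inserted
    text: str
    site: int   # replica ID, used only to break ties at equal positions


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it applies correctly after the concurrent `against` was applied."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


# Two sites start from "abc" and concurrently insert at the same position.
a = Insert(1, "X", site=1)
b = Insert(1, "Y", site=2)
at_site_1 = apply(apply("abc", a), transform(b, a))  # apply own op, then transformed remote op
at_site_2 = apply(apply("abc", b), transform(a, b))
print(at_site_1, at_site_2)  # aXYbc aXYbc
```

Both sites converge because each transforms the remote op against the one it already applied; in a server-centric OT design, the server performs these transforms and assigns the order.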
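The sequencing and catch-up bullets above fit together as follows: the owner assigns per-document sequence numbers, and a reconnecting client gets a replay if its last-seen ID is still in the log, otherwise a snapshot plus the newer ops. This is a minimal sketch with a hypothetical `DocumentLog` class and a toy state model (the document is a list of strings and applying an op appends it).

```python
from dataclasses import dataclass, field


@dataclass
class DocumentLog:
    """Hypothetical owner-side log for one document (toy state: list of appended strings)."""
    snapshot: list[str] = field(default_factory=list)         # state as of snapshot_seq
    snapshot_seq: int = 0
    ops: list[tuple[int, str]] = field(default_factory=list)  # (seq, op) newer than snapshot
    next_seq: int = 1

    def append(self, op: str) -> int:
        # Server-assigned, monotonically increasing sequence number for this document.
        seq = self.next_seq
        self.next_seq += 1
        self.ops.append((seq, op))
        return seq

    def compact(self) -> None:
        # Fold retained ops into the snapshot and truncate the log.
        for seq, op in self.ops:
            self.snapshot.append(op)
            self.snapshot_seq = seq
        self.ops.clear()

    def catch_up(self, last_seen: int) -> dict:
        # Replay if every op after last_seen is still in the log; otherwise snapshot + tail.
        if last_seen >= self.snapshot_seq:
            return {"kind": "replay", "ops": [(s, o) for s, o in self.ops if s > last_seen]}
        return {"kind": "snapshot", "state": list(self.snapshot),
                "seq": self.snapshot_seq, "ops": list(self.ops)}


log = DocumentLog()
for op in ["a", "b", "c"]:
    log.append(op)
print(log.catch_up(1))  # replay of seq 2 and 3
log.compact()
log.append("d")
print(log.catch_up(1))  # seq 1 was compacted away, so the client gets a snapshot plus seq 4
```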
Worked example — design a scalable WebSocket architecture for collaborative document editing
First 30 seconds (framing): clarify scale (clients, concurrent docs, ops/sec), consistency target (strong vs eventual), offline edits, and mobile constraints. Assume 100k concurrent clients, mostly small documents, and a target of sub-200 ms edit propagation. Skeleton pillars: (1) document ownership: assign each document to an owner server via consistent hashing; (2) sync protocol: use a CRDT (e.g., RGA or LSEQ for text) so clients can apply their own ops locally and immediately; (3) inter-server pub/sub: the owner persists ops to Kafka and publishes deltas to subscribing servers; (4) client reconnect & catch-up: clients send their last sequence ID and receive either a replay or a snapshot. Key tradeoff to call out: a CRDT simplifies multi-master editing and horizontal scaling but increases metadata and compaction complexity; OT can reduce metadata at the cost of central transformation complexity. Close by noting the incremental path: build a single-region proof of concept, measure op sizes and latency, then add multi-region replication, snapshotting, and tombstone compaction as needed.
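Pillar (1), document ownership via consistent hashing, can be sketched as a hash ring with virtual nodes. The class name, server names, the choice of SHA-256, and 100 virtual nodes per server are all assumptions of this sketch; a stable hash is used because Python's built-in `hash()` for strings is randomized per process.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping document IDs to owner servers (hypothetical sketch)."""

    def __init__(self, servers: list[str], vnodes: int = 100):
        # Each server appears at many points on the ring to smooth out load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server) for server in servers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # Stable across processes and machines, unlike built-in hash() for str.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def owner(self, doc_id: str) -> str:
        # First virtual node clockwise from the document's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, self._hash(doc_id)) % len(self._keys)
        return self._ring[idx][1]


ring = HashRing(["ws-1", "ws-2", "ws-3"])
print(ring.owner("doc-42"))
```

The useful property for failover is that removing a server only reassigns the documents that server owned; every other document keeps its owner, so only those rooms need to be rebuilt from the log.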
A second angle — implement conflict-resolution for a shared JSON document (fine-grained objects)
Framing shift: instead of plain text, the collaborative object is a nested JSON (rich structured data) with higher contention on fields. Apply the same principles but adapt the CRDT type (use a JSON CRDT like Yjs/Automerge or map/register CRDTs) to handle nested structures and concurrent inserts/deletes. You'd prioritize per-key operations with smaller op granularity, maintain causal metadata per-object (not full vector clocks) and use server-assigned sequences for global ordering when necessary for deterministic merges. Constraint differences: larger object graphs increase memory and GC pressure; add targeted snapshotting and compact per-subtree tombstones to control growth.
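For per-key operations on structured data, one simple option is a last-writer-wins (LWW) map keyed by flattened field paths such as "shape.fill". This is a swapped-in, deliberately simpler technique than what Yjs or Automerge do internally; the class name and path-key convention are hypothetical. Concurrent writes to the same key keep the one with the larger (Lamport counter, replica ID) stamp, so one of them is silently dropped, and list inserts/deletes would still need a sequence CRDT with tombstones.

```python
from dataclasses import dataclass, field
from typing import Any

Stamp = tuple[int, str]  # (Lamport counter, replica ID); the replica ID breaks ties


@dataclass
class LWWMap:
    """Last-writer-wins map CRDT: each key (e.g. a field path) is an independent register."""
    replica: str
    clock: int = 0
    entries: dict[str, tuple[Stamp, Any]] = field(default_factory=dict)

    def put(self, key: str, value: Any) -> None:
        self.clock += 1
        self.entries[key] = ((self.clock, self.replica), value)

    def merge(self, other: "LWWMap") -> None:
        # Commutative, associative, idempotent: per key, keep the entry with the larger stamp.
        for key, (stamp, value) in other.entries.items():
            if key not in self.entries or stamp > self.entries[key][0]:
                self.entries[key] = (stamp, value)
            # Advance the Lamport clock so later local writes beat everything seen so far.
            self.clock = max(self.clock, stamp[0])

    def value(self) -> dict[str, Any]:
        return {key: value for key, (_, value) in self.entries.items()}


a, b = LWWMap("A"), LWWMap("B")
a.put("shape.fill", "blue")
b.put("shape.fill", "red")
b.put("shape.stroke", "black")
a.merge(b)
print(a.value())  # {'shape.fill': 'red', 'shape.stroke': 'black'}
```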
Common pitfalls
Pitfall: treating WebSockets like stateless HTTP — many designs forget connection lifecycle issues (network flakiness, reconnection, authentication expiry), causing invisible desyncs. Always design explicit handshake, auth refresh, and sequence reconciliation.
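An explicit reconnect path usually combines jittered backoff with a resume message that carries refreshed credentials and the last applied sequence number. The sketch below uses "full jitter" exponential backoff; the `resume_frame` message shape and its field names are hypothetical, not a standard protocol.

```python
import random
from typing import Optional


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                    rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    rng = rng or random.Random()
    ceiling = min(cap, base * 2 ** min(attempt, 32))  # clamp the exponent to avoid float overflow
    return rng.uniform(0.0, ceiling)


def resume_frame(doc_id: str, last_seq: int, token: str) -> dict:
    # Hypothetical first message after reconnecting: a freshly refreshed auth token plus the
    # last sequence number the client applied, so the server can replay or send a snapshot.
    return {"type": "resume", "doc": doc_id, "last_seq": last_seq, "token": token}


for attempt in range(5):
    print(f"attempt {attempt}: wait {reconnect_delay(attempt):.2f}s")
print(resume_frame("doc-42", last_seq=1187, token="<refreshed JWT>"))
```

Randomizing the whole interval, rather than adding a small jitter to a fixed delay, spreads out clients that were all disconnected by the same server failure.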
Pitfall: assuming global vector clocks scale — vector clocks are correct for causality but cost O(participants). For large audiences, prefer server-assigned sequence numbers or hybrid causality (per-client counters + server sequence).
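The hybrid scheme can be sketched as an owner-side sequencer. Each op carries a (client ID, per-client counter) pair, which is O(1) metadata instead of an O(N) vector clock. The server checks that each client's counters are contiguous (per-client FIFO) and then assigns the document-wide sequence number. The class and its reject-on-gap policy are assumptions of this sketch.

```python
from typing import Optional


class DocSequencer:
    """Hypothetical owner-side sequencer for one document."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.last_counter: dict[str, int] = {}
        self.log: list[tuple[int, str, int, str]] = []  # (seq, client, counter, op)

    def accept(self, client_id: str, counter: int, op: str) -> Optional[int]:
        last = self.last_counter.get(client_id, 0)
        if counter <= last:
            return None  # already sequenced (client retry); ack without re-appending
        if counter != last + 1:
            # A missing op from this client would break its FIFO order; ask it to resend.
            raise ValueError(f"gap from {client_id}: expected {last + 1}, got {counter}")
        seq = self.next_seq
        self.next_seq += 1
        self.last_counter[client_id] = counter
        self.log.append((seq, client_id, counter, op))
        return seq


s = DocSequencer()
print(s.accept("A", 1, "insert x"))  # 1
print(s.accept("B", 1, "insert y"))  # 2
print(s.accept("A", 1, "insert x"))  # None: duplicate retry
```

Cross-client causality still holds: a client only observes another client's op after the server has sequenced it, so anything that client sends afterward receives a larger sequence number.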
Pitfall: skipping idempotency and deduplication. Relying on at-least-once delivery from Kafka/Redis Streams without idempotent ops or dedupe keys leads to duplicated edits and corrupted state. Use operation IDs and idempotent apply semantics.
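A minimal sketch of idempotent apply on a subscribing server, assuming each op carries a globally unique ID (the "client:counter" format here is hypothetical). The counter increment is deliberately non-idempotent, so a redelivered copy would visibly double-count without the dedupe check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Op:
    op_id: str  # globally unique, e.g. "client:counter" (assumed format)
    key: str
    delta: int


@dataclass
class Replica:
    state: dict[str, int] = field(default_factory=dict)
    seen: set[str] = field(default_factory=set)  # unbounded here; prune at snapshots in practice

    def apply(self, op: Op) -> bool:
        """Apply an op at most once; redelivered copies are ignored."""
        if op.op_id in self.seen:
            return False
        self.seen.add(op.op_id)
        self.state[op.key] = self.state.get(op.key, 0) + op.delta
        return True


r = Replica()
like_a, like_b = Op("A:1", "likes", 1), Op("B:1", "likes", 1)
for op in [like_a, like_b, like_a]:  # broker redelivers like_a
    r.apply(op)
print(r.state)  # {'likes': 2}
```

When ops also carry server-assigned, monotonic per-document sequence numbers, the seen-set can shrink to a single high-water mark per document.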
Connections
Interviewers may pivot to adjacent concerns such as multi-region replication and consistency tradeoffs, storage snapshot/compaction strategies, or load balancer/TLS termination choices (nginx, Envoy). Be ready to map your design to monitoring, observability, and SLOs like p95 propagation latency and connection churn rates.
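For an SLO like p95 propagation latency, one simple convention is the nearest-rank percentile over a window of samples. This sketch assumes each sample is the time from the server sequencing an op to a peer receiving it; that measurement point is an assumption, not a fixed definition.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # hypothetical window of 100 propagation samples
print(percentile(latencies_ms, 95))  # 95
```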
Further reading
- RFC 6455, The WebSocket Protocol: protocol details and handshake semantics.
- Shapiro et al., "A comprehensive study of Convergent and Commutative Replicated Data Types": formal CRDT definitions and tradeoffs.
- Neil Fraser, "Differential Synchronization": practical ideas for syncing document state with diffs and patches.