Adobe Transactional Integrity For Collaborative Edits
Asked of: Software Engineer
What's being tested
Interviewers probe your ability to design and reason about transactional integrity in low-latency, real-time collaborative editing systems: correctness under concurrent edits, convergence guarantees, per-operation metadata, persistence and recovery, and practical tradeoffs between strict ACID semantics and user-observable consistency. Adobe cares because collaborative authoring must preserve user intent, minimize visible conflicts, and scale across millions of documents and clients.
Core knowledge
-
Operational Transformation (OT): a family of algorithms that transform concurrent operations against each other so that replicas converge regardless of the order in which they apply them. Transform functions must satisfy TP1, and also TP2 when no central server imposes a single order on operations; intention preservation needs separate care.
-
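Conflict-free Replicated Data Types (CRDTs): data structures that guarantee convergence by construction. State-based CRDTs need a merge that is commutative, associative, and idempotent; op-based CRDTs need concurrent operations to commute and rely on reliable causal delivery. Examples: RGA for sequences, LWW-Element-Set for sets, and JSON CRDTs for nested documents.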
-
Causality metadata: per-op fields such as client-id, op-id, a logical (Lamport) timestamp, and a vector clock (O(N) entries for N replicas), used to order operations and detect concurrent ones.
-
Convergence vs. real-time latency tradeoff: strong serializability (single writer or 2PC) gives intuitive consistency but adds latency; CRDT/OT give low-latency local responsiveness with eventual convergence, at the cost of more metadata and more complex merge semantics.
-
Durability & op-log: persist an append-only operation log (oplog) to a store such as Postgres, Cassandra, or object storage; implement compaction via snapshots and tombstone GC to keep storage manageable.
-
Snapshotting & compaction — periodically write a compact document snapshot and store last applied op index; compaction frequency balances restart recovery time vs. write amplification and must avoid losing causality info.
-
Undo/redo and intention preservation: support undo through inverse operations or history trees; for CRDTs, undo requires causal tombstones or operation inversion tied to the op-id so that it does not violate commutativity.
-
Client-side optimization — apply optimistic local edits and send ops in background; include idempotency keys and sequence numbers to make retries safe; server must handle out-of-order arrivals.
-
Sharding & routing — partition by document id; keep per-document state on a single leader node when strict ordering needed, or use multi-master replication with CRDTs for availability at scale.
-
Security and authorization — per-op authorization checks and signed ops prevent malicious replay; ensure that permission checks are deterministic when applied during reconciliation.
-
Failure & recovery patterns — store last acknowledged op per-client, use snapshots for fast bootstrapping, and ensure replaying ops is idempotent; test scenarios: partial delivery, duplicated ops, and clock skew.
-
Tip: For documents with rich embedded objects (images, layers), model those as separate CRDT subtrees and use stable references to avoid expensive large-object replication.
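To make the causality-metadata bullet concrete, here is a minimal sketch of vector clock comparison and merging. The names `VectorClock`, `compare`, and `merge` are illustrative assumptions, not part of any particular library, and a missing replica entry is assumed to mean a count of zero.

```python
from typing import Dict

# Assumption: a vector clock maps replica id -> number of ops seen from that replica.
VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b as 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened-before b
    if b_le_a:
        return "after"       # b happened-before a
    return "concurrent"      # neither dominates: the ops need merge logic


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Entry-wise maximum, applied when a replica receives a remote op."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
```

Two ops are concurrent exactly when neither clock dominates the other, which is the case OT transforms or CRDT merge rules have to handle. The clock grows with the number of replicas, which is the O(N) cost noted above.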
Worked example
Design transactional integrity for a collaborative rich-text editor (frame): start by clarifying scope: single-document vs. cross-document transactions, whether offline edits are allowed, expected concurrency, and latency SLOs. Skeleton: (1) choose a replication primitive (an op-based CRDT for low latency, or a centralized sequencer for strict ordering); (2) define per-op metadata (client-id, op-id, Lamport timestamp, parent ops); (3) design persistence: append-only oplog, periodic snapshots, compaction and GC; (4) define the client protocol: optimistic apply, ack/repair, idempotency. Key tradeoff to call out: a single sequencer (leader) yields linearizability and simple undo but increases commit latency and creates a single-node bottleneck; CRDTs reduce latency but force you to define deterministic conflict-resolution semantics for rich formatting operations. Close: state the additional testing and monitoring you'd add (op rate, convergence tests, GC safety checks); say "if I had more time, I'd prototype both an op-based CRDT and a leader-sequenced approach for benchmark comparison and edge-case fuzzing."
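Step (3) of the skeleton can be sketched as follows for the leader-sequenced option. This is a simplified in-memory illustration, not a storage design: `Op`, `DocumentLog`, and `apply_op` are hypothetical names, ops are assumed to be plain-text inserts, and a real system would persist the log and snapshot durably.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Op:
    client_id: str
    op_id: int   # assumed to increase monotonically per client
    pos: int     # insert position in the document text
    text: str    # insert-only ops keep the sketch small


def apply_op(doc: str, op: Op) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


@dataclass
class DocumentLog:
    snapshot: str = ""        # compacted document state
    snapshot_index: int = 0   # number of ops already folded into the snapshot
    ops: List[Op] = field(default_factory=list)  # tail of ops after the snapshot

    def append(self, op: Op) -> int:
        """Append in leader order and return the op's global log index."""
        self.ops.append(op)
        return self.snapshot_index + len(self.ops)

    def materialize(self) -> str:
        """Recovery path: start from the snapshot and replay the tail."""
        doc = self.snapshot
        for op in self.ops:
            doc = apply_op(doc, op)
        return doc

    def compact(self) -> None:
        """Fold the tail into a new snapshot and record the last applied index."""
        self.snapshot = self.materialize()
        self.snapshot_index += len(self.ops)
        self.ops = []
```

Compacting more often shortens replay on restart but rewrites the snapshot more frequently, which is the recovery-time vs. write-amplification tradeoff. Storing `snapshot_index` next to the snapshot lets recovery know which ops still need replaying.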
A second angle
Compare OT vs CRDT for collaborative vector-graphics with layered objects: same convergence goal, different constraints. OT requires complex transform functions for non-commutative graphical edits (move, scale, group) and struggles with offline multi-actor edits unless you carry rich context. Op-based CRDTs let you encode operations as commutative primitives (e.g., assign position via unique timestamped anchors) and perform merge without transforms, but you will pay in metadata growth and tombstone management. Here the product constraint (large binary assets, layering semantics) pushes toward hybrid approaches: keep heavy binary content out-of-band (references), use per-layer CRDTs for transforms, and centralize operations that touch multiple layers using a short-lived transaction/lease to preserve intuitive group edits.
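One way to picture the per-layer CRDT idea is a last-writer-wins (LWW) register per layer position. The sketch below assumes each write carries a Lamport timestamp with the client id as a tie-breaker; `LWWRegister` and `merge_layers` are hypothetical names. LWW discards one of two concurrent moves, which is a product decision rather than a given.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LWWRegister:
    value: Tuple[float, float]  # layer position (x, y)
    timestamp: int              # assumed Lamport timestamp of the write
    client_id: str              # tie-breaker so every replica picks the same winner

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Comparing (timestamp, client_id) is a total order, so merge is
        # commutative, associative, and idempotent.
        mine = (self.timestamp, self.client_id)
        theirs = (other.timestamp, other.client_id)
        return self if mine >= theirs else other


def merge_layers(
    x: Dict[str, LWWRegister], y: Dict[str, LWWRegister]
) -> Dict[str, LWWRegister]:
    """Merge two replicas' layer maps, keyed by a stable layer id."""
    merged = dict(x)
    for layer_id, reg in y.items():
        merged[layer_id] = merged[layer_id].merge(reg) if layer_id in merged else reg
    return merged
```

Because the merge needs no transform functions, replicas can exchange state in any order and still agree. Edits that span several layers, such as grouping, are where the short-lived lease mentioned above would come in.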
Common pitfalls
Pitfall: Designing for perfect serializability — insisting on full ACID across many concurrent users will often force a centralized bottleneck and poor UX; instead, justify where relaxed consistency (eventual/causal) is acceptable and when to enforce ordering.
Pitfall: Ignoring tombstone growth — naive CRDT implementations leave tombstones forever; this causes unbounded storage and slow replicas. Always specify GC, safe retention windows, and background compaction protocols.
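A minimal sketch of one GC safety rule, assuming the server assigns every delete a sequence number and tracks the highest sequence number each replica has acknowledged. Under that assumption, a tombstone can be purged only once every known replica has acknowledged it. The function names are hypothetical, and offline replicas and retention windows are deliberately left out.

```python
from typing import Dict


def stable_seq(acked: Dict[str, int]) -> int:
    """Highest sequence number acknowledged by every known replica."""
    return min(acked.values(), default=0)


def retain_tombstones(
    tombstones: Dict[str, int], acked: Dict[str, int]
) -> Dict[str, int]:
    """Return the tombstones (element id -> delete seq) that must be kept."""
    safe = stable_seq(acked)
    # A tombstone at or below the stable point has been seen by everyone,
    # so no replica can still send an op that refers to the deleted element.
    return {elem: seq for elem, seq in tombstones.items() if seq > safe}
```

With no acknowledgements recorded, the stable point is 0 and nothing is collected, which is the conservative default. A replica that stays offline holds the stable point back, so a retention window has to decide when such a replica is forced to resync from a snapshot.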
Pitfall: Omitting client retry/idempotency semantics. Without idempotency keys and sequence numbers, network retries produce duplicate ops and non-deterministic state; define server-side deduplication and stable op-ids.
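Server-side deduplication can be sketched as below, assuming each client numbers its ops 1, 2, 3, ... and the server applies them in per-client order. `Deduplicator` and its fields are illustrative names; a production version would persist `last_seq` alongside the oplog so that deduplication survives restarts.

```python
from typing import Dict, List, Tuple


class Deduplicator:
    def __init__(self) -> None:
        self.last_seq: Dict[str, int] = {}             # highest contiguous seq applied per client
        self.pending: Dict[str, Dict[int, str]] = {}   # out-of-order ops waiting for a gap to fill
        self.applied: List[Tuple[str, int, str]] = []  # stand-in for the real apply/append step

    def receive(self, client_id: str, seq: int, payload: str) -> bool:
        """Return True if the op is new, False if it is a duplicate retry."""
        last = self.last_seq.get(client_id, 0)
        buffered = self.pending.setdefault(client_id, {})
        if seq <= last or seq in buffered:
            return False  # already applied or already buffered
        buffered[seq] = payload
        # Drain every op that is now contiguous with what was applied.
        while last + 1 in buffered:
            last += 1
            self.applied.append((client_id, last, buffered.pop(last)))
        self.last_seq[client_id] = last
        return True
```

For example, if op 2 arrives before op 1 it is buffered; when op 1 arrives both are applied in order, and any later retry of either returns False without changing state.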
Connections
Interviewers may pivot to adjacent topics such as distributed consensus (Raft, Paxos) when you argue for leader sequencing, or MVCC / snapshot isolation when you discuss read semantics and consistent snapshots for long-running operations. They may also ask about monitoring: converged-state tests, op-latency SLOs, and storage compaction metrics.
Further reading
-
Conflict-free Replicated Data Types (CRDTs), Shapiro et al., 2011: foundational formalism and examples for state/op-based CRDTs and convergence proofs.
-
Concurrency Control in Groupware Systems, Ellis & Gibbs, 1989: original Operational Transformation discussion and core correctness properties.