Adobe Transactional Integrity For Shared Documents

What's being tested

Interviewers want to see that you can design and reason about transactional integrity for collaboratively edited, shared documents under latency, availability, and failure constraints. Expect to be evaluated on choosing and justifying a concurrency-control model (e.g., OT vs CRDT vs MVCC), the correctness properties you provide (e.g., linearizability, causal consistency, or eventual consistency), and practical implementation concerns: replication, persistence, conflict resolution, and failure recovery. Adobe cares because shared-document workflows demand low-latency UX plus correct, durable state across devices and offline sessions.

Core knowledge

Operational Transformation (OT) — transforms concurrent operations to preserve intention; requires a total or partially-ordered operation history and a transformation function T(op_a, op_b). OT often needs a central sequencer or strong ordering for correctness.
Conflict-free Replicated Data Types (CRDTs) — algebraic approach that guarantees convergence: state-based CRDTs use a merge that is commutative, associative, and idempotent, while operation-based CRDTs need concurrent operations to commute and rely on causal delivery; works well for peer-to-peer replication and offline edits without central coordination.
Multi-Version Concurrency Control (MVCC) — keeps multiple versions so readers do not block writers; typically provides snapshot isolation, which still permits write skew; the version store grows with the number of retained versions until old ones are garbage-collected.
Consistency models — know the distinctions: linearizability (a single global order consistent with real time), causal consistency (causally related ops are seen in the same order everywhere; concurrent ops may be seen in different orders), eventual consistency (replicas converge once updates stop, with no ordering guarantee in the meantime). Choose per UX/latency tradeoff.
Vector clocks & version vectors — track causality with O(C) metadata where C is the number of replica actors; practical when C (collaborators) is small; at larger scale, prune entries for inactive actors or use dotted version vectors, which key entries by server replica rather than by client.
Distributed commit & replication — 2PC-style global transactions give atomicity but block if the coordinator fails; consensus protocols (Raft, Paxos) are preferred for leader-based durable log replication because they keep making progress as long as a majority of replicas is reachable and can elect a new leader.
Durability & ordering — a write-ahead log (WAL) persisted to disk (or a replicated log such as Kafka) ensures replayable history; batching writes before persisting improves throughput but adds latency to every op that waits for the batch to flush.
Idempotency & deduplication — every client op must carry a stable client-generated idempotency key to tolerate retries and at-least-once delivery without duplicate effects.
Sharding & routing — partition documents by document-id; cross-document transactions are expensive: prefer single-document transaction guarantees and compensate for cross-doc consistency via application-level reconciliation.
Latency vs consistency tradeoffs — synchronous replication to a majority gives stronger durability and less data loss at the cost of higher p99 write latency; asynchronous replication reduces latency but risks losing acknowledged writes on leader failure.
Garbage collection & compaction — for CRDTs or MVCC, plan compaction thresholds (e.g., when ops > 10k or versions > 1000) to bound memory and IO; store checkpoints/snapshots periodically to truncate logs.
Operation size & transformation granularity — character-level ops scale poorly for large documents; use higher-level operations (paragraph/element-based), or sequence CRDTs like RGA/WOOT for text, accepting the per-element identifier and tombstone metadata they carry.
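The transformation function T(op_a, op_b) from the OT item can be sketched as follows. This is a minimal illustration, not a production OT algorithm: the `Insert` and `Delete` types, the single-character delete, and the `site` tie-breaker are all assumptions made for the example. It checks only the basic two-operation convergence property (both application orders yield the same text).

```python
from dataclasses import dataclass, replace
from typing import Optional, Union

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical replica id, used only to break position ties

@dataclass(frozen=True)
class Delete:
    pos: int   # deletes exactly one character at pos (simplifying assumption)
    site: int

Op = Union[Insert, Delete]

def apply(doc: str, op: Optional[Op]) -> str:
    """Apply an operation to a document; None means 'no-op'."""
    if op is None:
        return doc
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]

def transform(a: Op, b: Op) -> Optional[Op]:
    """Rewrite a so it can be applied after the concurrent operation b."""
    if isinstance(a, Insert) and isinstance(b, Insert):
        # b lands before a (or ties and wins on site id): shift a right.
        if b.pos < a.pos or (b.pos == a.pos and b.site < a.site):
            return replace(a, pos=a.pos + len(b.text))
        return a
    if isinstance(a, Insert) and isinstance(b, Delete):
        return replace(a, pos=a.pos - 1) if b.pos < a.pos else a
    if isinstance(a, Delete) and isinstance(b, Insert):
        return replace(a, pos=a.pos + len(b.text)) if b.pos <= a.pos else a
    # Delete vs Delete
    if b.pos < a.pos:
        return replace(a, pos=a.pos - 1)
    if b.pos == a.pos:
        return None  # both deleted the same character; nothing left to do
    return a

# Two replicas edit "abc" concurrently, then exchange transformed ops.
doc = "abc"
a, b = Insert(1, "X", site=1), Delete(1, site=2)
via_a = apply(apply(doc, a), transform(b, a))
via_b = apply(apply(doc, b), transform(a, b))
```

Both `via_a` and `via_b` end up as `"aXc"`. A real editor would also need to handle more than two concurrent operations, which is where a central sequencer or stronger transformation properties come in.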

Tip: prefer per-document strong ordering (leader sequencer + replicated log) for online-first editing, and CRDTs for offline-first or peer-to-peer scenarios — justify the choice based on collaborator count and latency targets.
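For the CRDT side of that choice, the convergence requirement (a merge that is commutative, associative, and idempotent) can be illustrated with a small state-based example. This is a sketch of a last-writer-wins map for document fields; the `write` and `merge` names, the tuple layout, and the use of a logical timestamp with a replica id as tie-breaker are assumptions for illustration only.

```python
from typing import Dict, Tuple

# field name -> (logical_timestamp, replica_id, value)
Entry = Tuple[int, str, str]
State = Dict[str, Entry]

def merge(a: State, b: State) -> State:
    """Per-field maximum under tuple ordering: timestamp, then replica id."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out

def write(state: State, key: str, value: str, ts: int, replica: str) -> State:
    """A local write is just a merge with a one-entry state."""
    return merge(state, {key: (ts, replica, value)})

# Three replicas edit independently while offline.
laptop = write({}, "title", "Draft", ts=1, replica="laptop")
phone = write({}, "title", "Final", ts=2, replica="phone")
tablet = write({}, "author", "kim", ts=1, replica="tablet")

# Any merge order and any number of repeated merges give the same state.
merged = merge(merge(laptop, phone), tablet)
```

Because the merge is a per-field maximum over a total order, replicas converge regardless of delivery order or duplication. Note that last-writer-wins silently discards the losing concurrent write ("Draft" here), which is fine for a title but usually not acceptable for body text.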

Worked example — "Design transactional integrity for a real-time collaborative editor"

First 30 seconds: clarify the expected guarantees (must edits be linearizable? is offline editing required? how many collaborators per document are typical? what are the latency SLOs, e.g. 50 ms local echo?) and whether cross-document atomicity is needed. Skeleton answer pillars: (1) operation model: CRDT (offline-resilient) vs OT (less metadata but needs central ordering); (2) ordering and replication: a leader sequencer plus a Raft-replicated append-only log for per-document operations; (3) persistence and recovery: WAL plus periodic snapshots to bound log replay; (4) client synchronization: vector clocks or last-known-log-index, with idempotency keys. One explicit tradeoff: a CRDT simplifies offline merges and avoids a central bottleneck, but it increases per-object metadata and can make rich structured-document invariants harder to enforce; a leader plus log gives simpler sequential semantics at the cost of write latency and a leader hot-spot. Close by saying: if I had more time, I'd prototype an op format, simulate failure scenarios (network partitions, leader failover), and specify compaction checkpoints and metrics (op latency, convergence time, merge-conflict rate).
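Pillars (2) to (4) can be sketched on a whiteboard as a single per-document log. The following is a minimal in-memory sketch under stated assumptions: `DocumentLog`, `checkpoint`, and `catch_up` are hypothetical names, ops are plain strings, and replication (e.g. via Raft) and disk persistence are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class DocumentLog:
    apply_op: Callable[[str, str], str]           # (state, op) -> new state
    snapshot_state: str = ""
    snapshot_index: int = 0                        # ops already folded into the snapshot
    ops: List[str] = field(default_factory=list)   # ops appended after the snapshot

    def append(self, op: str) -> int:
        """Sequencer role: assign the next 1-based log index.
        A real leader would replicate the entry before acknowledging."""
        self.ops.append(op)
        return self.snapshot_index + len(self.ops)

    def checkpoint(self) -> None:
        """Fold the log tail into the snapshot so replay stays bounded."""
        for op in self.ops:
            self.snapshot_state = self.apply_op(self.snapshot_state, op)
        self.snapshot_index += len(self.ops)
        self.ops.clear()

    def catch_up(self, last_seen: int) -> Tuple[Optional[str], List[str]]:
        """Return what a client needs given the last log index it applied:
        (snapshot or None, ops to apply on top)."""
        if last_seen < self.snapshot_index:
            # Client is behind the truncation point: send the snapshot.
            return self.snapshot_state, list(self.ops)
        return None, self.ops[last_seen - self.snapshot_index:]

    def current_state(self) -> str:
        state = self.snapshot_state
        for op in self.ops:
            state = self.apply_op(state, op)
        return state

# Toy op model for illustration: each op appends its text to the document.
log = DocumentLog(apply_op=lambda state, op: state + op)
log.append("a")
log.append("b")
log.checkpoint()          # snapshot now covers indices 1..2
log.append("c")           # index 3
```

A client that last applied index 2 receives only `["c"]`; a client that last applied index 1 is behind the checkpoint and receives the snapshot `"ab"` plus `["c"]`.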

A second angle — "Support offline edits and reconcile with server-state while preserving user intent"

Here the constraint shifts: high offline tolerance and eventual convergence matter more than immediate global linearizability. Apply the same concepts: use CRDTs or OT with client-side buffering, carry causal metadata (vector clocks or operation timestamps), and use an anti-entropy sync protocol (gossip or delta-sync) to exchange missed operations. Key differences: accept eventual consistency and design UI conflict indicators; optimize payloads using deltas and tombstone compaction. Also guard against divergent rich-structure invariants (tables, embedded assets) by adding application-level commutative operations or server-side compensating transactions for non-commutative actions.
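The causal metadata and delta-sync ideas above can be sketched with version vectors. This is an illustrative sketch, assuming each op is tagged with a (replica id, per-replica counter) pair; `compare`, `missing_ops`, and `merge_vv` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

VersionVector = Dict[str, int]   # replica id -> highest op counter seen
Op = Tuple[str, int, str]        # (replica id, counter, payload)

def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify the causal relationship of a relative to b."""
    keys = a.keys() | b.keys()
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"   # neither has seen all of the other's ops
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"

def missing_ops(ops: List[Op], peer: VersionVector) -> List[Op]:
    """Delta sync: select only the ops the peer has not yet seen."""
    return [op for op in ops if op[1] > peer.get(op[0], 0)]

def merge_vv(a: VersionVector, b: VersionVector) -> VersionVector:
    """After exchanging ops, both sides advance to the pointwise maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

# The server holds three ops; a reconnecting client has seen only laptop's first.
server_ops = [("laptop", 1, "x"), ("laptop", 2, "y"), ("phone", 1, "z")]
client_vv = {"laptop": 1}
delta = missing_ops(server_ops, client_vv)
```

Here `delta` contains the second laptop op and the phone op. A "concurrent" result from `compare` is the signal that a merge (CRDT or OT) is needed rather than a simple fast-forward.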

Common pitfalls

Pitfall: assuming naive pessimistic locking across documents will scale — locking simplifies correctness but degrades UX, creates hot-spots, and makes offline edits impossible. Prefer per-document ordering or CRDTs.

Pitfall: conflating eventual consistency with correctness — eventual convergence doesn't imply preservation of user intent or strong invariants; be explicit about which invariants are preserved and where compensating actions are needed.

Pitfall: ignoring idempotency and duplicate delivery — without client-generated ids and dedup on the server, retries produce duplicate operations that break document state and user expectations; design idempotent op application and durable acking.
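Idempotent application with server-side dedup can be sketched in a few lines. This is a minimal in-memory sketch: `IdempotentDocument`, `submit`, and the key format are hypothetical, and a durable system would need to persist the key table atomically with the log entry.

```python
from typing import Dict, List

class IdempotentDocument:
    def __init__(self) -> None:
        self.ops: List[str] = []
        # idempotency key -> log index originally assigned to that op
        self.acked: Dict[str, int] = {}

    def submit(self, key: str, op: str) -> int:
        """Apply op at most once per key; a retry gets the original ack."""
        if key in self.acked:
            return self.acked[key]      # duplicate delivery: do not re-apply
        self.ops.append(op)
        self.acked[key] = len(self.ops)
        return self.acked[key]

doc = IdempotentDocument()
first = doc.submit("client-7:42", "insert 'a'")
retry = doc.submit("client-7:42", "insert 'a'")   # e.g. the first ack was lost
```

Both calls return the same index and the op appears once in `doc.ops`. In practice the key table also needs an eviction policy, since it otherwise grows with every op ever submitted.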

Connections

Interviewers may pivot to adjacent topics: designing the persistence layer (log compaction, snapshot formats), scaling leader election and partition rebalancing, or reasoning about security and access-control (authz when merging offline edits). They might also probe operational concerns: monitoring, SRE runbook for failover, and load-testing conflict rates.