System Design: Distributed Transactions Across Multiple Services
Context
You must design a distributed transaction protocol that coordinates updates across multiple services or databases in a microservices environment. The network is unreliable (messages can be delayed, duplicated, or lost), services can crash and recover, and both latency and availability matter.
Assume services communicate over RPC and/or a message bus. Some services may support a "prepare/commit" primitive; others may not.
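To make the two kinds of participant concrete, here is a minimal Python sketch. All names (`PreparableParticipant`, `CompensatableParticipant`, `InMemoryLedger`) are hypothetical and chosen for illustration only; a real service would persist its state durably before replying.

```python
from typing import Dict, Protocol


class PreparableParticipant(Protocol):
    """Hypothetical interface for a service that exposes prepare/commit."""

    def prepare(self, tx_id: str) -> bool: ...
    def commit(self, tx_id: str) -> None: ...
    def abort(self, tx_id: str) -> None: ...


class CompensatableParticipant(Protocol):
    """Hypothetical interface for a service with no prepare step.

    Such a service applies its change immediately and can only undo it
    later with a compensating action.
    """

    def execute(self, tx_id: str) -> None: ...
    def compensate(self, tx_id: str) -> None: ...


class InMemoryLedger:
    """Toy participant that satisfies PreparableParticipant.

    Assumption: state lives in memory only, so this does not survive a
    crash. Every method is idempotent so duplicated messages are harmless.
    """

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def prepare(self, tx_id: str) -> bool:
        # A repeated prepare returns the same vote as the first one.
        self.state.setdefault(tx_id, "prepared")
        return self.state[tx_id] in ("prepared", "committed")

    def commit(self, tx_id: str) -> None:
        # Only a prepared transaction can move to committed.
        if self.state.get(tx_id) == "prepared":
            self.state[tx_id] = "committed"

    def abort(self, tx_id: str) -> None:
        # A committed transaction is never rolled back by a late abort.
        if self.state.get(tx_id) != "committed":
            self.state[tx_id] = "aborted"
```

A participant of the first kind can take part in a prepare/commit protocol; one of the second kind can only be coordinated through execute-then-compensate steps.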
Task
- Propose a protocol to coordinate multi-service updates.
- Compare and contrast the following approaches:
  - Two-Phase Commit (2PC)
  - Three-Phase Commit (3PC)
  - Saga (orchestration and/or choreography)
- For each approach, specify:
  - Message flow (step-by-step)
  - Failure handling (coordinator crash, participant crash, message loss/duplication, partition)
  - Idempotency mechanisms
  - Timeouts and retries
  - Delivery guarantees (exactly-once vs at-least-once) and what you can realistically achieve
- Conclude with a recommendation for when to use each approach and how to implement it safely.
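As an illustration of the idempotency, retry, and delivery-guarantee items above, the following Python sketch shows how at-least-once delivery combined with an idempotent handler yields an effectively-once result. The names `InventoryService`, `reserve`, and `call_with_retries` are hypothetical, and lost replies are simulated with a list of flags rather than a real network or timer.

```python
from typing import Dict, Iterable


class InventoryService:
    """Toy service whose handler deduplicates on an idempotency key."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.calls = 0
        self.processed: Dict[str, bool] = {}  # idempotency key -> result

    def reserve(self, idempotency_key: str, qty: int) -> bool:
        self.calls += 1
        # Duplicate request: return the recorded result, change nothing.
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        ok = self.stock >= qty
        if ok:
            self.stock -= qty
        # Assumption: in a real service the stock update and this record
        # are written in the same local atomic transaction.
        self.processed[idempotency_key] = ok
        return ok


def call_with_retries(service: InventoryService, key: str, qty: int,
                      reply_lost: Iterable[bool]) -> bool:
    """Retry with the same key until a reply gets through.

    Each element of reply_lost stands in for one attempt: True means the
    reply was lost and the caller timed out, so it must retry.
    """
    for lost in reply_lost:
        result = service.reserve(key, qty)  # the request itself arrives
        if not lost:
            return result
    raise TimeoutError("no reply after all attempts")


service = InventoryService(stock=10)
# Two replies are lost, so the request is delivered three times.
outcome = call_with_retries(service, "order-42", 3, [True, True, False])
```

The handler runs three times here, but stock is decremented only once because every retry carries the same key. The caller cannot tell a lost request from a lost reply, which is why the retry must be safe to repeat.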