Two-phase commit and its problems

The previous lessons in Module 2 worked through the fallacies of distributed computing, the CAP theorem, and the various flavours of consistency you can ask a distributed data store to give you. Each of them had the same uncomfortable shape: the moment you stop having one machine, properties you used to take for granted stop being free.

Atomic, multi-record transactions are the most painful example. A single Postgres instance gives you, for free, a guarantee that either both rows update or neither does. The application code does not have to think about it. BEGIN, do the work, COMMIT, and the database holds the line. The moment your write touches two databases, or two services, or even just two shards of the same logical store, that guarantee is gone. You either rebuild it yourself, in user space, or you accept that the world is sometimes going to be half-updated for a while.

For most of the 1990s and 2000s the textbook answer was a protocol called two-phase commit, usually shortened to 2PC. It is a beautiful idea on paper. In practice it has a failure mode bad enough that most modern microservice architectures avoid it entirely and reach for the Saga pattern instead. This lesson is about why.

The problem 2PC was invented to solve

You have a transaction that touches two participants. The participants might be two databases (a Postgres and an Oracle, in classic enterprise integration scenarios), two services that each own their own database, or two shards of the same logical store managed by different processes. You want the same end-state property a single-database transaction gives you: either the change happens at both participants, or it happens at neither. No half-states. No “money left account A but never arrived at account B.”

The naive approach does not work. If you write to participant A first and then to participant B, and B is down or rejects the write, you are stuck with a successful write at A that you now have to roll back. If your “rollback” message to A also fails, you are now in a worse state than if you had never started. Each retry can fail in new and creative ways. There is no point at which you can say “we are done” with confidence.

2PC promises to fix this by introducing a third party, a coordinator, whose only job is to make the commit decision after every participant has agreed it can. The coordinator is the source of truth about whether the global transaction succeeded.

The protocol in plain words

Two phases, hence the name. The vocabulary is “prepare” for phase one and “commit or abort” for phase two.

Phase 1, prepare. The coordinator asks every participant the same question: “can you commit this transaction?” Each participant does whatever local work it needs to be able to answer (acquire locks, validate constraints, allocate resources) and then writes a durable answer to its own log. The answer is either “yes, I am ready to commit and I have written that promise to disk” or “no, I cannot commit this transaction.” The participant tells the coordinator. Critically, a participant that has answered yes is now bound: it has promised that if asked to commit, it will be able to. It cannot back out. It cannot crash and forget. The yes is on disk.

Phase 2, commit or abort. The coordinator collects all the answers. If every participant said yes, the coordinator writes “decision: commit” to its own log and then sends commit messages to every participant. Each participant commits its local transaction and acknowledges. If even one participant said no, or did not answer in time, the coordinator writes “decision: abort” and sends abort messages instead. Each participant rolls back.

sequenceDiagram
    participant C as Coordinator
    participant A as Participant A
    participant B as Participant B
    Note over C,B: Phase 1 (prepare)
    C->>A: prepare?
    C->>B: prepare?
    A->>A: lock, validate, log "yes"
    B->>B: lock, validate, log "yes"
    A-->>C: yes
    B-->>C: yes
    Note over C,B: Phase 2 (commit)
    C->>C: log "decision: commit"
    C->>A: commit
    C->>B: commit
    A-->>C: ack
    B-->>C: ack

That is the happy path. Read it twice. It is short, it is symmetric, and it almost works.

The coordinator-failure problem

Here is the famous flaw, and it is the reason 2PC has the reputation it has.

Imagine the coordinator has finished phase 1. Every participant has said yes. Every participant is now holding locks, in the prepared state, waiting for the next message. The coordinator writes “decision: commit” to its log and is about to send the commit messages. At that exact moment, the coordinator’s machine power-cycles.

What can each participant do? Nothing useful. They cannot commit, because they have not been told to. They cannot abort, because they have promised to be able to commit. They are in doubt. The locks they acquired in phase 1 are still held. Any other transaction that wants those rows is blocked. Any read that conflicts with those locks is blocked. The participants will sit there, holding open transactions, locking out the world, until the coordinator comes back and replays its decision log.

In a textbook this sounds tolerable. In a production incident it is catastrophic. Coordinators come back when humans page humans, and humans take time. Meanwhile, the participants’ locks are not metaphorical. Real customer-facing queries are blocked. Real workers are stuck in their connection pools waiting for those locks. The system around the in-doubt transaction degrades, then degrades faster, then collapses.

This is not a hypothetical. Every engineer who has run XA transactions in anger has a story about a coordinator that disappeared and left a participant database with hour-old prepared transactions wedged in its lock manager. The recovery procedure usually involves a database administrator and a manual command to force-resolve the stuck transaction one way or the other. Sometimes the right answer is “commit it and apologise.” Sometimes it is “abort it and explain to finance later.” There is no automated way to know.

The other 2PC pain points

The coordinator-failure problem is the headline, but it is not the only reason 2PC has fallen out of favour for greenfield architectures.

Synchronous coordination kills latency. Every cross-participant transaction now requires two round trips to every participant before it can finish. If you have three participants and a 5 ms round trip to each, your transaction floor is 30 ms before you have done any actual work. For high-throughput systems this is a budget you cannot afford.

One slow participant blocks everyone. The coordinator cannot finish phase 1 until every participant has answered. A single participant that is slow today, perhaps because it is doing a vacuum or a backup or a noisy-neighbour episode, drags the latency of every transaction up to its level. The classic distributed-systems pathology of “the slowest node sets the speed” applies here at the worst possible moment.

Coordinator high availability does not fully save you. The obvious response to “the coordinator is a single point of failure” is “make the coordinator a cluster.” Modern transaction managers do exactly that, with consensus among coordinator replicas about the decision log. This helps a lot. It does not eliminate the in-doubt problem, because participants can still get partitioned away from the coordinator quorum. It also adds operational complexity that most teams underestimate until the first time they have to debug a Raft-backed coordinator that refuses to elect a new leader.

The blast radius of a partition is larger than it looks. In CAP terms, 2PC explicitly chooses consistency over availability. During a partition, transactions cannot commit. That is by design. It is also what people mean when they say “2PC does not survive the network.” The protocol is correct. The cost is that, in the failure mode where most transactions need to land, this protocol refuses to land any of them.

Where 2PC is still appropriate

It is worth being fair to 2PC. There are workloads where it is the right answer.

The clearest case is a transaction that has to span exactly two databases, where both are inside the same data centre, where the round-trip latency is microseconds, and where the application can tolerate the rare “the coordinator died, please call the DBA” incident. Many enterprise integration scenarios from the 2000s look like this. XA transactions over a JTA transaction manager were, for that shape of workload, unobjectionable.

It is also acceptable inside a single database product that uses 2PC internally to coordinate across its own shards. CockroachDB, YugabyteDB, and Spanner all use protocols that look like enhanced 2PC under the hood. What is different is that the coordinator and the participants are operated by the same team, on the same hardware fabric, with the same release cadence. The coordinator-failure scenario is something the database vendor has thought about, hardened, and tested, and the recovery is automated rather than manual. That is the difference between “2PC inside the database” and “2PC across the application.”

For new architectures that span services owned by different teams, that talk over a real network with real partitions, where the operational team for service A is not the operational team for service B, 2PC is almost never the right call.

What modern architectures use instead

The pattern that has displaced 2PC for cross-service transactions is the Saga.

The idea is structural. A Saga reframes a distributed transaction not as a single atomic operation but as a sequence of local transactions, each of which has a defined compensation. A compensation is the operation that semantically undoes the local transaction. If you charged a card, the compensation is a refund. If you reserved a seat, the compensation is to release the seat. If you sent a confirmation email, the compensation is to send a “your booking was cancelled” email. Each step commits locally, with the local guarantees of its own database.

The Saga then has a simple rule. If step N fails, run compensations for step N-1, N-2, all the way back to step 1, in reverse order. The end state is either “every step committed” or “every step that committed has been compensated.” The system is eventually consistent: there is a window during which some steps have committed and others have not, and the application has to be designed to tolerate that window.

There are two flavours of Saga implementation, and the difference is who orchestrates.

In a choreographed Saga, each service listens for events and reacts. The booking service emits “booking created.” The payment service consumes it, charges the card, and emits “payment captured.” The seat service consumes that, reserves the seat, and emits “seat reserved.” If any step fails, that service emits a failure event, and earlier services consume it and run their compensation. There is no central coordinator. The flow is implicit in the event topology.

In an orchestrated Saga, a single orchestrator service knows the steps and tells each service what to do, in order. The orchestrator is itself just an application, often built on a workflow engine like Temporal or AWS Step Functions, with durable state for the saga’s progress. The orchestrator is recoverable: if it crashes, it resumes from where it left off, reading from its own log.

Both have their place. Choreography is loosely coupled and shines when the saga is short and the steps are obviously independent. Orchestration is easier to reason about when the saga is long, when the order of steps matters subtly, or when error handling involves more than just “compensate everything.” For most teams starting fresh, an orchestrator built on a workflow engine is the lower-cognitive-cost option, because the engine handles the durability, retry, and timeout problems that an event-based saga would otherwise reinvent.

A worked example: paying for a flight

Imagine a flight-booking flow. The user clicks “buy.” Three things have to happen, in order:

Charge the user’s card via the payment service.
Reserve a seat in the inventory service.
Send a booking confirmation via the notification service.

The naive 2PC version would coordinate all three under a global transaction. If anything fails, abort everywhere. The protocol’s downsides apply: locks held across the slowest service, a coordinator that has to stay up, in-doubt states if it crashes.

The Saga version makes each step a local transaction with a compensation:

sequenceDiagram
    participant O as Orchestrator
    participant P as Payment
    participant I as Inventory
    participant N as Notification
    O->>P: charge card
    P-->>O: charged (txn id 8821)
    O->>I: reserve seat
    I-->>O: failure, seat already taken
    Note over O,P: compensation
    O->>P: refund txn 8821
    P-->>O: refunded
    O-->>O: saga ends with state "aborted"

If the seat reservation fails, the orchestrator runs the refund compensation against the payment service. The user’s card is charged for a few seconds, then refunded. The seat was never reserved. The notification was never sent. The end state is consistent in the eventual sense: every step that committed has been compensated.

The application has to handle the eventual-consistency window. Specifically, between the charge and the refund the user briefly has a charge on their card with no booking to show for it. For a flight-booking flow this is acceptable, because the window is short and the user is staring at a “processing your booking” screen. For other flows the window may be unacceptable and the design has to change. That is the architectural conversation the Saga forces you to have.

When you do still need atomic cross-service writes

Sometimes the eventual-consistency window of a Saga is genuinely too painful. The classic example is the transactional outbox pattern, where a service needs to atomically (a) write a row to its own database and (b) publish an event to a message broker. If those two operations are not atomic, you can publish events for state changes that did not happen, or persist state changes whose events never get published. Either is a real bug.

The outbox pattern solves this without 2PC. The service writes the event row into a local “outbox” table inside the same local transaction that updates its business state. A separate worker reads the outbox table and publishes to the broker, then deletes the row. The atomicity is local. The publication is at-least-once and idempotent at the consumer side. This pattern, and its sibling the transactional inbox, are the workhorses of modern event-driven architectures and we will spend a full lesson on them in Module 4.

The recommendation in 2026

For new architectures, prefer Sagas, prefer orchestration over choreography until you have a good reason to invert that, and prefer the transactional outbox pattern for the specific case of “atomically update state and emit an event.” Reach for 2PC only when both participants are inside a single trust and operations boundary, the latency budget can absorb it, and the recovery story has been thought through.

Most importantly, design the application to tolerate the eventual-consistency window. That is the architectural shift. 2PC tries to make distributed transactions feel like local ones. Sagas do not. They tell you, explicitly, that there will be a window in which the world is half-updated, and they make you say, in code and in product copy, what the user sees during that window. That honesty is a feature, not a defect.

The next lesson, the last in Module 2, is about how to make all of this work in the presence of duplicate messages, retries, and the impossibility of exactly-once delivery. It is the lesson that closes the module by giving you the architectural tool that, more than any other, makes distributed systems behave: idempotency.

Citations and further reading

Butler Lampson, “How to Build a Highly Available System Using Consensus” (1996). The classic reference that lays out 2PC and its limitations alongside earlier consensus protocols.
Hector Garcia-Molina and Kenneth Salem, “Sagas” (1987). The original paper that introduces the Saga pattern as an alternative to long-running ACID transactions.
Chris Richardson, “Microservices Patterns” (Manning, 2018), chapters on Saga and transactional messaging. Plain-language treatment of orchestrated vs choreographed Sagas with worked Java examples.
Temporal documentation on workflow-based saga orchestration, https://docs.temporal.io/ (retrieved 2026-05-01).