Consistency models: strong, eventual, causal, monotonic

The first reaction most engineers have to consistency, after a CAP-theorem lecture or two, is to file it under a binary. There is “strong” consistency, where the system behaves like a single machine, and there is “eventual” consistency, where the system is allowed to lie for a while and then catch up. Strong is correct and slow. Eventual is fast and weird. Pick one.

This binary is wrong, and the wrongness matters. Real systems offer a spectrum of consistency guarantees, and most of the interesting databases let you mix and match: strong for some operations, weaker for others, with explicit knobs. If you only know the two endpoints, you are choosing between “too expensive” and “too confusing” without realising there are several useful stations between them.

This lesson is the map. We will walk the spectrum from strongest to weakest, define each model precisely enough that you can tell when it is being violated, then run a single worked example (a shopping cart) under each model and watch what changes.

The spectrum

There are six consistency models worth knowing by name. They form a partial order: stronger models imply all the guarantees of weaker ones, which is why this is a hierarchy rather than a list of alternatives.

Linearizable (strong consistency)

The system behaves as if there is a single copy of the data and every operation takes effect at some single point in time between when it was issued and when it returned. Once a write succeeds, every subsequent read, on any replica, sees that write or a later one. Reads and writes appear in real-time order.

This is the strongest practical consistency model and the most expensive. To deliver linearizability across replicas, the system must coordinate every write through a quorum, which costs at least one round-trip to a majority of nodes. In a database that spans regions, that round-trip can be 50 to 200 milliseconds. Linearizability also forbids tricks like serving reads from a local replica without checking, because the local replica might be stale.

Real systems that offer linearizability: Google Spanner (with TrueTime, which is the next lesson), etcd, ZooKeeper (for writes; a read can be stale unless it is preceded by a sync), single-node Postgres trivially, DynamoDB when you ask for ConsistentRead=true.

Sequential consistency

All operations appear in some total order, the same for every observer, and that order preserves the order in which each individual process issued its own operations. The order does not have to match real time across processes. If process A writes X at 10:00:01 and process B writes Y at 10:00:02, sequential consistency lets observers see Y first, as long as everybody sees Y first.

This is rare as a design target in modern databases. It shows up in older textbooks, in shared-memory hardware (x86 and ARM are weaker than sequential consistency by default and provide it only for fenced or specially marked atomic operations), and as a stepping stone in proofs. Most “I want a strong system” requests really mean linearizable.
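The program-order half of the definition is easy to check mechanically. The sketch below is illustrative only: the operation labels and the `respects_program_order` helper are made up for this lesson, and it checks only that no process sees its own operations reordered, not that reads return the latest write in the total order.

```python
def respects_program_order(total_order, per_process):
    """Return True if every process's operations keep their issue order in total_order."""
    position = {op: index for index, op in enumerate(total_order)}
    for ops in per_process.values():
        indexes = [position[op] for op in ops]
        if indexes != sorted(indexes):
            return False
    return True


# Hypothetical history: A wrote X at 10:00:01, B wrote Y at 10:00:02.
history = {"A": ["A:write X", "A:read X"], "B": ["B:write Y"]}

# Y ordered before X contradicts real time, but no process sees its own
# operations reordered, so sequential consistency allows it.
y_first = ["B:write Y", "A:write X", "A:read X"]

# Here A's read is placed before A's own write, which breaks program order.
a_reordered = ["A:read X", "B:write Y", "A:write X"]

y_first_ok = respects_program_order(y_first, history)
a_reordered_ok = respects_program_order(a_reordered, history)
```

A linearizability check would add one more constraint: the total order must also agree with the wall-clock order of operations that did not overlap.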

Causal consistency

If operation A causally influenced operation B, every observer sees A before B. Causally unrelated operations can be observed in any order. “Causally influenced” is defined by message passing: if A happened on a node, then B happened on a node that had observed A, then A is a cause of B.

Causal consistency is, roughly, the strongest model that does not require coordination on every write. Replicas can keep accepting local writes; they only need to track which writes depend on which others. This makes it a popular target for systems that want to feel right to humans without paying linearizability’s coordination tax.

The classic example: a user posts a status update, then a comment on their own update. Causal consistency guarantees that anyone who sees the comment also sees the original post. It does not guarantee that two unrelated users posting at the same time appear in the same order to everyone.
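One way to picture the dependency tracking is a replica that buffers a write until everything it depends on has been applied. This is a minimal sketch under simplifying assumptions: each write carries an explicit set of dependency ids (real systems often use vector clocks or similar metadata instead), and the `Write` and `Replica` classes are hypothetical names, not any database's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Write:
    """A replicated write plus the ids of the writes it causally depends on."""
    write_id: str
    payload: str
    deps: frozenset = frozenset()


@dataclass
class Replica:
    """Applies a write only after every write it depends on is visible."""
    visible: list = field(default_factory=list)   # payloads in the order applied
    applied: set = field(default_factory=set)     # ids of applied writes
    pending: list = field(default_factory=list)   # writes waiting on dependencies

    def receive(self, write: Write) -> None:
        self.pending.append(write)
        self._drain()

    def _drain(self) -> None:
        # Keep applying until no pending write has all its dependencies met.
        progressed = True
        while progressed:
            progressed = False
            for write in list(self.pending):
                if write.deps <= self.applied:
                    self.pending.remove(write)
                    self.applied.add(write.write_id)
                    self.visible.append(write.payload)
                    progressed = True


post = Write("w1", "post: moving to Milan")
comment = Write("w2", "comment: next month!", deps=frozenset({"w1"}))

replica = Replica()
replica.receive(comment)                     # arrives first, but depends on w1
visible_before_post = list(replica.visible)  # nothing is shown yet
replica.receive(post)                        # unblocks the buffered comment
visible_after_post = list(replica.visible)   # post first, then comment
```

The replica never blocks local writes and never talks to a quorum; the only cost is the dependency metadata and the buffering.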

Read-your-writes

After a client writes X, that same client’s subsequent reads see X (or a value that supersedes X). Other clients get no such guarantee.

This is a session-level model, not a global one. It is usually implemented by routing a client’s reads to the replica that handled their last write, or by carrying a “last write timestamp” cookie that the read path uses to wait for replication. Almost every consumer app expects this without realising it: when you submit a form, you expect to see your own submission, even if it has not propagated to other regions yet.
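The “last write” cookie approach can be sketched in a few lines. Everything here is a toy model: versions are plain integers rather than timestamps, and the `Replica` and `Session` classes are invented for illustration. The read path falls back to the node that accepted the write instead of waiting for replication, which is one of several reasonable choices.

```python
class Replica:
    """Holds a single value and the highest version it has applied."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Client session that carries the version of its own last write."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_write_version = 0  # plays the role of the cookie

    def write(self, value):
        version = self.primary.version + 1
        self.primary.apply(version, value)
        self.last_write_version = version

    def read(self):
        # Use the first replica that has caught up to this session's last write.
        for replica in self.replicas:
            if replica.version >= self.last_write_version:
                return replica.value
        # Otherwise fall back to the node that accepted the write.
        return self.primary.value


primary, lagging = Replica(), Replica()
alice = Session(primary, [lagging])
bob = Session(primary, [lagging])

alice.write("form submitted")   # not yet replicated to `lagging`
alice_sees = alice.read()       # falls back to the primary
bob_sees = bob.read()           # bob has no writes, so the stale replica is accepted
```

Alice sees her submission; Bob, who has no session history, is served the stale replica. That asymmetry is exactly what “session-level” means.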

Monotonic reads

Once a client has read a value, subsequent reads return that value or a newer one. The client never sees time go backward.

This is also session-level. The failure mode it prevents: a client reads from replica A and sees version 5, then reads from replica B and sees version 3. Without monotonic reads, the client appears to travel back in time, which breaks UI state, animations, and the user’s trust. Monotonic reads is usually implemented by client affinity (stick to one replica) or version-vector checks.
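A version check on the client side might look like the following sketch. It assumes each replica can report a single integer version alongside its value (real systems may use version vectors), and the `MonotonicSession` and `StaleRead` names are hypothetical.

```python
class StaleRead(Exception):
    """Raised when no available replica is at least as new as what was already seen."""


class MonotonicSession:
    """Tracks the highest version this client has observed."""

    def __init__(self):
        self.highest_seen = 0

    def read(self, replicas):
        # Each replica is modelled as a (version, value) pair.
        for version, value in replicas:
            if version >= self.highest_seen:
                self.highest_seen = version
                return value
        raise StaleRead(f"no replica has reached version {self.highest_seen}")


replica_a = (5, "cart at version 5")
replica_b = (3, "cart at version 3")

session = MonotonicSession()
first = session.read([replica_a])              # client now expects version >= 5
second = session.read([replica_b, replica_a])  # skips B, which is behind

try:
    session.read([replica_b])                  # only the stale replica is reachable
    went_backward = True
except StaleRead:
    went_backward = False
```

Refusing the read (or retrying elsewhere) trades availability for the guarantee; client affinity avoids the check entirely by never switching replicas.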

Eventually consistent

If writes stop, all replicas eventually converge to the same value. There is no guarantee about when, no guarantee about the order in which intermediate states appear, no guarantee about what individual reads see.

This is the weakest of the named models. It is also enough for many real workloads: the “viewed count” on a video, the rough size of a queue, the last-known location of a delivery driver. In each case, “the right answer plus or minus a few seconds” is fine.
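Convergence needs some rule for reconciling replicas that disagree. The sketch below uses last-writer-wins, one common rule among several, applied to the delivery-driver example. The timestamps, replica names, and locations are made up, and the ring-shaped exchange is a simplification of real anti-entropy or gossip protocols.

```python
def merge(a, b):
    """Last-writer-wins: keep the state with the highest (timestamp, replica_id)."""
    return max(a, b)


def anti_entropy_pass(states):
    """Each replica exchanges state with its neighbour in a ring; both keep the winner."""
    states = list(states)
    for i in range(len(states)):
        j = (i + 1) % len(states)
        states[i] = states[j] = merge(states[i], states[j])
    return states


# (timestamp, replica_id, value); the values are made up for illustration.
replicas = [
    (3, "milan", "driver at depot"),
    (5, "tokyo", "driver on Via Roma"),
    (4, "virginia", "driver at warehouse"),
]

passes = 0
while len(set(replicas)) > 1:   # writes have stopped; exchange until all agree
    replicas = anti_entropy_pass(replicas)
    passes += 1
```

Once the loop exits, every replica holds the write with the highest timestamp. Nothing in the code bounds how long that takes in a real network, which is the “no guarantee about when” part of the definition.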

A worked example: the shopping cart

Take a single user, Alice, with a shopping cart in a globally replicated store. Replicas live in three regions. Alice is browsing from a phone in Milan. The store’s app servers route her requests to the closest replica.

Alice does three things in sequence:

1. Adds item X to the cart.
2. Adds item Y to the cart.
3. Removes item X from the cart.

Now consider what happens under each model when Alice (or her partner Bob, on a different replica) reads the cart.

Linearizable. Any read, by anyone, after step 3 returns “Y only.” Reads taken between steps return cart states that match real time: “X” after step 1, “X and Y” after step 2, “Y” after step 3. The cost: every write waits for a quorum across the three regions. If the regions are continents apart, every add-to-cart takes 100 milliseconds or more.

Causal. Reads see the operations in the order Alice did them: nobody sees the remove-X without having seen the add-X. But two concurrent users adding items to a shared family cart might see each other’s adds in different orders. For a single-user cart, causal feels indistinguishable from linearizable. For a shared cart, causal is the right model: it preserves the user’s intent without paying for global ordering.

Read-your-writes. Alice’s phone sees her own changes immediately: X, then X and Y, then Y only. If Alice opens the cart on her laptop, which routes to a different replica, she might briefly see the cart in an earlier state. Bob, looking at the same cart from Rome, might see any version of it. The latency is much lower than linearizable; the price is that other observers can be confused.

Monotonic reads. Alice never sees the cart go backward. If she has seen the cart with items X and Y, she will not, on the next refresh, see only X. But she might not see her own removal of X immediately, if she is reading from a replica that has not yet received it. This model rules out the worst UI bug (“my item came back”), without solving the freshness problem.

Eventually consistent. The cart will, eventually, reach a state where everyone agrees. In the meantime, anything goes: Alice’s phone might show X and Y, her laptop might show only Y, and Bob might see an empty cart. After the dust settles, all replicas converge. The latency is lowest; the user experience is the worst.
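The difference between “lagging” and “wrong” in the walk-through above can be reproduced with a toy replay. This sketch assumes a deliberately naive replica that applies operations to a plain set in whatever order they arrive; the `replay` helper is hypothetical, and real eventually consistent stores use conflict-resolution schemes precisely to avoid the last case.

```python
def replay(ops):
    """Apply add/remove operations to an empty cart, in the order given."""
    cart = set()
    for action, item in ops:
        if action == "add":
            cart.add(item)
        else:
            cart.discard(item)
    return cart


ALICE_OPS = [("add", "X"), ("add", "Y"), ("remove", "X")]

# A replica that applies Alice's operations in order can only lag behind:
# every state it shows is one she actually passed through.
in_order_views = [replay(ALICE_OPS[:n]) for n in range(len(ALICE_OPS) + 1)]

# A replica that receives remove-X before add-X, and applies operations
# naively on arrival, ends up with X back in the cart.
reordered = [("remove", "X"), ("add", "X"), ("add", "Y")]
out_of_order_view = replay(reordered)
```

The in-order views are the empty cart, X, X and Y, and Y only. The reordered replay finishes with both X and Y, which is the “my item came back” bug that causal ordering rules out.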

The point of the example is that the right consistency model is not “the strongest available.” It is the weakest model that does not produce visible bugs for your workload. Carts mostly want causal, which already includes read-your-writes. Bank balances want linearizable. Like counts want eventually consistent.

The cost-benefit table

Model	Read latency	Write latency	Coordination	Common bugs prevented
Linearizable	High	High	Quorum on every write, fresh reads	Stale reads, lost writes, time travel
Causal	Low	Low	Track dependencies, no quorum	Out-of-order causal effects
Read-your-writes	Low	Low	Per-session routing	“Where did my submission go?”
Monotonic reads	Low	Low	Per-session affinity	“My data went back in time”
Eventual	Lowest	Lowest	None	Almost none

The expensive part of stronger models is not the algorithm; it is the coordination. Every guarantee about what reads see is paid for in messages between replicas, and every message has a latency floor set by the speed of light.
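The latency floor is simple arithmetic. The sketch below assumes signals travel through optical fiber at roughly two thirds of the vacuum speed of light, and uses a made-up 9,000 km path as a stand-in for an intercontinental link; real routes are longer than the straight-line distance and add switching delay on top.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: light in fiber travels at about 2/3 of c


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fiber, ignoring routing and processing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


# Hypothetical distances, chosen only to show the scale.
regional_ms = min_round_trip_ms(1_000)           # about 10 ms
intercontinental_ms = min_round_trip_ms(9_000)   # about 90 ms
```

No protocol improvement can push a cross-continent quorum write below that floor; the only options are to coordinate less often or to move the replicas closer together.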

Real systems and what they offer

A short tour of databases and the consistency knobs they expose.

Google Spanner. Linearizable by default, globally, using TrueTime to bound clock uncertainty. The price is that every transaction waits for a TrueTime “commit wait” of a few milliseconds. Spanner is the existence proof that linearizable can be made fast enough for production at planet scale, given a budget for atomic clocks.
DynamoDB. Eventually consistent reads by default, configurable to “strongly consistent” reads at twice the cost and higher latency. The strong reads are linearizable on a single partition. Cross-partition transactions are a separate, more expensive primitive.
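MongoDB. Tunable per operation via the read concern and write concern settings. Read concern “linearizable” plus write concern “majority” gets you linearizable reads of a single document on the primary. “Majority” reads and writes inside a causally consistent session give you causal consistency, which includes read-your-writes. Looser settings give you eventual.
Cassandra. Tunable per operation via consistency levels (ONE, QUORUM, LOCAL_QUORUM, ALL). Quorum reads plus quorum writes give you something close to linearizable on a single key, because every read quorum overlaps every write quorum; truly linearizable operations need lightweight transactions. Lower levels give you eventual.
Postgres replication. The primary is linearizable trivially. A synchronous standby is guaranteed to show every committed write only when synchronous_commit is set to remote_apply; with the default setting the standby has the write on disk but may not yet have applied it. Async replicas are eventually consistent, with the option of read-your-writes if your application is careful about routing.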

Notice the pattern: no major distributed database picks one model and sticks to it. They all expose knobs, because different operations within the same application have different consistency needs. The architectural skill is knowing which knob to turn for which operation.
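For quorum-based stores such as Cassandra, the effect of a knob comes down to whether the set of replicas a read contacts must overlap the set a write reached. The sketch below captures that arithmetic. The helper names are hypothetical, QUORUM is taken to be a strict majority of the replication factor, and LOCAL_QUORUM is left out because it depends on per-datacenter replication settings.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def read_overlaps_write(read_level, write_level, replication_factor):
    """True if every read set must share at least one replica with every write set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


quorum_both = read_overlaps_write("QUORUM", "QUORUM", 3)   # 2 + 2 > 3
one_both = read_overlaps_write("ONE", "ONE", 3)            # 1 + 1 > 3 fails
one_read_all_write = read_overlaps_write("ONE", "ALL", 3)  # 1 + 3 > 3
```

Overlap means a read contacts at least one replica that acknowledged the latest successful write. On its own that is not full linearizability, but it is the property that separates the “strong” settings from the eventual ones.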

The hierarchy

flowchart TD
    L[Linearizable] --> S[Sequential]
    S --> C[Causal]
    C --> RYW[Read-your-writes]
    C --> MR[Monotonic reads]
    RYW --> E[Eventually consistent]
    MR --> E

Stronger models are at the top. An arrow from A to B means “A implies B”: if your system is linearizable, it is also causal, monotonic, and eventually consistent. Picking a model means picking how far down the diagram you are willing to fall.
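The diagram is small enough to encode directly, which makes “does model A give me guarantee B?” a reachability question. This is a sketch of the hierarchy exactly as drawn in this lesson; the `IMPLIES` table and helper names are illustrative.

```python
# Each model maps to the models it directly implies (the arrows in the diagram).
IMPLIES = {
    "linearizable": ["sequential"],
    "sequential": ["causal"],
    "causal": ["read-your-writes", "monotonic reads"],
    "read-your-writes": ["eventual"],
    "monotonic reads": ["eventual"],
    "eventual": [],
}


def guarantees(model):
    """Return the model itself plus every model reachable by following arrows."""
    seen = set()
    stack = [model]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IMPLIES[current])
    return seen


def is_at_least(model, required):
    """True if `model` provides the guarantees of `required`."""
    return required in guarantees(model)


causal_gives = guarantees("causal")
ryw_gives_monotonic = is_at_least("read-your-writes", "monotonic reads")
```

Read-your-writes and monotonic reads sit side by side: neither is reachable from the other, which is why the hierarchy is a partial order rather than a single ladder.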

The takeaway

Three things to carry into the next lesson.

First, “strong” and “eventual” are the endpoints of a real spectrum, not the only options. Causal consistency in particular is underused; it captures most of what users expect, at a fraction of the coordination cost.

Second, consistency is per-operation, not per-system. The same database can be linearizable for a balance update and eventually consistent for a view counter. Treating consistency as a single global setting throws away most of the design space.

Third, the cost of stronger models is paid in latency, and latency is set by physics. Coordination across continents takes time. If your system needs both global scale and linearizable consistency, you will pay for it in user-visible delay, or in atomic clocks, or both.

The next lesson goes underneath consistency, into time itself: why physical clocks are unreliable, what Lamport timestamps and vector clocks give us instead, and how Spanner buys linearizable consistency at planetary scale by buying better clocks.