Distributed Consistency Models: What Your Service Actually Guarantees

At a large US tech company, the hardest design review conversations were almost never about which database to use. They were about what consistency guarantee the system needed, and whether the proposed design actually provided it.

“Eventually consistent” is not a useful answer to “what does this service guarantee?” It describes the best case for a wide range of behaviours, some of which are harmless and some of which can cause correctness bugs in production.

The Consistency Spectrum

Stronger ←──────────────────────────────────────────────→ Weaker

Linearisable   Serialisable   Causal   Read-your-writes   Eventual

Linearisability (also called strong consistency): every operation appears to take effect instantaneously at a single point in time between its start and its completion. All observers see operations in the same order, and that order respects real time. This is the strongest guarantee on this spectrum.

If write(x=1) completes before read(x) starts:
  Linearisable guarantees: read(x) returns 1

Systems: single-master SQL with synchronous replication, etcd, ZooKeeper (for writes; a read can be stale unless the client issues a sync first), Spanner.

Serialisability: transactions execute as if they ran serially, one after another, with no interleaving. The actual concurrent execution is equivalent to some serial execution, but that serial order may not match real-time order.

Initially x=0
T1: reads x, writes x=1
T2: reads x, writes x=2

Serialisable allows: T1 then T2 (T2 reads 1, final x=2) OR T2 then T1 (T1 reads 2, final x=1)
Not allowed: both read 0 and both commit (lost update)

Systems: most ACID databases with SERIALIZABLE isolation (which is often not the default).
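To make "equivalent to some serial execution" concrete, here is a minimal sketch in Go. It models a transaction as a function from the value it reads to the value it writes, and checks only the final value of x, which is a simplification of the full definition. The `txn` type and both helper names are assumptions made for illustration.

```go
package main

import "fmt"

// txn is a hypothetical model of a transaction: it reads the current value
// of x and returns the value it writes back.
type txn func(x int) int

// serialOutcomes returns the final value of x for both serial orders:
// t1 then t2, and t2 then t1.
func serialOutcomes(initial int, t1, t2 txn) []int {
	return []int{t2(t1(initial)), t1(t2(initial))}
}

// isSerialisable reports whether an observed final value matches the result
// of at least one serial order. It compares final state only.
func isSerialisable(observed, initial int, t1, t2 txn) bool {
	for _, v := range serialOutcomes(initial, t1, t2) {
		if v == observed {
			return true
		}
	}
	return false
}

func main() {
	deposit := func(x int) int { return x + 10 }

	// Both deposits applied: matches either serial order.
	fmt.Println(isSerialisable(20, 0, deposit, deposit)) // true

	// Both transactions read 0 and wrote 10: one update was lost.
	fmt.Println(isSerialisable(10, 0, deposit, deposit)) // false
}
```

The deposit example is used instead of the constant writes above because it makes the lost update visible in the final value.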

Causal consistency: if operation A causally precedes operation B (B depends on A’s result), all observers see A before B. Operations without causal relationship may be seen in different orders by different observers.

User posts comment C1
User replies to C1 with C2
Causal consistency guarantees: any observer who sees C2 also sees C1

But: two unrelated comments from different users may appear in different order
     to different observers

Systems: Dynamo-style with vector clocks, some document databases with session guarantees.
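A vector clock is one common way to decide whether two operations are causally related or concurrent. The following is a minimal sketch, not any particular database's implementation; the type and function names are assumptions made for illustration.

```go
package main

import "fmt"

// VectorClock maps a node ID to the number of events seen from that node.
// A missing entry is treated as zero.
type VectorClock map[string]uint64

// leq reports whether every entry of a is <= the matching entry of b.
func leq(a, b VectorClock) bool {
	for node, av := range a {
		if av > b[node] {
			return false
		}
	}
	return true
}

// HappensBefore reports whether a causally precedes b:
// a <= b entry by entry, and the two clocks are not equal.
func HappensBefore(a, b VectorClock) bool {
	return leq(a, b) && !leq(b, a)
}

// Concurrent reports whether neither clock precedes the other.
func Concurrent(a, b VectorClock) bool {
	return !leq(a, b) && !leq(b, a)
}

func main() {
	c1 := VectorClock{"alice": 1}           // Alice posts C1
	c2 := VectorClock{"alice": 1, "bob": 1} // Bob replies after seeing C1
	c3 := VectorClock{"carol": 1}           // Carol posts an unrelated comment

	fmt.Println(HappensBefore(c1, c2)) // true: C1 must be shown before C2
	fmt.Println(Concurrent(c2, c3))    // true: either order is acceptable
}
```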

Read-your-writes: after a write, subsequent reads from the same client see the write.

Client: write(x=1)   → success
Client: read(x)      → must return 1 (not a stale value from before the write)

This is weaker than causal consistency but handles the most common user-visible consistency bug: “I just updated my profile, why doesn’t it show my change?”
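One way to get read-your-writes on top of lagging replicas is for the client session to remember the version of its last write and to skip any replica that has not caught up. The sketch below assumes a hypothetical in-memory `replica` and `session`; a real system would carry the version in a session token or request header.

```go
package main

import "fmt"

// replica is a hypothetical in-memory copy of one key, plus the version
// of the latest write it has applied.
type replica struct {
	value   string
	version uint64
}

// session remembers the highest version this client has written.
type session struct {
	lastWritten uint64
}

// write applies the value to the primary and records the new version.
func (s *session) write(primary *replica, value string) {
	primary.version++
	primary.value = value
	s.lastWritten = primary.version
}

// read prefers the follower, but falls back to the primary when the
// follower has not yet applied this session's latest write.
func (s *session) read(primary, follower *replica) string {
	if follower.version >= s.lastWritten {
		return follower.value
	}
	return primary.value
}

func main() {
	primary := &replica{value: "old", version: 1}
	follower := &replica{value: "old", version: 1}

	writer := &session{}
	writer.write(primary, "new") // the follower has not replicated this yet

	fmt.Println(writer.read(primary, follower)) // new

	// A different client gets no such guarantee and may see stale data.
	other := &session{}
	fmt.Println(other.read(primary, follower)) // old
}
```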

Eventual consistency: given no new writes, all replicas will eventually converge to the same value. No guarantees about when, or about what intermediate values may be observed.

Systems: DNS, Dynamo without quorum, many NoSQL databases with default settings.
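Convergence needs a merge rule that gives the same answer regardless of the order in which replicas receive updates. A last-writer-wins register is the simplest such rule. This sketch assumes each write carries a timestamp and a node ID, and that one node never reuses a timestamp; the `entry` type and `merge` function are illustrative names.

```go
package main

import "fmt"

// entry is a value tagged with the time it was written and the node that wrote it.
type entry struct {
	value     string
	timestamp uint64
	node      string
}

// merge keeps the entry with the higher timestamp. Ties are broken by node ID
// so that every replica picks the same winner.
func merge(a, b entry) entry {
	if a.timestamp != b.timestamp {
		if a.timestamp > b.timestamp {
			return a
		}
		return b
	}
	if a.node >= b.node {
		return a
	}
	return b
}

func main() {
	w1 := entry{value: "blue", timestamp: 10, node: "a"}
	w2 := entry{value: "green", timestamp: 12, node: "b"}

	// Two replicas receive the same writes in opposite orders.
	r1 := merge(merge(entry{}, w1), w2)
	r2 := merge(merge(entry{}, w2), w1)

	fmt.Println(r1 == r2, r1.value) // true green
}
```

Both replicas converge, but the write of "blue" is silently discarded. That is harmless for a profile colour and a correctness bug for a balance.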

Why This Matters in Practice

The failure mode: a system that behaves as if it has strong consistency when it doesn’t. This happens when:

Testing is done on a single-node database (which provides linearisability even if the multi-node version doesn’t)
The application is developed with assumptions that are violated under network partitions or replication lag

Example: double-spend prevention

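func processPayment(userID string, amount int64) error {
    balance, err := db.GetBalance(userID)
    if err != nil {
        return err
    }

    // Check-then-act: nothing stops another call from changing the balance
    // between this check and the deduction below.
    if balance < amount {
        return ErrInsufficientFunds
    }

    return db.DeductBalance(userID, amount)
}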

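This code looks correct, and it is correct when a single writer handles payments one at a time, or when the read and the deduction run inside one serialisable transaction. Without that, two concurrent calls can both read balance=100, both pass the check, and both deduct, leaving a negative balance whenever the two amounts total more than 100. Reading from an asynchronously replicated replica makes it worse, because the balance that was checked may already be stale.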

The fix, combining a row lock with an optimistic version check (works with most SQL databases):

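func processPayment(userID string, amount int64) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    // After a successful Commit this returns sql.ErrTxDone, which is safe to ignore.
    defer tx.Rollback()

    // Lock the row so concurrent payments on this account wait for this transaction.
    var balance, version int64
    err = tx.QueryRow(
        "SELECT balance, version FROM accounts WHERE id=$1 FOR UPDATE",
        userID).Scan(&balance, &version)
    if err != nil {
        return err
    }

    if balance < amount {
        return ErrInsufficientFunds
    }

    // Compare-and-swap on version: the update applies only if the row is unchanged.
    result, err := tx.Exec(
        "UPDATE accounts SET balance=balance-$1, version=version+1 WHERE id=$2 AND version=$3",
        amount, userID, version)
    if err != nil {
        return err
    }

    rows, err := result.RowsAffected()
    if err != nil {
        return err
    }
    if rows == 0 {
        return ErrConcurrentModification // caller should retry
    }

    return tx.Commit()
}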

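FOR UPDATE takes a row lock, so a second transaction touching the same account waits until the first commits or rolls back. The version check in the UPDATE is a compare-and-swap that prevents the lost update even on code paths that read the row without the lock. Both rely on the read and the write running in one transaction against the primary, which most SQL databases support if you use transactions correctly.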
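The version check can be exercised without a database. The sketch below uses an in-memory `account` as a stand-in for the accounts row (an assumption for illustration, as are `snapshot`, `deductIfVersion`, and `payConcurrently`) and retries the read-check-write cycle whenever the compare-and-swap loses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrInsufficientFunds      = errors.New("insufficient funds")
	ErrConcurrentModification = errors.New("concurrent modification")
)

// account is a hypothetical in-memory stand-in for one accounts row.
type account struct {
	mu      sync.Mutex
	balance int64
	version int64
}

// snapshot returns the balance and version, like the SELECT.
func (a *account) snapshot() (balance, version int64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.balance, a.version
}

// deductIfVersion mirrors UPDATE ... WHERE version = $3: it applies the
// deduction only if nobody has changed the row since it was read.
func (a *account) deductIfVersion(amount, version int64) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.version != version {
		return ErrConcurrentModification
	}
	a.balance -= amount
	a.version++
	return nil
}

// pay repeats read, check, and conditional write until it succeeds
// or the balance is too low.
func pay(a *account, amount int64) error {
	for {
		balance, version := a.snapshot()
		if balance < amount {
			return ErrInsufficientFunds
		}
		if err := a.deductIfVersion(amount, version); err != ErrConcurrentModification {
			return err
		}
	}
}

// payConcurrently starts n goroutines that each try to pay amount and
// returns how many of them succeeded.
func payConcurrently(a *account, n int, amount int64) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	succeeded := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if pay(a, amount) == nil {
				mu.Lock()
				succeeded++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return succeeded
}

func main() {
	acct := &account{balance: 100}
	ok := payConcurrently(acct, 10, 30)
	fmt.Println(ok, acct.balance) // 3 10
}
```

Because every change bumps the version, a deduction that succeeds is known to have checked the current balance, so the balance cannot go negative however the goroutines interleave.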

The CAP Theorem in Practice

CAP theorem: a distributed system can provide at most two of Consistency, Availability, and Partition tolerance.

Since network partitions are a reality of any distributed system, you’re choosing between CP (consistency under partition — may become unavailable) and AP (availability under partition — may return stale data).

This is a useful mental model for understanding why distributed databases make different trade-offs. It’s less useful for day-to-day engineering decisions because:

“Partition tolerance” conflates many different failure scenarios
“Consistency” in CAP means linearisability, but many consistency models exist between linearisability and chaos
Real systems tune the trade-off per operation, not globally

A more practical framing: PACELC — if there’s a Partition, choose Availability or Consistency; Else (no partition), choose Latency or Consistency.

Most business applications choose availability over strict consistency for reads (acceptable: show slightly stale data) and consistency over availability for writes (not acceptable: lose a write or allow a double-spend).

What Databases Actually Provide

Database	Default isolation / consistency	Strongest available	Notes
PostgreSQL	Read Committed	Serializable	Implemented as Serializable Snapshot Isolation (SSI)
MySQL (InnoDB)	Repeatable Read	Serializable	Available but rarely used
MongoDB	Read Uncommitted	Snapshot	Causal sessions available
DynamoDB	Eventual	Strong (per item)	Transactions available at higher cost
Cassandra	ONE (1 replica)	ALL; SERIAL for lightweight transactions	Tunable per query
Redis	Linearisable (single node)	n/a	Multi-node requires careful config
CockroachDB	Serializable	Serializable	Distributed by default

The gotcha with many databases: the default isolation is weaker than SERIALIZABLE. PostgreSQL’s default is READ COMMITTED, which prevents dirty reads but allows non-repeatable reads and phantom reads. Applications that need stronger guarantees must request them explicitly.
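In Go, `database/sql` lets a caller ask for a specific isolation level per transaction through `sql.TxOptions`. Under SERIALIZABLE the database may abort a transaction with a serialisation failure (SQLSTATE 40001 in PostgreSQL), so the caller needs a retry loop. The sketch below is one way to structure that. How to recognise a retryable error depends on the driver, so `isRetryable` is left as a parameter; `runSerializable`, `retry`, and `errConflict` are names chosen for illustration.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// runSerializable runs fn inside a SERIALIZABLE transaction and retries
// when isRetryable reports a serialisation conflict.
func runSerializable(ctx context.Context, db *sql.DB, attempts int,
	isRetryable func(error) bool, fn func(*sql.Tx) error) error {
	return retry(attempts, isRetryable, func() error {
		tx, err := db.BeginTx(ctx, &sql.TxOptions{Isolation: sql.LevelSerializable})
		if err != nil {
			return err
		}
		defer tx.Rollback() // returns sql.ErrTxDone after Commit; ignored
		if err := fn(tx); err != nil {
			return err
		}
		return tx.Commit()
	})
}

// retry calls op up to attempts times. It stops on success or on the
// first error that isRetryable rejects.
func retry(attempts int, isRetryable func(error) bool, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !isRetryable(err) {
			return err
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// errConflict is a stand-in for a driver's serialisation-failure error.
var errConflict = errors.New("serialization failure")

func main() {
	// Exercise the retry loop without a database: fail twice, then succeed.
	calls := 0
	err := retry(3, func(e error) bool { return errors.Is(e, errConflict) }, func() error {
		calls++
		if calls < 3 {
			return errConflict
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```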

The Practical Checklist

When designing a service that writes shared state:

What’s the invariant? (balance can’t go negative, email must be unique, order can’t be double-processed)
What’s the access pattern? (single writer, multiple writers, concurrent readers + writers)
What database isolation does the current setup provide? (check actual settings, not defaults)
Where can you use pessimistic locking? (SELECT FOR UPDATE before mutating)
Where can you use optimistic locking? (version columns, conditional updates)
What’s the failure behaviour under a partition? (return error, return stale, or queue for later?)

The answer to the last question defines whether you’re building a CP or AP system for that operation. The right answer depends on what the operation is — a financial debit needs CP (prefer error over inconsistency), a profile view can be AP (prefer stale over error).
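The per-operation choice can be sketched as two methods on the same service that react differently when the primary is unreachable. Everything here is a hypothetical in-memory model: `primaryUp` simulates the partition, and `profileCache` stands in for whatever stale copy a real system would have to hand.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrUnavailable       = errors.New("primary unreachable")
	ErrInsufficientFunds = errors.New("insufficient funds")
)

// service is a hypothetical model of one service in front of a primary store.
type service struct {
	primaryUp    bool
	profiles     map[string]string
	balances     map[string]int64
	profileCache map[string]string
}

// viewProfile is AP: during a partition it serves the cached copy and
// reports that the value may be stale.
func (s *service) viewProfile(id string) (name string, stale bool, err error) {
	if s.primaryUp {
		name = s.profiles[id]
		s.profileCache[id] = name
		return name, false, nil
	}
	if cached, ok := s.profileCache[id]; ok {
		return cached, true, nil
	}
	return "", false, ErrUnavailable
}

// debit is CP: during a partition it returns an error instead of
// acting on data it cannot verify.
func (s *service) debit(id string, amount int64) error {
	if !s.primaryUp {
		return ErrUnavailable
	}
	if s.balances[id] < amount {
		return ErrInsufficientFunds
	}
	s.balances[id] -= amount
	return nil
}

func main() {
	s := &service{
		primaryUp:    true,
		profiles:     map[string]string{"u1": "Ada"},
		balances:     map[string]int64{"u1": 100},
		profileCache: map[string]string{},
	}
	s.viewProfile("u1") // warms the cache while the primary is reachable
	s.primaryUp = false // simulate a partition

	name, stale, _ := s.viewProfile("u1")
	fmt.Println(name, stale)       // Ada true
	fmt.Println(s.debit("u1", 30)) // primary unreachable
}
```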

Consistency models are not an academic topic. Every distributed system has a consistency model, whether you’ve chosen it explicitly or not. Systems designed with a clear understanding of what they guarantee are easier to reason about, less likely to have correctness bugs, and easier to evolve safely.
