Financial systems are natural candidates for event sourcing. Regulators want to know the state of positions at any point in time. Audit trails are not optional. The need to replay a day’s events to debug a pricing anomaly comes up regularly. These requirements — which other domains treat as optional — map directly onto event sourcing’s core properties.
That said, event sourcing in production has costs that the enthusiast literature systematically underplays. Here’s an honest accounting.
Why Financial Systems Fit
The fundamental property of event sourcing: current state is a derived projection of an immutable, append-only event log. Instead of storing the current position, you store every trade that contributed to the position and compute the position by replaying them.
Traditional model — positions table:

┌────────┬─────────┐
│ sym    │ net_qty │
├────────┼─────────┤
│ EURUSD │ +2.5M   │
│ GBPUSD │ -1.0M   │
└────────┴─────────┘

Event sourcing model — events log:

┌──────────┬────────────┬────────┬───────┐
│ ts       │ event_type │ sym    │ qty   │
├──────────┼────────────┼────────┼───────┤
│ 09:31:02 │ TRADE      │ EURUSD │ +1.0M │
│ 09:32:14 │ TRADE      │ EURUSD │ +2.0M │
│ 09:45:33 │ TRADE      │ EURUSD │ -0.5M │
│ 10:12:01 │ TRADE      │ GBPUSD │ -1.0M │
└──────────┴────────────┴────────┴───────┘

Current EURUSD position: sum(+1.0M, +2.0M, -0.5M) = +2.5M
The audit trail is free. The event log is the audit trail — there’s nothing to reconstruct. Every state the system was ever in is derivable from the log.
Point-in-time queries are exact. “What was our EUR/USD position at 09:40:00?” means “replay all EUR/USD events with timestamps ≤ 09:40:00 and sum them.” No approximation, no historical snapshot infrastructure, no separate audit database.
Debugging by replay. When the pricing engine produced a suspicious quote at 10:47:33, we could replay the event stream up to that timestamp and reproduce the exact state that produced it. With a traditional mutable-state model, the state that led to the bug is overwritten by subsequent events — you can only infer what happened from logs.
The Event Schema
The event schema is the most important design decision and the one most likely to cause long-term pain.
A minimal trade event needs an identity (an EventID), a position in the log (a sequence number), a business timestamp, an event type, a schema version, and the trade economics: symbol, signed quantity, price.
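One possible shape in Go — the field names and JSON tags here are an illustrative sketch, not the production schema:

```go
package main

import (
	"fmt"
	"time"
)

// TradeEvent is one plausible minimal shape for a trade event.
type TradeEvent struct {
	EventID   string    `json:"event_id"`   // idempotency key, unique per event
	Seq       uint64    `json:"seq"`        // position in the append-only log
	TS        time.Time `json:"ts"`         // business timestamp of the trade
	EventType string    `json:"event_type"` // "TRADE"; later perhaps "AMEND", "CANCEL"
	Version   int       `json:"version"`    // schema version, needed for upcasting later
	Sym       string    `json:"sym"`
	Qty       float64   `json:"qty"` // signed: buys positive, sells negative
	Price     float64   `json:"price"`
}

func main() {
	e := TradeEvent{
		EventID:   "a1b2c3d4", // in practice a UUID assigned at publish time
		Seq:       1,
		TS:        time.Date(2024, time.March, 1, 9, 31, 2, 0, time.UTC),
		EventType: "TRADE",
		Version:   1,
		Sym:       "EURUSD",
		Qty:       1_000_000,
		Price:     1.0842,
	}
	fmt.Println(e.EventType, e.Sym, e.Qty)
}
```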
The EventID is critical for idempotency. If the same event is published twice (network retry, exactly-once semantics failure), the projection must deduplicate. Without an idempotency key, replaying a log with a duplicate event corrupts the projection.
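The deduplication a projection must do can be sketched as follows — an in-memory sketch with illustrative names, where the unbounded `seen` set stands in for whatever the real store uses (a unique constraint, or a bounded dedup window):

```go
package main

import "fmt"

// Event carries an EventID used as the idempotency key.
type Event struct {
	EventID string
	Sym     string
	Qty     float64
}

// Projection maintains current positions and deduplicates on EventID,
// so a log containing duplicate deliveries projects to the same state.
type Projection struct {
	positions map[string]float64
	seen      map[string]bool
}

func NewProjection() *Projection {
	return &Projection{
		positions: map[string]float64{},
		seen:      map[string]bool{},
	}
}

func (p *Projection) Apply(e Event) {
	if p.seen[e.EventID] {
		return // duplicate delivery (network retry): ignore
	}
	p.seen[e.EventID] = true
	p.positions[e.Sym] += e.Qty
}

func main() {
	p := NewProjection()
	p.Apply(Event{"evt-1", "EURUSD", 1_000_000})
	p.Apply(Event{"evt-1", "EURUSD", 1_000_000}) // retry with the same EventID
	fmt.Println(p.positions["EURUSD"])           // 1e+06, not doubled
}
```

Without the `seen` check, the retried event would double the position — exactly the projection corruption described above.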
The Projection Problem
Reading current state from an event log is slow if you replay from the beginning every time. At 50,000 events/day, after two years the position query “replay all EUR/USD events” means scanning 36 million events before returning a result.
The solution: projections (also called read models). A separate process (or the same process with a background goroutine) reads the event log and maintains a materialised view of current state in a fast-access store:
Event log (source of truth)
        │
        ▼
Projection worker (async)
        │
        ├──▶ Current positions table (Postgres/Redis) — O(1) read
        ├──▶ Risk aggregates (ClickHouse) — fast analytical queries
        └──▶ Reporting materialised view (read-optimised schema)
The projection is disposable — if it gets out of sync or the schema changes, you drop it and rebuild by replaying the event log. The event log is the durable source; projections are caches.
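A minimal sketch of the asynchronous worker, using a channel as a stand-in for the durable log and an in-memory map as the read model (in production the feed and the store would be the log and Postgres/Redis as above; all names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type Event struct {
	Seq uint64
	Sym string
	Qty float64
}

// ReadModel is the materialised current-positions view. It records the
// last applied sequence number so consumers (and rebuilds) can tell how
// far it has caught up with the log.
type ReadModel struct {
	mu        sync.RWMutex
	positions map[string]float64
	lastSeq   uint64
}

func (m *ReadModel) apply(e Event) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.positions[e.Sym] += e.Qty
	m.lastSeq = e.Seq
}

// Position is the O(1) read the projection exists to provide.
func (m *ReadModel) Position(sym string) float64 {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.positions[sym]
}

func main() {
	events := make(chan Event, 16)
	model := &ReadModel{positions: map[string]float64{}}
	done := make(chan struct{})

	// Projection worker: consumes the event stream asynchronously.
	go func() {
		for e := range events {
			model.apply(e)
		}
		close(done)
	}()

	events <- Event{1, "EURUSD", 1_000_000}
	events <- Event{2, "EURUSD", 1_500_000}
	close(events)
	<-done // only for this demo; a real worker never stops, and reads are eventually consistent

	fmt.Println(model.Position("EURUSD")) // 2.5e+06
}
```

The `<-done` wait is the demo papering over exactly the gap the next paragraph describes: between the event landing in the log and the worker applying it, a reader sees stale state.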
The eventual consistency problem: projections are asynchronous. The event lands in the log first; the projection updates a moment later. If a client reads current position immediately after submitting a trade, they might see stale state. For an external-facing API this requires careful design — either accept eventual consistency and communicate it clearly, or synchronously update a read model as part of the event write (complicates the write path, breaks the clean separation).
For our use case (risk analytics consumed by the risk engine, not by clients directly), a consistency lag on the order of seconds was acceptable. For a client-facing “your current position” endpoint, it was not.
Event Schema Evolution: The Hard Part
Event schemas change. Business logic changes. New fields are added. Old fields become irrelevant. The problem: the event log is immutable and contains events in old schemas. Your new projection code must handle both old and new schemas.
Upcasting is the standard pattern: when reading an old event, transform it step by step to the current schema before processing.
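A sketch of an upcasting chain — the versions, field renames, and defaults here are hypothetical, not our actual migration history:

```go
package main

import "fmt"

// rawEvent models a decoded log entry whose field layout depends on Version.
type rawEvent struct {
	Version int
	Fields  map[string]any
}

const currentVersion = 3 // hypothetical current schema version

// upcast chains one-step migrations (v1 -> v2 -> v3) until the event
// reaches the current schema. Each case encodes one historical change.
func upcast(e rawEvent) rawEvent {
	for e.Version < currentVersion {
		switch e.Version {
		case 1:
			// v2 renamed "symbol" to "sym".
			e.Fields["sym"] = e.Fields["symbol"]
			delete(e.Fields, "symbol")
			e.Version = 2
		case 2:
			// v3 added a "venue" field; old events get a default.
			e.Fields["venue"] = "UNKNOWN"
			e.Version = 3
		}
	}
	return e
}

func main() {
	old := rawEvent{Version: 1, Fields: map[string]any{"symbol": "EURUSD", "qty": 1e6}}
	cur := upcast(old)
	fmt.Println(cur.Version, cur.Fields["sym"], cur.Fields["venue"]) // 3 EURUSD UNKNOWN
}
```

Each `case` is permanent: as long as v1 events exist in the log, the v1→v2 step must keep working, which is how the chain becomes an accretion of business history.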
This works, but the upcasting code accumulates over time. After three years of schema evolution, we had 7 event versions and an upcasting chain that was a fragile accretion of business history. The lesson: invest heavily in getting the initial event schema right. Adding fields is easy (old events just have null/zero values). Renaming, splitting, or removing fields requires upcasting for as long as the event log lives — potentially forever.
Snapshot Checkpoints
When replay time becomes unacceptable (projection rebuild taking hours), add snapshot checkpoints: periodically save the projection state alongside the event sequence number it was computed from. On rebuild, start from the nearest snapshot rather than event 0.
Event log:  event 1 ... event 100,000 ... event 1,000,000
                              ↑
                 snapshot: {positions: {...}, seq: 100,000}

Rebuild: load snapshot at seq 100,000 → replay only events 100,001 to present
The snapshot is a derived artefact and can always be discarded and rebuilt from the event log. Store it separately from the event log; don’t treat it as durable source of truth.
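The rebuild step can be sketched as follows — `Snapshot` and `rebuild` are illustrative names, and the in-memory slice stands in for reading the log tail from the store:

```go
package main

import "fmt"

type Event struct {
	Seq uint64
	Sym string
	Qty float64
}

// Snapshot is a derived checkpoint: projection state plus the sequence
// number it was computed from. It is never the source of truth.
type Snapshot struct {
	Positions map[string]float64
	Seq       uint64
}

// rebuild restores the projection from the snapshot, then replays only
// the events after the snapshot's sequence number.
func rebuild(snap Snapshot, log []Event) map[string]float64 {
	positions := make(map[string]float64, len(snap.Positions))
	for sym, qty := range snap.Positions {
		positions[sym] = qty
	}
	for _, e := range log {
		if e.Seq > snap.Seq { // events at or before snap.Seq are already included
			positions[e.Sym] += e.Qty
		}
	}
	return positions
}

func main() {
	snap := Snapshot{Positions: map[string]float64{"EURUSD": 2_000_000}, Seq: 100_000}
	tail := []Event{
		{100_001, "EURUSD", 500_000},
		{100_002, "GBPUSD", -1_000_000},
	}
	pos := rebuild(snap, tail)
	fmt.Println(pos["EURUSD"], pos["GBPUSD"]) // 2.5e+06 -1e+06
}
```

The `e.Seq > snap.Seq` boundary is the detail to get right: replaying an event the snapshot already includes double-counts it, which is the same corruption as a missed deduplication.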
The Verdict
Event sourcing in financial systems is strongly beneficial: the audit trail, point-in-time queries, and replay debugging it provides are genuine requirements, not nice-to-haves.
The costs are real:
- Schema evolution is harder and requires long-term discipline
- Projections add an asynchronous layer that must be managed
- Eventual consistency requires explicit design for read-after-write scenarios
- Snapshot management adds operational complexity as the log grows
The right scope: use event sourcing for the domain where the audit and replay properties are genuinely needed (trading positions, order lifecycle, risk events). Don’t apply it universally to every domain in the system — the costs are not free, and many parts of a financial system (reference data, configuration, user sessions) don’t benefit from it.
The pattern that worked: event-sourced core trading domain, traditional CRUD for everything else, and a clear boundary between the two. The complexity lives only where it earns its keep.