Financial systems are natural candidates for event sourcing. Regulators want to know the state of positions at any point in time. Audit trails are not optional. The need to replay a day’s events to debug a pricing anomaly comes up regularly. These requirements — which other domains treat as optional — map directly onto event sourcing’s core properties.
That said, event sourcing in production has costs that the enthusiast literature systematically underplays. Here’s an honest accounting.
Why Financial Systems Fit
The fundamental property of event sourcing: current state is a derived projection of an immutable, append-only event log. Instead of storing the current position, you store every trade that contributed to the position and compute the position by replaying them.
Traditional model — positions table:

┌────────┬─────────┐
│ sym    │ net_qty │
├────────┼─────────┤
│ EURUSD │ +2.5M   │
│ GBPUSD │ -1.0M   │
└────────┴─────────┘

Event sourcing model — events log:

┌──────────┬────────────┬────────┬───────┐
│ ts       │ event_type │ sym    │ qty   │
├──────────┼────────────┼────────┼───────┤
│ 09:31:02 │ TRADE      │ EURUSD │ +1.0M │
│ 09:32:14 │ TRADE      │ EURUSD │ +2.0M │
│ 09:45:33 │ TRADE      │ EURUSD │ -0.5M │
│ 10:12:01 │ TRADE      │ GBPUSD │ -1.0M │
└──────────┴────────────┴────────┴───────┘

Current EURUSD position: sum(+1.0M, +2.0M, -0.5M) = +2.5M
The audit trail is free. The event log is the audit trail — there’s nothing to reconstruct. Every state the system was ever in is derivable from the log.
Point-in-time queries are exact. “What was our EUR/USD position at 09:40:00?” means “replay all EUR/USD events with timestamps ≤ 09:40:00 and sum them.” No approximation, no historical snapshot infrastructure, no separate audit database.
Debugging by replay. When the pricing engine produced a suspicious quote at 10:47:33, we could replay the event stream up to that timestamp and reproduce the exact state that produced it. With a traditional mutable-state model, the state that led to the bug is overwritten by subsequent events — you can only infer what happened from logs.
The Event Schema
The event schema is the most important design decision and the one most likely to cause long-term pain.
A minimal trade event needs an identity (an EventID), a position in the log (a sequence number), a business timestamp, an event type, a schema version, and the trade economics: symbol, signed quantity, price.
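One possible shape in Go — the field names and JSON tags here are an illustrative sketch, not the production schema:

```go
package main

import (
	"fmt"
	"time"
)

// TradeEvent is one plausible minimal shape for a trade event.
type TradeEvent struct {
	EventID   string    `json:"event_id"`   // idempotency key, unique per event
	Seq       uint64    `json:"seq"`        // position in the append-only log
	TS        time.Time `json:"ts"`         // business timestamp of the trade
	EventType string    `json:"event_type"` // "TRADE"; later perhaps "AMEND", "CANCEL"
	Version   int       `json:"version"`    // schema version, needed for upcasting later
	Sym       string    `json:"sym"`
	Qty       float64   `json:"qty"` // signed: buys positive, sells negative
	Price     float64   `json:"price"`
}

func main() {
	e := TradeEvent{
		EventID:   "a1b2c3d4", // in practice a UUID assigned at publish time
		Seq:       1,
		TS:        time.Date(2024, time.March, 1, 9, 31, 2, 0, time.UTC),
		EventType: "TRADE",
		Version:   1,
		Sym:       "EURUSD",
		Qty:       1_000_000,
		Price:     1.0842,
	}
	fmt.Println(e.EventType, e.Sym, e.Qty)
}
```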
The EventID is critical for idempotency. If the same event is published twice (network retry, exactly-once semantics failure), the projection must deduplicate. Without an idempotency key, replaying a log with a duplicate event corrupts the projection.
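The deduplication a projection must do can be sketched as follows — an in-memory sketch with illustrative names, where the unbounded `seen` set stands in for whatever the real store uses (a unique constraint, or a bounded dedup window):

```go
package main

import "fmt"

// Event carries an EventID used as the idempotency key.
type Event struct {
	EventID string
	Sym     string
	Qty     float64
}

// Projection maintains current positions and deduplicates on EventID,
// so a log containing duplicate deliveries projects to the same state.
type Projection struct {
	positions map[string]float64
	seen      map[string]bool
}

func NewProjection() *Projection {
	return &Projection{
		positions: map[string]float64{},
		seen:      map[string]bool{},
	}
}

func (p *Projection) Apply(e Event) {
	if p.seen[e.EventID] {
		return // duplicate delivery (network retry): ignore
	}
	p.seen[e.EventID] = true
	p.positions[e.Sym] += e.Qty
}

func main() {
	p := NewProjection()
	p.Apply(Event{"evt-1", "EURUSD", 1_000_000})
	p.Apply(Event{"evt-1", "EURUSD", 1_000_000}) // retry with the same EventID
	fmt.Println(p.positions["EURUSD"])           // 1e+06, not doubled
}
```

Without the `seen` check, the retried event would double the position — exactly the projection corruption described above.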
The Projection Problem
Reading current state from an event log is slow if you replay from the beginning every time. At 50,000 events/day, after two years the position query “replay all EUR/USD events” means scanning 36 million events before returning a result.
The solution: projections (also called read models). A separate process (or the same process with a background goroutine) reads the event log and maintains a materialised view of current state in a fast-access store:
Event log (source of truth)
        │
        ▼
Projection worker (async)
        │
        ├──▶ Current positions table (Postgres/Redis) — O(1) read
        ├──▶ Risk aggregates (ClickHouse) — fast analytical queries
        └──▶ Reporting materialised view (read-optimised schema)
The projection is disposable — if it gets out of sync or the schema changes, you drop it and rebuild by replaying the event log. The event log is the durable source; projections are caches.
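A minimal sketch of the asynchronous worker, using a channel as a stand-in for the durable log and an in-memory map as the read model (in production the feed and the store would be the log and Postgres/Redis as above; all names here are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type Event struct {
	Seq uint64
	Sym string
	Qty float64
}

// ReadModel is the materialised current-positions view. It records the
// last applied sequence number so consumers (and rebuilds) can tell how
// far it has caught up with the log.
type ReadModel struct {
	mu        sync.RWMutex
	positions map[string]float64
	lastSeq   uint64
}

func (m *ReadModel) apply(e Event) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.positions[e.Sym] += e.Qty
	m.lastSeq = e.Seq
}

// Position is the O(1) read the projection exists to provide.
func (m *ReadModel) Position(sym string) float64 {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.positions[sym]
}

func main() {
	events := make(chan Event, 16)
	model := &ReadModel{positions: map[string]float64{}}
	done := make(chan struct{})

	// Projection worker: consumes the event stream asynchronously.
	go func() {
		for e := range events {
			model.apply(e)
		}
		close(done)
	}()

	events <- Event{1, "EURUSD", 1_000_000}
	events <- Event{2, "EURUSD", 1_500_000}
	close(events)
	<-done // only for this demo; a real worker never stops, and reads are eventually consistent

	fmt.Println(model.Position("EURUSD")) // 2.5e+06
}
```

The `<-done` wait is the demo papering over exactly the gap the next paragraph describes: between the event landing in the log and the worker applying it, a reader sees stale state.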
The eventual consistency problem: projections are asynchronous. The event lands in the log first; the projection updates a moment later. If a client reads current position immediately after submitting a trade, they might see stale state. For an external-facing API this requires careful design — either accept eventual consistency and communicate it clearly, or synchronously update a read model as part of the event write (complicates the write path, breaks the clean separation).
For our use case (risk analytics consumed by the risk engine, not by clients directly), a consistency lag on the order of seconds was acceptable. For a client-facing “your current position” endpoint, it was not.
Event Schema Evolution: The Hard Part
Event schemas change. Business logic changes. New fields are added. Old fields become irrelevant. The problem: the event log is immutable and contains events in old schemas. Your new projection code must handle both old and new schemas.
Upcasting is the standard pattern: when reading an old event, transform it step by step to the current schema before processing.
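A sketch of an upcasting chain — the versions, field renames, and defaults here are hypothetical, not our actual migration history:

```go
package main

import "fmt"

// rawEvent models a decoded log entry whose field layout depends on Version.
type rawEvent struct {
	Version int
	Fields  map[string]any
}

const currentVersion = 3 // hypothetical current schema version

// upcast chains one-step migrations (v1 -> v2 -> v3) until the event
// reaches the current schema. Each case encodes one historical change.
func upcast(e rawEvent) rawEvent {
	for e.Version < currentVersion {
		switch e.Version {
		case 1:
			// v2 renamed "symbol" to "sym".
			e.Fields["sym"] = e.Fields["symbol"]
			delete(e.Fields, "symbol")
			e.Version = 2
		case 2:
			// v3 added a "venue" field; old events get a default.
			e.Fields["venue"] = "UNKNOWN"
			e.Version = 3
		}
	}
	return e
}

func main() {
	old := rawEvent{Version: 1, Fields: map[string]any{"symbol": "EURUSD", "qty": 1e6}}
	cur := upcast(old)
	fmt.Println(cur.Version, cur.Fields["sym"], cur.Fields["venue"]) // 3 EURUSD UNKNOWN
}
```

Each `case` is permanent: as long as v1 events exist in the log, the v1→v2 step must keep working, which is how the chain becomes an accretion of business history.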
This works, but the upcasting code accumulates over time. After three years of schema evolution, we had 7 event versions and an upcasting chain that was a fragile accretion of business history. The lesson: invest heavily in getting the initial event schema right. Adding fields is easy (old events just have null/zero values). Renaming, splitting, or removing fields requires upcasting for as long as the event log lives — potentially forever.
Snapshot Checkpoints
When replay time becomes unacceptable (projection rebuild taking hours), add snapshot checkpoints: periodically save the projection state alongside the event sequence number it was computed from. On rebuild, start from the nearest snapshot rather than event 0.
Event log:  event 1 ... event 100,000 ... event 1,000,000
                              ↑
                 snapshot: {positions: {...}, seq: 100,000}

Rebuild: load snapshot at seq 100,000 → replay only events 100,001 to present
The snapshot is a derived artefact and can always be discarded and rebuilt from the event log. Store it separately from the event log; don’t treat it as durable source of truth.
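The rebuild step can be sketched as follows — `Snapshot` and `rebuild` are illustrative names, and the in-memory slice stands in for reading the log tail from the store:

```go
package main

import "fmt"

type Event struct {
	Seq uint64
	Sym string
	Qty float64
}

// Snapshot is a derived checkpoint: projection state plus the sequence
// number it was computed from. It is never the source of truth.
type Snapshot struct {
	Positions map[string]float64
	Seq       uint64
}

// rebuild restores the projection from the snapshot, then replays only
// the events after the snapshot's sequence number.
func rebuild(snap Snapshot, log []Event) map[string]float64 {
	positions := make(map[string]float64, len(snap.Positions))
	for sym, qty := range snap.Positions {
		positions[sym] = qty
	}
	for _, e := range log {
		if e.Seq > snap.Seq { // events at or before snap.Seq are already included
			positions[e.Sym] += e.Qty
		}
	}
	return positions
}

func main() {
	snap := Snapshot{Positions: map[string]float64{"EURUSD": 2_000_000}, Seq: 100_000}
	tail := []Event{
		{100_001, "EURUSD", 500_000},
		{100_002, "GBPUSD", -1_000_000},
	}
	pos := rebuild(snap, tail)
	fmt.Println(pos["EURUSD"], pos["GBPUSD"]) // 2.5e+06 -1e+06
}
```

The `e.Seq > snap.Seq` boundary is the detail to get right: replaying an event the snapshot already includes double-counts it, which is the same corruption as a missed deduplication.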
The Verdict
Event sourcing in financial systems is strongly beneficial: the audit trail, point-in-time queries, and replay debugging it provides are genuine requirements, not nice-to-haves.
The costs are real:
- Schema evolution is harder and requires long-term discipline
- Projections add an asynchronous layer that must be managed
- Eventual consistency requires explicit design for read-after-write scenarios
- Snapshot management adds operational complexity as the log grows
The right scope: use event sourcing for the domain where the audit and replay properties are genuinely needed (trading positions, order lifecycle, risk events). Don’t apply it universally to every domain in the system — the costs are not free, and many parts of a financial system (reference data, configuration, user sessions) don’t benefit from it.
The pattern that worked: event-sourced core trading domain, traditional CRUD for everything else, and a clear boundary between the two. The complexity lives only where it earns its keep.