By 2014 we had rebuilt the pricing engine twice. The first rebuild moved from a blocking queue architecture to the Disruptor. The second addressed the aggregation logic and quote distribution. This post covers the architecture that emerged — not as a blueprint, but as an account of the decisions and why we made them.

The Problem Statement

LPs (Liquidity Providers)
  │  raw FIX feeds, ~50k ticks/s combined
  │
  ▼
FX Pricing Engine
  │  apply: aggregation, spread, client tiering, risk limits
  │
  ▼
Clients (market makers, algo desks, external)
  │  executable quotes, target: < 500µs tick-to-quote p99
  │
  ▼
Order Management System → exchange / LP execution

The engine must ingest ticks from multiple LPs per pair, aggregate the best available price, apply a spread (the firm’s margin), apply per-client adjustments, enforce risk limits, and distribute executable quotes to clients — all within a tick-to-quote budget of 500µs at the 99th percentile.

Stage 1: Feed Handler

One thread per LP connection. Each thread owns its FIX session and its order book for that LP:

LP-A thread:  parse FIX X message → update LP-A's order book → publish top-of-book to ring buffer
LP-B thread:  parse FIX X message → update LP-B's order book → publish top-of-book to ring buffer
LP-C thread:  ...

Publishing to the ring buffer is the only shared state. Each LP thread writes to its own slot in the ring — no contention between them. The ring buffer is large enough to absorb bursts without dropping events.

What gets published: not the raw FIX message, but the normalised top-of-book for that LP — a pre-computed {symbol, bestBid, bestOffer, bidQty, offerQty, lpId, timestamp} struct. The downstream stages don’t need the full book depth; they need the LP’s current best price.
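The published event might look like the following — a minimal sketch, assuming fixed-point prices (the field names follow the struct above; the types, the fixed-point scaling, and the `mid()` helper are my assumptions, not the original implementation):

```java
// Hypothetical sketch of the normalised top-of-book event each feed-handler
// thread publishes. Prices are long-scaled fixed point (1.08450 -> 108450,
// last digit = 0.1 pip) to keep floating point out of the hot path.
record LPTopOfBook(
        String symbol,      // e.g. "EURUSD"
        long bestBid,       // fixed-point price
        long bestOffer,
        long bidQty,
        long offerQty,
        int lpId,           // identifies the contributing LP
        long timestampNanos // capture time, used for staleness checks downstream
) {
    // Mid-price in the same fixed-point scale, consumed by the spread engine.
    long mid() {
        return (bestBid + bestOffer) / 2;
    }
}
```

Pre-computing this struct in the feed-handler thread keeps FIX parsing cost out of the single-threaded aggregator.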

Stage 2: Aggregator

A single thread reads from all LP ring buffers and maintains the aggregated best price per symbol.

Aggregation rule: take the best bid across all LPs, the best offer across all LPs. For a standard market maker, this is straightforward. In practice there are complications:

  • LP credit limits: if we’ve hit our credit limit with LP-B, exclude their prices from aggregation until the limit resets
  • LP health: a feed that stopped updating 5 seconds ago is stale — exclude it
  • Last-look rejection rate: an LP with >20% rejection rate on our orders gets worse aggregation weight
class Aggregator {
    // Per symbol: sorted LP prices
    Map<Symbol, LPPriceSet> lpPrices = new EnumMap<>(Symbol.class);

    void onLPUpdate(LPTopOfBook update) {
        LPPriceSet priceSet = lpPrices.get(update.symbol);
        priceSet.update(update.lpId, update.bestBid, update.bestOffer);

        // Recompute aggregate only if this LP contributed to the best
        if (priceSet.affectsBest(update.lpId)) {
            recomputeAndPublish(update.symbol, priceSet);
        }
    }
}

The affectsBest short-circuit is important: if LP-C’s new bid is 1.08440 and the current best bid (from LP-A) is 1.08450, LP-C’s update doesn’t change the aggregate. Skip the recompute entirely — no downstream propagation, no wasted work.
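The post doesn’t show LPPriceSet, so here is one hedged sketch of how the short-circuit could work — tracking which LP currently holds the best on each side, and only flagging updates that touch or beat it (all names and the tracking scheme are assumptions):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical LPPriceSet sketch: per-LP bids/offers plus a record of which
// LP currently holds the best bid and best offer. Not the original class.
class LPPriceSet {
    private final Map<Integer, Long> bids = new HashMap<>();
    private final Map<Integer, Long> offers = new HashMap<>();
    private int bestBidLp = -1, bestOfferLp = -1;

    void update(int lpId, long bid, long offer) {
        bids.put(lpId, bid);
        offers.put(lpId, offer);
    }

    // An update affects the aggregate if this LP already held the best on
    // either side (its move may worsen the best), or if its new price beats
    // the current best on either side.
    boolean affectsBest(int lpId) {
        long bid = bids.get(lpId);
        long offer = offers.get(lpId);
        boolean bidMatters = lpId == bestBidLp
                || bestBidLp == -1 || bid > bids.get(bestBidLp);
        boolean offerMatters = lpId == bestOfferLp
                || bestOfferLp == -1 || offer < offers.get(bestOfferLp);
        return bidMatters || offerMatters;
    }

    // Full recompute, run only when affectsBest returned true.
    void recomputeBest() {
        bestBidLp = bids.entrySet().stream()
                .max(Map.Entry.comparingByValue()).get().getKey();
        bestOfferLp = offers.entrySet().stream()
                .min(Map.Entry.comparingByValue()).get().getKey();
    }
}
```

With LP-A best at 108450 bid, an LP-C update to 108440 returns false from affectsBest and the recompute is skipped, matching the example above.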

Stage 3: Spread Engine

Takes the mid-price and computes bid/offer prices per client tier.

Aggregate mid: 1.08451

Client tier A (institutional):  bid = mid - 0.3pip  offer = mid + 0.3pip
Client tier B (retail):         bid = mid - 1.5pip  offer = mid + 1.5pip
Client tier C (internal):       bid = mid - 0.0pip  offer = mid + 0.0pip  (no spread)

Spread logic is configurable — tighter during liquid hours, wider during thin markets, adjustable per instrument. The spread engine reads the spread config from a volatile reference (updated by an operations thread when config changes) — no locking, the config update is published atomically.

Risk adjustments live here too: if the risk engine signals that we’re long EUR, widen the offer and tighten the bid to encourage clients to sell EUR to us (reducing our exposure). The risk engine posts its adjustments via an AtomicReference<RiskAdjustments>; the spread engine reads it on each price computation.
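A minimal sketch of that lock-free handoff, under assumptions: SpreadConfig and RiskAdjustments are hypothetical shapes, and prices are fixed point with the last digit equal to 0.1 pip (so a 0.3 pip half-spread is 3). Only the volatile-read / AtomicReference pattern is from the text:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical spread engine skeleton showing the two publication paths
// described above: a volatile config reference written by the ops thread,
// and an AtomicReference posted by the risk engine.
class SpreadEngine {
    record SpreadConfig(long halfSpreadTenthPips) {}
    record RiskAdjustments(long bidSkew, long offerSkew) {}
    record Quote(long bid, long offer) {}

    // Volatile write publishes the whole (immutable) config atomically; the
    // pricing thread sees the old or the new config, never a torn mix.
    private volatile SpreadConfig config = new SpreadConfig(3);

    // Posted by the risk engine, read once per price computation.
    private final AtomicReference<RiskAdjustments> risk =
            new AtomicReference<>(new RiskAdjustments(0, 0));

    Quote price(long mid) {
        SpreadConfig c = config;        // single volatile read per quote
        RiskAdjustments r = risk.get(); // single atomic read per quote
        return new Quote(mid - c.halfSpreadTenthPips() + r.bidSkew(),
                         mid + c.halfSpreadTenthPips() + r.offerSkew());
    }

    void updateConfig(SpreadConfig c) { config = c; }  // ops thread
    void postRisk(RiskAdjustments r)  { risk.set(r); } // risk engine
}
```

Because both references point at immutable objects, neither reader needs a lock: the reference swap is the publication.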

Stage 4: Quote Distribution

One thread per client connection (or per client group for high-volume clients with dedicated connections). Reads from the spread engine’s output ring, filters by symbol subscription, and publishes FIX W (Market Data Snapshot) or X (Market Data Incremental Refresh) messages.

Conflation: if two prices for the same symbol arrive faster than the client can consume them, the older one is discarded. Clients don’t benefit from seeing a price they can’t act on. The conflation buffer per symbol per client holds the latest price; the distribution thread reads the latest and sends it.

Spread engine output (fast)   →   Conflation buffer   →   Client FIX session (slower)
1.08451 / 1.08453                 [latest price only]       sends at client's rate
1.08452 / 1.08454                 ↓ overwrites
1.08453 / 1.08455  ──────────────▶ 1.08453 / 1.08455 ──────▶ client receives this
                                   (1.08451 and 1.08452
                                    were conflated away)

Clients that are fast enough get all prices. Clients behind the conflation boundary get the latest price that was available when their connection drained the buffer.
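One way to sketch the per-symbol, per-client slot is an AtomicReference where the writer overwrites and the reader drains with getAndSet (a hypothetical minimal version; the original likely sits inside the distribution ring rather than standalone):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical one-slot conflation buffer per symbol per client. The spread
// engine's output path overwrites the slot; the slower client distribution
// thread drains whatever is newest. Intermediate prices are conflated away.
class ConflationSlot<T> {
    private final AtomicReference<T> latest = new AtomicReference<>();

    // Writer side: never blocks, simply replaces the previous quote.
    void publish(T quote) {
        latest.set(quote);
    }

    // Reader side: returns the newest quote since the last drain,
    // or null if nothing new has arrived.
    T drain() {
        return latest.getAndSet(null);
    }
}
```

Publishing three quotes and draining once yields only the third, exactly the behaviour in the diagram above.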

The Full Pipeline

LP-A FIX ──▶ Feed Handler A ──▶ ┐
LP-B FIX ──▶ Feed Handler B ──▶ ├──▶ Aggregator ──▶ Spread Engine ──▶ Conflation ──▶ Client A
LP-C FIX ──▶ Feed Handler C ──▶ ┘                                        Buffer  ──▶ Client B
                                                                                   ──▶ Client C

Thread count: 3 LP feed threads + 1 aggregator + 1 spread engine + N client distribution threads. Each stage communicates via a Disruptor ring buffer; no shared mutable state between stages.

Latency Budget Breakdown

Measured end to end, the 500µs budget breaks down as:

Stage                          p50     p99
FIX parse + book update        8µs     22µs
Ring buffer publish            <1µs    3µs
Aggregation + best compute     4µs     11µs
Spread + risk adjustment       3µs     8µs
Conflation + FIX encode        5µs     14µs
Client TCP write               6µs     28µs
Network (co-lo)                35µs    90µs
Total tick-to-quote            62µs    176µs

Network (co-location link to exchange) dominated the budget. The software pipeline was under 100µs at p99 with margin — which meant the network was the right place to focus next, not more micro-optimisation of the Java code.

The lesson: measure the full path before optimising any individual component. We spent two sprints optimising the FIX parser before instrumenting the end-to-end pipeline and discovering that TCP write latency and network were 5x more expensive than the parser we’d been optimising.