I joined the trading desk on a Monday. By Friday I had broken the USD/JPY price feed.

Not catastrophically — it was a staging environment and the feed recovered in seconds — but the experience of watching a real-time market data stream go silent because of my code was unlike anything I’d encountered at university. It crystallised something immediately: in this environment, software failures have a price tag attached.

The Stack I Walked Into

The existing codebase was Java 6 (migrating to 7), with a homegrown messaging layer built around BlockingQueue. The architecture had the virtue of simplicity: market data came in over FIX, got normalised, distributed internally, and fed into a pricing engine that emitted executable quotes to clients.
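Stripped to its essentials, that hand-off between stages can be sketched like this. All names here (RawFix, Tick, FeedPipeline) are my own illustrative inventions, not the desk's actual classes, and the "parsing" is a stand-in for real FIX handling:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of one normalise -> distribute hand-off over bounded queues.
public class FeedPipeline {
    static final class RawFix {
        final String msg;
        RawFix(String msg) { this.msg = msg; }
    }

    static final class Tick {
        final String symbol;
        final double price;
        Tick(String symbol, double price) { this.symbol = symbol; this.price = price; }
    }

    // Bounded queues: back-pressure under load instead of unbounded memory growth.
    final BlockingQueue<RawFix> inbound = new ArrayBlockingQueue<RawFix>(10000);
    final BlockingQueue<Tick> normalised = new ArrayBlockingQueue<Tick>(10000);

    // One pipeline step: take a raw message, normalise it, pass it downstream.
    void normaliseOne() throws InterruptedException {
        RawFix raw = inbound.take();            // blocks until a message arrives
        String[] parts = raw.msg.split("\\|");  // stand-in for real FIX parsing
        normalised.put(new Tick(parts[0], Double.parseDouble(parts[1])));
    }
}
```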

What it lacked was predictability under load. The queues were bounded, but the handling of InterruptedException throughout the codebase was… optimistic. At moderate throughput everything was fine. At peak — London open, US open, major economic announcements — things got interesting.
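The "optimistic" pattern, roughly, was to catch InterruptedException and carry on, which silently discards the interrupt. The safer idiom either propagates the exception or restores the thread's interrupt flag so a shutdown signal is still visible up the stack. A generic sketch of the two, not the desk's actual code:

```java
import java.util.concurrent.BlockingQueue;

public class InterruptHandling {
    // Anti-pattern: swallowing the interrupt. The interrupted status is
    // cleared when InterruptedException is thrown, so a consumer that does
    // this can never be shut down cleanly via interruption.
    static String takeSwallowing(BlockingQueue<String> q) {
        try {
            return q.take();
        } catch (InterruptedException e) {
            return null; // interrupt silently lost
        }
    }

    // Safer: re-assert the interrupt flag so callers can observe it.
    static String takeRestoring(BlockingQueue<String> q) {
        try {
            return q.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag
            return null;
        }
    }
}
```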

What “Low Latency” Actually Meant Here

Before joining, “low latency” to me meant “fast web API.” Within a week I understood the distinction between:

  • Throughput: messages per second (we cared, but it wasn’t primary)
  • Latency: time from market tick to quoted price update to client (this was the metric)
  • Tail latency: the 99th/99.9th percentile — what happened during bursts
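For concreteness, a tail percentile is just a rank in the sorted sample. This toy nearest-rank computation is my own illustration, not our telemetry code; at production volumes you'd use histograms rather than store every sample:

```java
import java.util.Arrays;

public class Percentiles {
    // Nearest-rank percentile over recorded latencies (e.g. in microseconds).
    static long percentile(long[] latencies, double p) {
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // nearest rank
        return sorted[Math.max(0, rank - 1)];
    }
}
```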

The internal SLA was sub-millisecond tick-to-quote 99% of the time. That sounds generous. It isn’t, when your JVM decides to do a minor GC in the middle of a news event.

The First Real Problem I Worked On

Three weeks in, I was given ownership of the feed normalisation component — the piece that translated raw FIX messages from liquidity providers (LPs) into an internal format. My brief: make it more predictable under load.

The existing code allocated a new object per message. Not aggressively, but consistently. At 50,000 messages/second, that’s 50,000 objects/second hitting eden space. Minor GC was frequent and occasionally spiky.

The fix was straightforward in concept: object pooling for the normalised message objects. In practice, getting the pool sizing right and avoiding contention on the pool itself took two weeks of iteration.
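A minimal version of that idea: pre-fill a concurrent queue with reusable message objects so a get/release cycle allocates nothing on the hot path. Types, sizes, and the fallback policy here are illustrative, not the production design:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessagePool {
    // Hypothetical normalised-message type: mutable so instances can be reused.
    static final class NormalisedMsg {
        String symbol;
        double bid, ask;
        void clear() { symbol = null; bid = 0; ask = 0; }
    }

    private final BlockingQueue<NormalisedMsg> free;

    MessagePool(int size) {
        free = new ArrayBlockingQueue<NormalisedMsg>(size);
        for (int i = 0; i < size; i++) {
            free.offer(new NormalisedMsg()); // all allocation happens up front
        }
    }

    // Fall back to allocation if the pool is exhausted rather than blocking
    // the feed thread: a burst then costs GC pressure, not a latency stall.
    NormalisedMsg acquire() {
        NormalisedMsg m = free.poll();
        return (m != null) ? m : new NormalisedMsg();
    }

    void release(NormalisedMsg m) {
        m.clear();     // never leak stale prices back into the pool
        free.offer(m); // dropped on the floor if the pool is already full
    }
}
```

The contention I mentioned lives in that single shared queue: every acquire and release from every thread serialises on it, which is exactly why sizing and (eventually) per-thread pooling took weeks of iteration rather than an afternoon.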

That experience introduced me to the tooling that would occupy the next few years: JVisualVM, GC logs, -XX:+PrintGCDetails, and eventually JMH. The problem wasn’t hard. Measuring it honestly was.

What I’d Tell Someone Starting in This Environment

  1. Get comfortable reading GC logs before anything else. -Xloggc and a log analyser (GCViewer, or just grep) will tell you more about your application’s behaviour than any profiler.
  2. Understand the FIX protocol. It’s ugly but it’s everywhere. QuickFIX/J for Java is fine; just read the spec for the message types you’re handling.
  3. Don’t optimise before you measure. I spent a day micro-optimising a method that accounted for 0.3% of latency. The object allocation issue I found afterward accounted for 40%.
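On point 1: the pre-unified-logging (Java 6/7 era) incantation looked roughly like this. Heap sizes, the log path, and the jar name are placeholders:

```shell
# GC logging as it existed before Java 9's -Xlog:gc unified logging.
java -Xms2g -Xmx2g \
     -Xloggc:/var/log/app/gc.log \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -jar feed-normaliser.jar
```

-XX:+PrintGCApplicationStoppedTime is the underrated one: it reports total stop-the-world time, not just GC, so it catches safepoint pauses a plain GC log hides.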