Five Years in High-Frequency Trading: What I Actually Learned

Five years ago I joined an electronic trading firm not knowing what a cache line was. I thought garbage collection was something that happened to other people’s code. I had never looked at assembly output from a Java program. I’d heard of the LMAX Disruptor but had no idea why it existed. By the time I left, I had opinions about CPU prefetchers. I had read the Intel 64 and IA-32 Architectures Software Developer’s Manual for fun. I could look at a flame graph and immediately see the GC pressure. I had shipped components processing a million messages per second with sub-millisecond p99 guarantees. Here’s what that environment actually teaches you. ...

November 12, 2015 · 6 min · MW

Scala Akka Actors for Trading Workflows: Promises and Pitfalls

The case for Akka actors in a trading system sounds compelling: isolated mutable state (no shared memory, no locks), message-driven concurrency, built-in supervision hierarchies for fault tolerance, and location transparency for distributed deployments. We used Akka for the order lifecycle workflow layer — the component that orchestrated the state machine from order received to fill confirmed. Here’s what we learned. ...
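
The order lifecycle the excerpt mentions can be sketched as a plain transition table. This is a hypothetical, simplified state machine of my own (real lifecycles add cancels, replaces, and expiry), shown only to make concrete what the actor layer was orchestrating:

```java
public class OrderLifecycle {
    enum State { RECEIVED, ACKED, PARTIALLY_FILLED, FILLED, REJECTED }
    enum Event { ACK, PARTIAL_FILL, FULL_FILL, REJECT }

    // Pure transition function: the actor holds the current State as its
    // isolated mutable state and applies this on each incoming message.
    static State next(State s, Event e) {
        switch (s) {
            case RECEIVED:
                if (e == Event.ACK) return State.ACKED;
                if (e == Event.REJECT) return State.REJECTED;
                break;
            case ACKED:
            case PARTIALLY_FILLED:
                if (e == Event.PARTIAL_FILL) return State.PARTIALLY_FILLED;
                if (e == Event.FULL_FILL) return State.FILLED;
                break;
            default:
        }
        throw new IllegalStateException(s + " cannot handle " + e);
    }

    public static void main(String[] args) {
        State s = next(next(State.RECEIVED, Event.ACK), Event.FULL_FILL);
        System.out.println(s);  // FILLED
    }
}
```

Keeping the transition function pure and the mutable current-state inside one actor is what makes the "no shared memory, no locks" claim hold.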

August 19, 2015 · 4 min · MW

Memory-Mapped Files in Java: Chronicle and the Art of Zero-Copy I/O

Disk I/O and latency-sensitive systems don’t mix well. Or so the conventional wisdom goes. Memory-mapped files challenge that assumption: when the OS maps a file into virtual memory, writes and reads go through the OS page cache. If the working set fits in RAM (which it usually does for recent data), access times are in the hundreds of nanoseconds — comparable to normal memory access, not disk I/O. Chronicle Queue is built entirely on this foundation. Understanding what’s happening underneath explains both why it’s fast and what its failure modes are. ...
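
A minimal sketch of the mechanism (standard `java.nio`, nothing Chronicle-specific; file names are illustrative): map a region of a file, then read and write it like ordinary memory, with the kernel flushing dirty pages to disk asynchronously.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

public class MmapDemo {
    // Map 4 KiB of a file into the address space. The store and load below go
    // through the OS page cache: no read()/write() syscall on the hot path.
    static long roundTrip(Path path) {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putLong(0, 42L);    // a plain store into mapped memory
            return buf.getLong(0);  // and a plain load
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Path tempFile() {
        try {
            return Files.createTempFile("mmap-demo", ".dat");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(tempFile())); // 42
    }
}
```

The failure mode hinted at above is visible here too: nothing forces those pages to disk at the moment of the write, so a power loss can drop the tail of the file unless you pay for an explicit `force()`.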

April 15, 2015 · 5 min · MW

Chronicle Queue vs Kafka: Choosing a Persistent Journal at Nanosecond Scale

By early 2015 we had both Chronicle Queue and Kafka in production — Chronicle for intra-day trade journaling, Kafka for end-of-day data pipelines. The question came up repeatedly: why not use one for both? The answer is that they solve different problems with incompatible design priorities. ...

January 21, 2015 · 4 min · MW

Benchmarking Without Lying: JMH, Coordinated Omission, and Honest Numbers

I once spent a morning very proud of a benchmark showing our new order-matching path had p99 latency of 180µs, down from 340µs. It was a 47% improvement. I presented it in a team meeting. An engineer asked one question: “Is that closed-loop or open-loop?” I didn’t know what that meant. The benchmark was worthless. ...
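
The closed-loop vs open-loop distinction is easy to show with a toy simulation in virtual time (numbers and the stalling request are invented for illustration). A closed-loop generator waits for each response before sending the next request, so a stall produces one bad sample; an open-loop generator has a fixed arrival schedule and measures from the intended send time, so every request queued behind the stall records the wait it actually suffered:

```java
import java.util.ArrayList;
import java.util.List;

public class CoordinatedOmission {
    // Simulated service (virtual time, ms): every request takes 1ms,
    // except request 10, which stalls for 100ms.
    static long serviceMillis(int i) { return i == 10 ? 100 : 1; }

    // Closed-loop: next request only goes out after the previous response,
    // so the stall shows up as exactly one bad sample.
    static List<Long> closedLoop(int n) {
        List<Long> latencies = new ArrayList<>();
        for (int i = 0; i < n; i++) latencies.add(serviceMillis(i));
        return latencies;
    }

    // Open-loop: request i is *due* at t = i ms whether or not earlier
    // responses arrived; latency is measured from the intended send time.
    static List<Long> openLoop(int n) {
        List<Long> latencies = new ArrayList<>();
        long clock = 0;
        for (int i = 0; i < n; i++) {
            long intended = i;
            long start = Math.max(clock, intended);
            long end = start + serviceMillis(i);
            latencies.add(end - intended);
            clock = end;
        }
        return latencies;
    }

    static long countAbove(List<Long> xs, long threshold) {
        return xs.stream().filter(x -> x >= threshold).count();
    }

    public static void main(String[] args) {
        System.out.println(countAbove(closedLoop(50), 50)); // 1 sample >= 50ms
        System.out.println(countAbove(openLoop(50), 50));   // 40 samples >= 50ms
    }
}
```

Same service, same stall: the closed-loop run reports one outlier out of fifty, the open-loop run reports forty. That gap is coordinated omission.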

October 29, 2014 · 4 min · MW

Aeron: Reliable UDP Multicast for Market Data Distribution

Our market data distribution problem was straightforward to state and hard to solve: deliver price updates to a dozen internal consumers with sub-500µs latency at 400,000 messages/second, with no head-of-line blocking between consumers. TCP broadcasting is serial — slow consumers stall fast ones. ZeroMQ was promising but showed GC pressure from its buffer management. Kafka was built for durability, not microsecond latency. When Martin Thompson and Todd Montgomery open-sourced Aeron in 2014, it solved almost exactly this problem. ...

September 17, 2014 · 6 min · MW

Busy Spinning vs Blocking: Thread Strategies for Ultra-Low Latency

When a thread is waiting for work — a new event, a lock to release, a signal — it has two options. It can block (tell the OS “wake me up when there’s work”) or busy-spin (loop checking a condition, never yielding the CPU). Both are correct. They have very different performance profiles. ...
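
The two options look like this in Java (a sketch; class and method names are mine, and the blocking side is simplified to a timed park rather than a full signal protocol):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

public class WaitStrategies {
    // Busy-spin: burn the core re-checking the sequence. Wake-up latency is
    // nanoseconds, but the core sits at 100% even when no work arrives.
    static long spinUntil(AtomicLong seq, long target) {
        while (seq.get() < target) {
            Thread.onSpinWait();  // Java 9+ CPU hint (PAUSE on x86)
        }
        return seq.get();
    }

    // Blocking: free the core while idle, but pay a scheduler wake-up --
    // microseconds, not nanoseconds -- before the thread runs again.
    static long parkUntil(AtomicLong seq, long target) {
        while (seq.get() < target) {
            LockSupport.parkNanos(1_000_000);  // re-check at least every 1ms
        }
        return seq.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong seq = new AtomicLong();
        Thread producer = new Thread(() -> seq.set(1));
        producer.start();
        System.out.println(spinUntil(seq, 1)); // 1
        producer.join();
    }
}
```

Both loops are correct; the choice is whether you spend a dedicated core to shave microseconds off wake-up latency.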

May 14, 2014 · 5 min · MW

Scala on the Hot Path: Where the Abstraction Cost Goes

We were using Scala in a few non-critical components at the trading firm — utility code, configuration, some tooling. Then someone proposed moving a market data normalisation component to Scala. The component processed 800,000 messages/second and had a 500µs latency budget per message at p99. The discussion that followed taught me more about the JVM than a year of reading. ...
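
One concrete place the abstraction cost shows up, visible even from Java: applying a value through a generic function type boxes primitives, which means per-element allocation on the hot path, exactly what a tight latency budget cannot absorb. A small sketch (my own example, not the component in question):

```java
import java.util.function.Function;
import java.util.function.IntUnaryOperator;

public class BoxingCost {
    // Generic Function<Integer, Integer>: every call boxes the argument and
    // the result -- two small allocations per element, i.e. GC pressure.
    static long sumBoxed(int n, Function<Integer, Integer> f) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += f.apply(i);
        return sum;
    }

    // Primitive-specialised IntUnaryOperator: no boxing, no allocation.
    static long sumPrimitive(int n, IntUnaryOperator f) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += f.applyAsInt(i);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000, x -> x * 2));      // 999000
        System.out.println(sumPrimitive(1000, x -> x * 2));  // 999000
    }
}
```

Same answer, very different allocation profile. Scala's `@specialized` and careful API choices can avoid the boxed path, but only if you know to look for it.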

January 8, 2014 · 6 min · MW

Java Chronicle: Off-Heap Persistence Without Serialisation Overhead

Every trade needs to be journaled. You need a durable, ordered record of every order, fill, and state change — for risk, reconciliation, and regulatory purposes. The naive solution is a database write on the hot path. That’s a roundtrip to an external process, a network call, and often a disk fsync: easily hundreds of microseconds, sometimes milliseconds, of latency per event. Chronicle Queue gave us persistent journaling at sub-microsecond overhead. Here’s how. ...
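
The underlying technique is simple to sketch: append length-prefixed records into a memory-mapped region. This is not the Chronicle Queue API, just a stripped-down illustration of the idea (class name, record format, and sizes are mine):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

public class MappedJournal {
    // Append a length-prefixed record at `offset`; return the next free offset.
    // On a mapped buffer these are plain stores into page-cache memory:
    // no serialisation framework, no syscall, no copy into a kernel buffer.
    static int append(ByteBuffer buf, int offset, byte[] record) {
        buf.putInt(offset, record.length);
        buf.position(offset + 4);
        buf.put(record);
        return offset + 4 + record.length;
    }

    static byte[] read(ByteBuffer buf, int offset) {
        int len = buf.getInt(offset);
        byte[] out = new byte[len];
        buf.position(offset + 4);
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("journal", ".dat");
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
            append(buf, 0, "ORDER:42:FILLED".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(read(buf, 0), StandardCharsets.UTF_8));
        }
    }
}
```

A reader process mapping the same file can tail the journal by walking the length prefixes, which is essentially how single-writer, many-reader journaling falls out of this design.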

September 5, 2013 · 4 min · MW

Comparing ArrayBlockingQueue to the Disruptor: Numbers Don’t Lie

After writing about the Disruptor’s design, the obvious question is: how much faster is it, really? “Faster” is not a useful answer. Let’s look at actual numbers under controlled conditions. This is a benchmarking exercise, not a recommendation. The right data structure depends on your use case. The goal here is to understand the performance characteristics of each under different contention patterns. ...
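
To make the comparison concrete, here is the structural difference in miniature: `ArrayBlockingQueue` takes a lock and signals conditions on every `put()` and `take()`, while the Disruptor's single-producer path is a lock-free ring indexed by ever-increasing sequences. A minimal single-producer/single-consumer ring of my own (a teaching sketch, not the Disruptor itself):

```java
import java.util.concurrent.atomic.AtomicLong;

public class SpscRing {
    final long[] ring;
    final int mask;                            // size must be a power of two
    final AtomicLong head = new AtomicLong();  // next slot to write
    final AtomicLong tail = new AtomicLong();  // next slot to read

    SpscRing(int sizePowerOfTwo) {
        ring = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Single producer: no CAS, no lock -- a bounds check and a volatile store.
    boolean offer(long value) {
        long h = head.get();
        if (h - tail.get() == ring.length) return false;  // full
        ring[(int) (h & mask)] = value;
        head.set(h + 1);  // publish (lazySet would be the cheaper idiom)
        return true;
    }

    // Single consumer: symmetric -- read the slot, then publish the new tail.
    Long poll() {
        long t = tail.get();
        if (t == head.get()) return null;  // empty
        long v = ring[(int) (t & mask)];
        tail.set(t + 1);
        return v;
    }

    public static void main(String[] args) {
        SpscRing q = new SpscRing(8);
        for (long i = 0; i < 5; i++) q.offer(i);
        System.out.println(q.poll());  // 0
        System.out.println(q.poll());  // 1
    }
}
```

Under single-producer/single-consumer load the lock-free path wins by a wide margin; add contention or multiple producers and the picture changes, which is exactly why the numbers need to be measured per pattern rather than quoted as one figure.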

May 22, 2013 · 4 min · MW