The LMAX Disruptor: How a Ring Buffer Changed My Mental Model of Queues

In mid-2013 we replaced our internal LinkedBlockingQueue-based event bus with the LMAX Disruptor. Median latency dropped by 30%. The 99th percentile dropped by more than half. The change touched about 400 lines of code. This post is about the conceptual model you need to understand why the Disruptor is fast — not just “it uses a ring buffer,” but what that actually means for your hardware. ...

February 28, 2013 · 4 min · MW

Introduction to Lock-Free Programming in Java

Locks work. synchronized in Java is correct, well-understood, and wrong for our use case. A lock that’s contested causes a thread to block — the OS parks it, context-switches to something else, and eventually context-switches back. Each of those transitions costs microseconds. When your SLA is sub-millisecond and your hot path is called 200,000 times per second, locks are not an option. Lock-free programming replaces locks with atomic CPU instructions. The CPU handles the synchronisation at the hardware level, without OS involvement. ...

January 17, 2013 · 5 min · MW

Mechanical Sympathy: Writing Java That Respects the Hardware

Martin Thompson coined the term “mechanical sympathy” — the idea that to write fast software you need to understand the machine it runs on. Not at the assembly level necessarily, but well enough to reason about what the CPU, memory hierarchy, and OS are actually doing with your code. This post is what that looks like in practice, writing Java for a system where microseconds matter. ...

December 4, 2012 · 4 min · MW

Stop-the-World GC Pauses Killed Our SLA — And What We Did About It

The incident happened at 08:31 on a Tuesday — Frankfurt open, high volatility session. Our tick-to-quote latency spiked to 340ms for about 2 seconds. The SLA was 1ms at p99. Trading desk noticed before our monitoring did. The culprit: a full GC triggered by a promotion failure. We had 12GB heap, CMS collector, and no one had looked at GC logs since the initial deployment. ...

November 13, 2012 · 3 min · MW

Latency vs Throughput: The False Dichotomy I Learned the Hard Way

In my first performance review at the trading firm, I described a component I’d optimised as “high throughput.” My manager asked what the p99 latency was. I didn’t know. He asked what happened to latency during peak throughput. I didn’t know that either. The conversation went downhill from there. That exchange forced me to be precise about what I was actually optimising for — and why throughput and latency, while related, are fundamentally different properties. ...

September 25, 2012 · 5 min · MW

FIX Protocol 101: What Every Finance Engineer Must Know

FIX (Financial Information eXchange) protocol was designed in 1992 and looks like it. Tag-value pairs delimited by the ASCII SOH character (0x01). Fields in a prescribed order that parsers nonetheless receive out of order. A specification document larger than most novels. And yet: FIX is the universal language of electronic trading. Every exchange, every prime broker, every ECN speaks it. You will encounter it. ...

August 7, 2012 · 5 min · MW

Why Your Java App Is Slow Before It Even Starts: Classloading Deep Dive

The service started in 3 seconds in development, but the first live trade after deployment took 800ms instead of the expected sub-10ms. The second trade was fine. We couldn’t reproduce it in load tests. The culprit was classloading. The trade execution path touched 47 classes that had never been loaded before. Loading them, verifying the bytecode, and running static initialisers took 800ms — once, at first use. Understanding exactly how that happens is worth the time. ...

June 18, 2012 · 5 min · MW

Building a Price Feed Aggregator in Java: First Attempt

Three months into the job, I was given my first substantial project: build a component that subscribes to price feeds from five external venues, aggregates them into a single best-bid-offer (BBO) view per currency pair, and distributes that view to internal consumers. The spec was one page. The first implementation took two weeks. The rewrite after I measured it took another two weeks and was 40× faster. ...

May 9, 2012 · 6 min · MW

From Zero to Production: My First Month in Electronic Trading

I joined the trading desk on a Monday. By Friday I had broken the USD/JPY price feed. Not catastrophically — it was a staging environment and the feed recovered in seconds — but the experience of watching a real-time market data stream go silent because of my code was unlike anything I’d encountered at university. It crystallised something immediately: in this environment, software failures have a price tag attached. ...

February 8, 2012 · 3 min · MW
Available for consulting Distributed systems · Low-latency architecture · Go · LLM integration & RAG · Technical leadership
hello@turboawesome.win