ZGC and Shenandoah: What Low-Pause GC Means for Trading Systems

In 2015, the state of GC for latency-sensitive Java was: use G1GC, tune it carefully, accept occasional 50–200ms pauses on large heaps, and work around them with off-heap storage and careful allocation management. The conventional wisdom was that sub-10ms GC pauses required small heaps (< 4GB) or near-zero allocation on the hot path. For trading systems with large position caches, this meant either expensive off-heap engineering or living with GC latency spikes. Then the early previews of ZGC (Oracle) and Shenandoah (Red Hat) started circulating. Both claimed pause times of a few milliseconds at most, regardless of heap size. The mechanisms were different, but the implications were significant. ...

October 1, 2015 · 5 min · MW

Understanding Safepoints: The JVM Pauses Nobody Talks About

We’d tuned GC to near-perfection. Pause times were sub-millisecond. The p99.9 latency was still spiking to 8ms several times a day, with no GC events anywhere near those spikes. It took three weeks to find the cause: safepoints, specifically biased-lock revocation (RevokeBias) operations triggered by lock patterns in a third-party library. ...
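For readers who want to look for the same culprit, this kind of diagnosis can be sketched with standard HotSpot flags (the flag names are real JDK 8-era options; `app.jar` is a placeholder for your application):

```shell
# Log every safepoint with its cause and stop-the-world duration (JDK 8)
java -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -jar app.jar

# On JDK 9+ the equivalent is unified logging:
java -Xlog:safepoint -jar app.jar

# If RevokeBias operations dominate the log, disabling biased locking
# removes that class of safepoint entirely:
java -XX:-UseBiasedLocking -jar app.jar
```

Whether disabling biased locking is a net win depends on the workload; measure before and after.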

May 27, 2015 · 4 min · MW

Memory-Mapped Files in Java: Chronicle and the Art of Zero-Copy I/O

Disk I/O and latency-sensitive systems don’t mix well. Or so the conventional wisdom goes. Memory-mapped files challenge that assumption: when the OS maps a file into virtual memory, writes and reads go through the OS page cache. If the working set fits in RAM (which it usually does for recent data), access times are in the hundreds of nanoseconds — comparable to normal memory access, not disk I/O. Chronicle Queue is built entirely on this foundation. Understanding what’s happening underneath explains both why it’s fast and what its failure modes are. ...
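The mechanism described above is available directly in the JDK. A minimal sketch (the class name and file are illustrative, not Chronicle's API): map a region of a file, and reads and writes become plain memory operations against the OS page cache.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    // Map a 4 KiB region and round-trip a value through the page cache.
    static long roundTrip(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putLong(0, 42L);   // a plain memory store into the mapping, no syscall
            return buf.getLong(0); // read hits RAM (page cache), not the disk
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("journal", ".dat");
        System.out.println(roundTrip(p)); // prints 42
        Files.deleteIfExists(p);
    }
}
```

The OS writes dirty pages back asynchronously, which is exactly where the failure modes live: data visible in the mapping is not necessarily on disk until a force/msync.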

April 15, 2015 · 5 min · MW

Building a Trade Blotter That Doesn’t Lie Under Load

The trade blotter is the dashboard every trader stares at during the day: live positions, recent fills, P&L, risk utilisation. It’s the interface between the trading system and the humans who run it. The blotter has an interesting engineering property: it’s not in the critical path (orders execute without waiting for the blotter to update), but it has to be consistently correct and fast-updating, because traders make decisions based on what it shows. A blotter that shows stale positions leads to over-trading; one that misses fills leads to missed hedges. ...

March 4, 2015 · 5 min · MW

Chronicle Queue vs Kafka: Choosing a Persistent Journal at Nanosecond Scale

By early 2015 we had both Chronicle Queue and Kafka in production — Chronicle for intra-day trade journaling, Kafka for end-of-day data pipelines. The question came up repeatedly: why not use one for both? The answer is that they solve different problems with incompatible design priorities. ...

January 21, 2015 · 4 min · MW

End-of-Year Architecture Review: What Held, What Failed, What Changed

Three years into building trading systems, the end of 2014 felt like a good moment to stop and audit what we’d built. Not a full rewrite assessment — more a structured reflection on which bets paid off, which didn’t, and what the data was telling us about where the gaps were. This kind of review is undervalued in fast-moving engineering organisations. You learn a lot from production behaviour over years that you can’t learn from design docs. ...

December 10, 2014 · 4 min · MW

Benchmarking Without Lying: JMH, Coordinated Omission, and Honest Numbers

I spent a morning once very proud of a benchmark showing our new order-matching path had p99 latency of 180µs, down from 340µs. It was a 47% improvement. I presented it in a team meeting. An engineer asked one question: “Is that closed-loop or open-loop?” I didn’t know what that meant. The benchmark was worthless. ...
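The closed-loop/open-loop distinction is worth a sketch. In a closed-loop benchmark, a slow response delays the next request, so the load generator coordinates with the system under test and hides queueing delay. An open-loop measurement times each operation from its *scheduled* start, so falling behind shows up as latency. A minimal illustration (names like `measure` are mine, not JMH's API):

```java
public class OpenLoopBench {
    // Run `op` at a fixed intended rate; latency is measured from the
    // scheduled start time, so queueing delay from a slow previous op
    // is included rather than silently omitted.
    static long[] measure(Runnable op, int iterations, long intervalNs) {
        long[] latencies = new long[iterations];
        long next = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            next += intervalNs;
            // busy-wait until the scheduled send time
            while (System.nanoTime() < next) { Thread.onSpinWait(); }
            op.run();
            latencies[i] = System.nanoTime() - next; // open-loop latency
        }
        return latencies;
    }

    public static void main(String[] args) {
        long[] l = measure(() -> {}, 1_000, 100_000); // target: 10k ops/sec
        System.out.println("samples=" + l.length);
    }
}
```

A closed-loop version would instead take `start = System.nanoTime()` immediately before `op.run()`, which is precisely the coordinated-omission mistake when the workload has an intended arrival rate.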

October 29, 2014 · 4 min · MW

Aeron: Reliable UDP Multicast for Market Data Distribution

Our market data distribution problem was straightforward to state and hard to solve: deliver price updates to a dozen internal consumers with sub-500µs latency at 400,000 messages/second, with no head-of-line blocking between consumers. TCP broadcasting is serial — slow consumers stall fast ones. ZeroMQ was promising, but its Java bindings showed GC pressure from buffer management. Kafka was built for durability, not microsecond latency. When Martin Thompson and Todd Montgomery open-sourced Aeron in 2014, it solved almost exactly this problem. ...

September 17, 2014 · 6 min · MW

Choosing a GC Collector for Low-Latency Java: A Practical Comparison

By mid-2014 we had run CMS, G1, and Parallel GC in production on the same workload, and had evaluated Azul Zing. Here’s what we found and the decision framework that came out of it. ...
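For reference, the collectors compared map to standard HotSpot flags of the JDK 7/8 era (`app.jar` is a placeholder; Zing ships as its own JVM rather than a flag):

```shell
java -XX:+UseParallelGC      -jar app.jar  # throughput collector, longest pauses
java -XX:+UseConcMarkSweepGC -jar app.jar  # CMS: concurrent mark-sweep
java -XX:+UseG1GC            -jar app.jar  # G1: region-based, pause-target driven
```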

August 6, 2014 · 5 min · MW

Busy Spinning vs Blocking: Thread Strategies for Ultra-Low Latency

When a thread is waiting for work — a new event, a lock to release, a signal — it has two options. It can block (tell the OS “wake me up when there’s work”) or busy-spin (loop checking a condition, never yielding the CPU). Both are correct. They have very different performance profiles. ...
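The two strategies can be sketched side by side in a few lines (the method names are illustrative, not a real API): spinning reacts in tens of nanoseconds but pins a core at 100%; parking frees the core but pays microseconds of scheduler wakeup latency.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

public class WaitStrategies {
    // Busy-spin: never yield the CPU; react as soon as the value changes.
    static long spinUntilPositive(AtomicLong slot) {
        long v;
        while ((v = slot.get()) <= 0) {
            Thread.onSpinWait(); // CPU hint (PAUSE); still burns the core
        }
        return v;
    }

    // Blocking-ish: sleep briefly between checks; frees the core,
    // but wakeup goes through the OS scheduler.
    static long parkUntilPositive(AtomicLong slot) {
        long v;
        while ((v = slot.get()) <= 0) {
            LockSupport.parkNanos(1_000); // ~1µs park, then re-check
        }
        return v;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong slot = new AtomicLong();
        Thread producer = new Thread(() -> slot.set(7));
        producer.start();
        System.out.println(spinUntilPositive(slot)); // prints 7
        producer.join();
    }
}
```

In practice most systems use a hybrid (spin briefly, then yield, then park), which is the design behind the Disruptor-style wait strategies.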

May 14, 2014 · 5 min · MW