Performance

Reading GC Logs Like a Detective

GC logs are always-on, low-overhead diagnostic data that the JVM will produce for you. They tell you the timing, cause, duration, and effect of every collection — if you know how to read them. Most Java engineers can tell you what GC does. Far fewer can look at a GC log and immediately see why the p99 latency spiked at 14:37 last Tuesday. ...

Clojure Data Pipelines: Transducers in Production Risk Processing

The risk calculation pipeline processed end-of-day positions: take all the day’s trades, aggregate them to net positions, apply mark-to-market prices, and compute risk metrics. The input was ~800,000 trade records; the output was ~12,000 position records with P&L and Greeks. The initial implementation used standard Clojure sequence operations: 1 2 3 4 5 6 (->> trades (filter open-trade?) (map enrich-with-market-data) (group-by :currency-pair) (map-vals aggregate-position) (map-vals compute-risk-metrics)) Clean. Readable. And it created four intermediate collections of 800,000 elements each before producing the final output. That’s 3.2M intermediate objects for a 12K result. Transducers changed this. ...

Threading Models in Java: Which One Does Your System Actually Need?

The move from a small trading firm to a large financial institution meant working with codebases an order of magnitude larger, maintained by dozens of engineers across multiple teams. It also meant encountering the full spectrum of Java threading models in production — some appropriate, some inherited from a different era, and some that were actively causing problems. This is a survey of what those models look like, what they’re good at, and how you tell which one a system needs. ...

Five Years in High-Frequency Trading: What I Actually Learned

Five years ago I joined the electronic trading firm not knowing what a cache line was. I thought garbage collection was something that happened to other people’s code. I had never looked at assembly output from a Java program. I’d heard of the LMAX Disruptor but had no idea why it existed. By the time I left, I had opinions about CPU prefetchers. I had read the Intel 64 and IA-32 Architectures Software Developer’s Manual for fun. I could look at a flame graph and immediately see the GC pressure. I had shipped components processing a million messages per second with sub-millisecond p99 guarantees. Here’s what that environment actually teaches you. ...

ZGC and Shenandoah: What Low-Pause GC Means for Trading Systems

In 2015, the state of GC for latency-sensitive Java was: use G1GC, tune it carefully, accept occasional 50–200ms pauses on large heaps, and work around them with off-heap storage and careful allocation management. The conventional wisdom was that sub-10ms GC pauses required small heaps (< 4GB) or near-zero allocation on the hot path. For trading systems with large position caches, this meant either expensive off-heap engineering or living with GC latency spikes. Then the early previews of ZGC (Oracle/Sun) and Shenandoah (Red Hat) started circulating. Both claimed sub-millisecond pause times regardless of heap size. The mechanisms were different, the implications were significant. ...

Understanding Safepoints: The JVM Pauses Nobody Talks About

We’d tuned GC to near-perfection. Pause times were sub-millisecond. The p99.9 latency was still spiking to 8ms several times a day, with no GC events anywhere near those spikes. It took three weeks to find the cause: safepoints, specifically a revoke-bias operation triggered by lock patterns in a third-party library. ...

Memory-Mapped Files in Java: Chronicle and the Art of Zero-Copy I/O

Disk I/O and latency-sensitive systems don’t mix well. Or so the conventional wisdom goes. Memory-mapped files challenge that assumption: when the OS maps a file into virtual memory, writes and reads go through the OS page cache. If the working set fits in RAM (which it usually does for recent data), access times are in the hundreds of nanoseconds — comparable to normal memory access, not disk I/O. Chronicle Queue is built entirely on this foundation. Understanding what’s happening underneath explains both why it’s fast and what its failure modes are. ...

Building a Trade Blotter That Doesn't Lie Under Load

The trade blotter is the dashboard every trader stares at during the day: live positions, recent fills, P&L, risk utilisation. It’s the interface between the trading system and the humans who run it. The blotter has an interesting engineering property: it’s not in the critical path (orders execute without waiting for the blotter to update), but it has to be consistently correct and fast-updating, because traders make decisions based on what it shows. A blotter that shows stale positions leads to over-trading; one that misses fills leads to missed hedges. ...

Chronicle Queue vs Kafka: Choosing a Persistent Journal at Nanosecond Scale

By early 2015 we had both Chronicle Queue and Kafka in production — Chronicle for intra-day trade journaling, Kafka for end-of-day data pipelines. The question came up repeatedly: why not use one for both? The answer is that they solve different problems with incompatible design priorities. ...

Benchmarking Without Lying: JMH, Coordinated Omission, and Honest Numbers

I spent a morning once very proud of a benchmark showing our new order-matching path had p99 latency of 180µs, down from 340µs. It was a 47% improvement. I presented it in a team meeting. An engineer asked one question: “Is that closed-loop or open-loop?” I didn’t know what that meant. The benchmark was worthless. ...