Memory Layout in Go: Structs, Alignment, and Cache Performance

This is the JVM false-sharing problem in a different language. The rules differ slightly, the tooling differs, but the underlying hardware constraint — cache lines are 64 bytes and sharing one across goroutines is expensive — is identical. ...

August 17, 2022 · 5 min · MW

Heap Dumps and Flight Recorder: Diagnosing JVM Memory Problems in Production

At the large financial institution where I worked from 2016, the JVM services were larger and longer-running than anything I’d dealt with in the previous role. Old generation sizes in the hundreds of gigabytes. Services running for months between restarts. Memory problems that took days or weeks to manifest. The debugging approach that worked in trading — small heaps, frequent restarts, aggressive allocation control — didn’t apply here. You had to diagnose production JVM state without stopping it. ...

August 24, 2016 · 6 min · MW

Memory-Mapped Files in Java: Chronicle and the Art of Zero-Copy I/O

Disk I/O and latency-sensitive systems don’t mix well. Or so the conventional wisdom goes. Memory-mapped files challenge that assumption: when the OS maps a file into virtual memory, writes and reads go through the OS page cache. If the working set fits in RAM (which it usually does for recent data), access times are in the hundreds of nanoseconds — comparable to normal memory access, not disk I/O. Chronicle Queue is built entirely on this foundation. Understanding what’s happening underneath explains both why it’s fast and what its failure modes are. ...

April 15, 2015 · 5 min · MW

Off-Heap Memory in Java: sun.misc.Unsafe and Chronicle Map

One of the FX trading desks kept a reference data structure in memory: all live position limits and risk parameters for every currency pair, updated in real time from a risk management system. The structure held about 40GB of data and was read by the trading engine on every price update. Putting 40GB on the Java heap was not an option. GC pauses on a 40GB heap are seconds, not milliseconds. The solution was off-heap allocation — memory that exists outside the GC’s visibility, managed explicitly by the application. ...

April 2, 2014 · 6 min · MW

Disruptor Deep Dive: Memory Layout, Cache Lines, and False Sharing

The Disruptor’s performance isn’t magic. It’s the consequence of a set of deliberate memory layout decisions, each targeting a specific cache coherency problem. This post goes through those decisions one by one. ...

April 9, 2013 · 5 min · MW
Available for consulting Distributed systems · Low-latency architecture · Go · LLM integration & RAG · Technical leadership
hello@turboawesome.win