Off-Heap Memory in Java: sun.misc.Unsafe and Chronicle Map

One of the FX trading desks kept a reference data structure in memory: all live position limits and risk parameters for every currency pair, updated in real time from a risk management system. The structure held about 40GB of data and was read by the trading engine on every price update. Putting 40GB on the Java heap was not an option. GC pauses on a 40GB heap are seconds, not milliseconds. The solution was off-heap allocation — memory that exists outside the GC’s visibility, managed explicitly by the application. ...

April 2, 2014 · 6 min · MW

Scala on the Hot Path: Where the Abstraction Cost Goes

We were using Scala in a few non-critical components at the trading firm — utility code, configuration, some tooling. Then someone proposed moving a market data normalisation component to Scala. The component processed 800,000 messages/second and had a 500µs latency budget per message at p99. The discussion that followed taught me more about the JVM than a year of reading. ...

January 8, 2014 · 6 min · MW

Why Average Latency Is a Lie: HdrHistogram and Measuring What Matters

If someone tells you their system has 2ms average latency, they’ve told you almost nothing useful. A system that delivers 1ms 99% of the time and 100ms 1% of the time averages about 2ms (0.99 × 1ms + 0.01 × 100ms = 1.99ms). So does a system that delivers 2ms every single time. These behave completely differently in production. The problem isn’t measurement frequency; it’s that averages destroy the distribution. ...

November 27, 2013 · 5 min · MW

JVM JIT Compilation: What the C2 Compiler Does to Your Loops

Java’s “write once, run anywhere” promise is kept by the JVM. Its performance is kept by the JIT compiler. The gap between “Java is slow” (the 1998 opinion) and “Java is competitive with C++ for many workloads” (the 2013 reality, and more so now) is almost entirely the C2 compiler. Understanding what C2 does — and when it stops doing it — matters if you’re writing performance-sensitive Java. ...

July 30, 2013 · 6 min · MW

Stop-the-World GC Pauses Killed Our SLA — And What We Did About It

The incident happened at 08:31 on a Tuesday, at the Frankfurt open, during a high-volatility session. Our tick-to-quote latency spiked to 340ms for about 2 seconds. The SLA was 1ms at p99. The trading desk noticed before our monitoring did. The culprit: a full GC triggered by a promotion failure. We had a 12GB heap, the CMS collector, and no one had looked at GC logs since the initial deployment. ...

November 13, 2012 · 3 min · MW

Why Your Java App Is Slow Before It Even Starts: Classloading Deep Dive

The service started in 3 seconds in development, but the first live trade after deployment took 800ms instead of the expected sub-10ms. The second trade was fine. We couldn’t reproduce it in load tests. The culprit was classloading. The trade execution path touched 47 classes that had never been loaded before. Loading them, verifying the bytecode, and running static initialisers took 800ms — once, at first use. Understanding exactly how that happens is worth the time. ...

June 18, 2012 · 5 min · MW