Project Loom Preview: Virtual Threads and What They Mean for Server Code

Java’s threading model has a fundamental scalability problem: OS threads are expensive. Creating thousands of them consumes gigabytes of stack memory and causes significant scheduling overhead. This is why reactive programming (Netty, Project Reactor, RxJava) became popular — it avoids the thread-per-request model by using event loops and async callbacks. Project Loom, announced in 2017 with early previews arriving in 2018, proposed a different solution: make threads cheap. Virtual threads — JVM-managed threads that are not 1:1 with OS threads — could make the thread-per-request model scalable again. ...
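The thread-per-request idea above can be sketched with the API as it eventually stabilized (the 2018 preview used a different, fiber-based API; this is the Java 21+ form, and the class and method names here are illustrative):

```java
// Minimal sketch, assuming Java 21+: spawn thousands of blocking tasks
// on virtual threads without pinning thousands of OS threads.
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {
    static int countCompleted(int tasks) {
        AtomicInteger done = new AtomicInteger();
        // One cheap virtual thread per task; the JVM multiplexes them
        // onto a small pool of carrier OS threads.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(10); // blocking call parks the virtual thread, not the carrier
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    done.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(countCompleted(10_000));
    }
}
```

The same code with platform threads would reserve on the order of a gigabyte of stack for 10,000 threads; with virtual threads the cost is a few hundred bytes of heap each.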

May 24, 2018 · 5 min · MW

From Java 8 to Java 11 in a Regulated Environment: What Actually Broke

Java 11 was the first long-term support release after Java 8. Oracle’s announcement that commercial Java 8 support would end pushed the bank’s architecture committee to approve a migration. In theory: update the JDK, update the build files, done. In practice: six months of discovery. This is a frank account of what broke. ...

November 8, 2017 · 5 min · MW

Reading GC Logs Like a Detective

GC logs are always-on, low-overhead diagnostic data that the JVM will produce for you. They tell you the timing, cause, duration, and effect of every collection — if you know how to read them. Most Java engineers can tell you what GC does. Far fewer can look at a GC log and immediately see why the p99 latency spiked at 14:37 last Tuesday. ...
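Turning that diagnostic data on is a one-flag change. A sketch of era-appropriate flags (paths and `app.jar` are placeholders):

```shell
# Java 8-era flags: detailed, timestamped, rotated GC logging at negligible overhead.
java -Xloggc:/var/log/app/gc.log \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=64m \
     -jar app.jar

# Java 9+ unified-logging equivalent:
java -Xlog:gc*:file=/var/log/app/gc.log:time,uptime,tags -jar app.jar
```

`-XX:+PrintGCApplicationStoppedTime` is the one most people omit; it records total stop-the-world time, not just GC time, which matters for the kind of p99 detective work described above.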

April 18, 2017 · 6 min · MW

Heap Dumps and Flight Recorder: Diagnosing JVM Memory Problems in Production

At the large financial institution where I worked from 2016, the JVM services were larger and longer-running than anything I’d dealt with in the previous role. Old generation sizes in the hundreds of gigabytes. Services running for months between restarts. Memory problems that took days or weeks to manifest. The debugging approach that worked in trading — small heaps, frequent restarts, aggressive allocation control — didn’t apply here. You had to diagnose production JVM state without stopping it. ...
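The "diagnose without stopping it" toolkit centers on `jcmd` against a live process. A sketch (`<pid>` and file paths are placeholders; on the Oracle JDKs of that era, Flight Recorder also required `-XX:+UnlockCommercialFeatures -XX:+FlightRecorder` at startup):

```shell
# Heap dump from a running JVM. By default dumps live objects only,
# which triggers a full GC first -- plan for that on a 100GB+ old gen.
jcmd <pid> GC.heap_dump /dumps/app.hprof

# Start a time-boxed Flight Recorder session; low enough overhead for production.
jcmd <pid> JFR.start name=diag duration=120s filename=/dumps/diag.jfr

# Snapshot an in-flight recording without stopping it.
jcmd <pid> JFR.dump name=diag filename=/dumps/partial.jfr
```

The heap dump answers "what is on the heap right now"; the JFR recording answers "what allocated it and when" -- on long-running services you usually need both.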

August 24, 2016 · 6 min · MW

Why the Risk Team Chose Clojure (And Why It Made Sense)

The first time someone told me the risk calculation system was written in Clojure, I assumed it was a prototype or a skunkworks project. It wasn’t. It processed end-of-day risk for a significant portion of the firm’s trading book and had been in production for two years. Here’s why that decision made sense, and what it was actually like to work in. ...

April 5, 2016 · 4 min · MW

Five Years in High-Frequency Trading: What I Actually Learned

Five years ago I joined the electronic trading firm not knowing what a cache line was. I thought garbage collection was something that happened to other people’s code. I had never looked at assembly output from a Java program. I’d heard of the LMAX Disruptor but had no idea why it existed. By the time I left, I had opinions about CPU prefetchers. I had read the Intel 64 and IA-32 Architectures Software Developer’s Manual for fun. I could look at a flame graph and immediately see the GC pressure. I had shipped components processing a million messages per second with sub-millisecond p99 guarantees. Here’s what that environment actually teaches you. ...

November 12, 2015 · 6 min · MW

ZGC and Shenandoah: What Low-Pause GC Means for Trading Systems

In 2015, the state of GC for latency-sensitive Java was: use G1GC, tune it carefully, accept occasional 50–200ms pauses on large heaps, and work around them with off-heap storage and careful allocation management. The conventional wisdom was that sub-10ms GC pauses required small heaps (< 4GB) or near-zero allocation on the hot path. For trading systems with large position caches, this meant either expensive off-heap engineering or living with GC latency spikes. Then the early previews of ZGC (Oracle) and Shenandoah (Red Hat) started circulating. Both claimed sub-millisecond pause times regardless of heap size. The mechanisms were different; the implications were significant. ...
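For reference, these are the flags as the collectors eventually shipped in mainline JDK builds (ZGC experimental in JDK 11, Shenandoah in Red Hat builds and later JDK 12+); the early previews discussed above predate them, and `app.jar` is a placeholder:

```shell
# ZGC: concurrent collector targeting sub-millisecond pauses on large heaps.
java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx128g -jar app.jar

# Shenandoah: concurrent compaction via a different mechanism (load barriers vs. colored pointers differ between the two).
java -XX:+UnlockExperimentalVMOptions -XX:+UseShenandoahGC -Xmx128g -jar app.jar
```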

October 1, 2015 · 5 min · MW

Scala Akka Actors for Trading Workflows: Promises and Pitfalls

The case for Akka actors in a trading system sounds compelling: isolated mutable state (no shared memory, no locks), message-driven concurrency, built-in supervision hierarchies for fault tolerance, and location transparency for distributed deployments. We used Akka for the order lifecycle workflow layer — the component that orchestrated the state machine from order received to fill confirmed. Here’s what we learned. ...

August 19, 2015 · 4 min · MW

Understanding Safepoints: The JVM Pauses Nobody Talks About

We’d tuned GC to near-perfection. Pause times were sub-millisecond. The p99.9 latency was still spiking to 8ms several times a day, with no GC events anywhere near those spikes. It took three weeks to find the cause: safepoints, specifically a revoke-bias operation triggered by lock patterns in a third-party library. ...
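Making these pauses visible requires flags beyond GC logging, since a safepoint with no GC attached never appears in the GC log. A sketch using Java 8-era flags (`app.jar` is a placeholder):

```shell
# Log every stop-the-world window and per-safepoint statistics,
# including the triggering operation (e.g. RevokeBias).
java -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \
     -jar app.jar

# If biased-lock revocation is the culprit, disabling biased locking
# eliminates that class of safepoint entirely.
java -XX:-UseBiasedLocking -jar app.jar
```

On Java 9+ the unified-logging form is `-Xlog:safepoint`.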

May 27, 2015 · 4 min · MW

Choosing a GC Collector for Low-Latency Java: A Practical Comparison

By mid-2014 we had run CMS, G1, and Parallel GC in production on the same workload, and had evaluated Azul Zing. Here’s what we found and the decision framework that came out of it. ...

August 6, 2014 · 5 min · MW
Available for consulting
Distributed systems · Low-latency architecture · Go · LLM integration & RAG · Technical leadership
hello@turboawesome.win