Concurrency

Scala Akka Actors for Trading Workflows: Promises and Pitfalls

The case for Akka actors in a trading system sounds compelling: isolated mutable state (no shared memory, no locks), message-driven concurrency, built-in supervision hierarchies for fault tolerance, and location transparency for distributed deployments. We used Akka for the order lifecycle workflow layer — the component that orchestrated the state machine from order received to fill confirmed. Here’s what we learned. ...

Building a Trade Blotter That Doesn't Lie Under Load

The trade blotter is the dashboard every trader stares at during the day: live positions, recent fills, P&L, risk utilisation. It’s the interface between the trading system and the humans who run it. The blotter has an interesting engineering property: it’s not in the critical path (orders execute without waiting for the blotter to update), but it has to be consistently correct and fast-updating, because traders make decisions based on what it shows. A blotter that shows stale positions leads to over-trading; one that misses fills leads to missed hedges. ...

Busy Spinning vs Blocking: Thread Strategies for Ultra-Low Latency

When a thread is waiting for work — a new event, a lock to release, a signal — it has two options. It can block (tell the OS “wake me up when there’s work”) or busy-spin (loop checking a condition, never yielding the CPU). Both are correct. They have very different performance profiles. ...

Comparing ArrayBlockingQueue to the Disruptor: Numbers Don't Lie

After writing about the Disruptor’s design, the obvious question is: how much faster is it, really? “Faster” is not a useful answer. Let’s look at actual numbers under controlled conditions. This is a benchmarking exercise, not a recommendation. The right data structure depends on your use case. The goal here is to understand the performance characteristics of each under different contention patterns. ...

The LMAX Disruptor: How a Ring Buffer Changed My Mental Model of Queues

In mid-2013 we replaced our internal LinkedBlockingQueue-based event bus with the LMAX Disruptor. Median latency dropped by 30%. The 99th percentile dropped by more than half. The change touched about 400 lines of code. This post is about the conceptual model you need to understand why the Disruptor is fast — not just “it uses a ring buffer,” but what that actually means for your hardware. ...

Introduction to Lock-Free Programming in Java

Locks work. synchronized in Java is correct, well-understood, and wrong for our use case. A lock that’s contested causes a thread to block — the OS parks it, context-switches to something else, and eventually context-switches back. Each of those transitions costs microseconds. When your SLA is sub-millisecond and your hot path is called 200,000 times per second, locks are not an option. Lock-free programming replaces locks with atomic CPU instructions. The CPU handles the synchronisation at the hardware level, without OS involvement. ...

Building a Price Feed Aggregator in Java: First Attempt

Three months into the job, I was given my first substantial project: build a component that subscribes to price feeds from five external venues, aggregates them into a single best-bid-offer (BBO) view per currency pair, and distributes that view to internal consumers. The spec was one page. The first implementation took two weeks. The rewrite after I measured it took another two weeks and was 40× faster. ...