After writing about the Disruptor’s design, the obvious question is: how much faster is it, really? “Faster” is not a useful answer. Let’s look at actual numbers under controlled conditions.
This is a benchmarking exercise, not a recommendation. The right data structure depends on your use case. The goal here is to understand the performance characteristics of each under different contention patterns.
The Test Setup
All benchmarks run on a dedicated box: Intel Core i7-3770 (4 cores, 8 threads), 16GB DDR3-1600, Ubuntu 12.04, Java 7u21. JMH 0.5 (pre-release at the time, but the methodology is sound). CPU governor set to performance, isolcpus=2,3 for test threads.
Benchmark: one producer writes a long sequence number. One consumer reads it and verifies the sequence. We measure throughput (ops/sec) and latency (ns/op) under:
- 1 producer → 1 consumer
- 3 producers → 1 consumer
- 1 producer → 3 consumers
Queue size: 65,536 (power of 2 — required for Disruptor, chosen for ABQ too for fairness).
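The measurement loop is easy to picture. Here is a minimal sketch of the 1P → 1C case for ABQ (illustrative names, not the actual JMH benchmark): one thread puts sequence numbers, one takes them and verifies ordering, and throughput falls out of the elapsed time.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of the 1P -> 1C measurement loop (names are illustrative, not
// the real JMH harness): producer puts sequence numbers, consumer takes
// them and verifies order, and we derive ops/sec from elapsed time.
public class SpscAbqBench {
    static final int QUEUE_SIZE = 65_536;

    public static long run(long ops) throws InterruptedException {
        ArrayBlockingQueue<Long> q = new ArrayBlockingQueue<>(QUEUE_SIZE);
        Thread producer = new Thread(() -> {
            try {
                for (long i = 0; i < ops; i++) q.put(i);   // blocks when full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        long start = System.nanoTime();
        producer.start();
        long expected = 0;
        while (expected < ops) {
            long v = q.take();                             // blocks when empty
            if (v != expected) throw new IllegalStateException("out of order");
            expected++;
        }
        producer.join();
        long elapsed = System.nanoTime() - start;
        return ops * 1_000_000_000L / elapsed;             // ops per second
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(1_000_000) + " ops/s");
    }
}
```

A one-shot loop like this overstates nothing structural, but it lacks JMH's warmup and dead-code protection, which is why the numbers below come from JMH runs.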
1P → 1C: Single Producer, Single Consumer
                      Throughput     Latency p50   Latency p99
──────────────────────────────────────────────────────────────
ArrayBlockingQueue    4.8M ops/s     185 ns        1,400 ns
Disruptor (lock)      8.2M ops/s     98 ns         420 ns
Disruptor (wait)      24.1M ops/s    38 ns         62 ns
Disruptor (spin)      51.7M ops/s    18 ns         24 ns
Disruptor variants:
- lock: a lock-based wait strategy (comparable to the ABQ approach)
- wait: YieldingWaitStrategy, which yields to the OS scheduler after a spin count
- spin: BusySpinWaitStrategy, a pure spin that consumes a full CPU core
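The difference between the last two variants comes down to what the consumer does while the sequence it needs has not been published yet. A sketch of the two wait loops (illustrative names, not the Disruptor's actual classes):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketches of the two wait strategies compared above. Both
// poll a published cursor; they differ only in what they do while the
// sequence they need isn't available yet.
public class WaitLoops {
    // Busy-spin: never gives up the core. Lowest latency, 100% CPU.
    static long busySpinWaitFor(long seq, AtomicLong cursor) {
        long available;
        while ((available = cursor.get()) < seq) {
            // spin: re-read the cursor as fast as possible
        }
        return available;
    }

    // Yielding: burn a bounded number of spins, then hand the core back
    // to the OS scheduler between retries.
    static long yieldingWaitFor(long seq, AtomicLong cursor) {
        int spins = 100;
        long available;
        while ((available = cursor.get()) < seq) {
            if (spins > 0) spins--;       // spin for a while first
            else Thread.yield();          // then let another thread run
        }
        return available;
    }

    public static void main(String[] args) {
        AtomicLong cursor = new AtomicLong(5);  // sequence 5 already published
        System.out.println(busySpinWaitFor(3, cursor));  // returns immediately: 5
        System.out.println(yieldingWaitFor(5, cursor));  // 5
    }
}
```

The trade-off is visible in the table: yielding costs roughly 20 ns of p50 latency versus busy-spin, in exchange for not pinning a core at 100%.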
At 1P/1C, even the Disruptor with yielding is 5× faster than ABQ. With busy-spin, 10×. The gain comes from:
- No lock acquisition on the fast path (single producer = no CAS needed)
- No condition variable signalling (spin instead of park/unpark)
- Better cache layout: the ring buffer's sequence is a single cache-line-aligned long
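The first point is worth making concrete. With exactly one writer, claiming the next slot is a plain increment of a producer-local counter; the only atomic operation on the fast path is the ordered store that publishes the cursor. A sketch (illustrative names, wrap/capacity check against the consumer elided):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the single-producer fast path (illustrative, not the
// Disruptor's actual classes): with one writer, claiming a slot needs
// no CAS and no lock -- only the publishing store is ordered.
public class SpscRing {
    final long[] buffer;
    final int mask;                                  // size is a power of 2
    final AtomicLong cursor = new AtomicLong(-1);    // last published slot
    long nextToClaim = 0;                            // producer-local, unshared

    SpscRing(int sizePowerOf2) {
        buffer = new long[sizePowerOf2];
        mask = sizePowerOf2 - 1;
    }

    void publish(long value) {
        long seq = nextToClaim++;         // claim: plain increment, no CAS
        // (check against the consumer's sequence for wrap elided here)
        buffer[(int) (seq & mask)] = value;
        cursor.lazySet(seq);              // ordered store publishes the slot
    }

    long read(long seq) {
        while (cursor.get() < seq) { }    // busy-spin until published
        return buffer[(int) (seq & mask)];
    }
}
```

Compare that with ABQ's put(), which takes a ReentrantLock and may signal a condition variable on every single element.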
3P → 1C: Multiple Producers, Single Consumer
                      Throughput     Latency p50   Latency p99
──────────────────────────────────────────────────────────────
ArrayBlockingQueue    3.1M ops/s     310 ns        4,200 ns
Disruptor (MP)        14.8M ops/s    61 ns         190 ns
ABQ degrades more under producer contention than the Disruptor. Why:
- ABQ uses a single ReentrantLock protecting both the head and the tail. Three producers all contend on the same lock, so only one can publish at a time.
- The Disruptor's multi-producer sequencer uses a CAS on the claim sequence. CAS throughput also degrades under contention, but more gracefully than a mutex, because there is no kernel transition on the uncontended path.
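The multi-producer claim is, at its core, a CAS retry loop over the claim sequence. A sketch (illustrative, capacity check elided): a losing producer simply retries in user space rather than parking in the kernel.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a multi-producer slot claim (illustrative, not the actual
// Disruptor sequencer): producers race on a CAS; the loser retries in
// user space instead of parking on a contended lock.
public class MpClaim {
    final AtomicLong claim = new AtomicLong(-1);   // highest claimed slot

    long next() {
        long current, next;
        do {
            current = claim.get();
            next = current + 1;
            // (capacity check against the slowest consumer elided)
        } while (!claim.compareAndSet(current, next));
        return next;   // this slot now belongs exclusively to the caller
    }
}
```

Once a producer owns its slot, it can write the event without further coordination; only publication ordering has to be handled after that.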
The p99 is the telling number: 4.2µs for ABQ vs. 190ns for Disruptor. That’s the lock contention tail — when all three producers arrive simultaneously, two wait while one publishes, and the wait involves a context switch.
1P → 3C: Single Producer, Multiple Consumers
This is the fan-out pattern — one event published to three independent consumers in parallel.
                      Throughput     Latency (producer to all consumers done)
──────────────────────────────────────────────────────────────────────────────
3× ABQ                1.6M ops/s     580 ns
Disruptor barrier     22.3M ops/s    43 ns
The “3× ABQ” approach means the producer puts the event into three separate queues. Three consumers each drain their own queue. This requires copying the reference three times and synchronising on three queues.
The Disruptor’s SequenceBarrier allows multiple consumers to each independently track their read position on the same ring buffer. The producer publishes once; each consumer reads independently. No copying, no duplication.
The throughput difference here (14×) is dominated by:
- One publish vs. three synchronized puts
- One ring buffer (stays in cache) vs. three ring buffers (evict each other)
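The fan-out mechanism is worth sketching (illustrative names, not the Disruptor's SequenceBarrier itself): the producer publishes a slot once, and each consumer advances only its own sequence, so consumers never contend with each other or copy the event.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of fan-out on a single ring buffer (illustrative): one publish,
// N independent reader sequences. Consumers only ever read slots the
// producer's cursor has passed, and never touch each other's state.
public class FanOut {
    final long[] buffer = new long[65_536];
    final int mask = buffer.length - 1;
    final AtomicLong cursor = new AtomicLong(-1);    // producer position
    final AtomicLong[] consumerSeqs;                 // one per consumer

    FanOut(int consumers) {
        consumerSeqs = new AtomicLong[consumers];
        for (int i = 0; i < consumers; i++) consumerSeqs[i] = new AtomicLong(-1);
    }

    void publish(long seq, long value) {
        buffer[(int) (seq & mask)] = value;
        cursor.lazySet(seq);                         // one publish for all readers
    }

    // Returns the next event for this consumer, or null if none published yet.
    Long poll(int consumer) {
        long next = consumerSeqs[consumer].get() + 1;
        if (cursor.get() < next) return null;        // nothing new yet
        long value = buffer[(int) (next & mask)];
        consumerSeqs[consumer].lazySet(next);        // advance only own sequence
        return value;
    }
}
```

In the real Disruptor the producer must also gate on the slowest consumer's sequence before wrapping; that check is elided here.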
Where ABQ Wins
This benchmarking exercise would be dishonest without noting where ABQ is the right choice:
Simplicity: ABQ is two lines to set up. The Disruptor requires a RingBuffer, a WaitStrategy, at least one EventFactory, and a BatchEventProcessor per consumer. The setup is verbose and the error surface is larger.
Blocking is acceptable: ABQ’s blocking semantics are often exactly what you want. A background worker that processes jobs when they arrive and sleeps otherwise should use ABQ. Using a busy-spin Disruptor for a job queue that processes 10 events/minute wastes a CPU core for no reason.
Backpressure: ABQ.put() blocks the producer when full, which is correct backpressure behaviour for many use cases. The Disruptor can drop events or spin if the ring buffer fills — you need to handle this explicitly.
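That backpressure behaviour takes no code at all with ABQ. A small illustration: put() blocks a fast producer until the consumer frees a slot, and offer(timeout) gives you a bounded-wait variant when blocking forever is not acceptable.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustration of ABQ's built-in backpressure: when the queue is full,
// put() blocks the producer, throttling it to the consumer's pace.
// offer(timeout) is the bounded-wait alternative shown here.
public class Backpressure {
    public static boolean tryPublish(ArrayBlockingQueue<Long> q, long v)
            throws InterruptedException {
        // Returns false instead of blocking forever if the queue stays full.
        return q.offer(v, 10, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ArrayBlockingQueue<Long> q = new ArrayBlockingQueue<>(2);
        q.put(1L);
        q.put(2L);                               // queue is now full
        System.out.println(tryPublish(q, 3L));   // false: producer pushed back
        q.take();                                // consumer frees a slot
        System.out.println(tryPublish(q, 3L));   // true
    }
}
```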
Short-lived queues: If the queue exists for the lifetime of a request (not a long-lived service pipeline), the setup overhead of the Disruptor is not amortised.
The Summary
Use case                                   Recommendation
──────────────────────────────────────────────────────────────────
Background work queue, occasional use      ArrayBlockingQueue
High-throughput, latency-sensitive         Disruptor
Multiple producers, shared queue           Disruptor (CAS vs. lock)
Fan-out to multiple consumers              Disruptor (shared ring buffer)
Simple task scheduling                     ArrayBlockingQueue
Sub-microsecond latency required           Disruptor + busy-spin + CPU affinity
The Disruptor is not “always better.” It’s better when throughput and latency under contention are the primary constraints and when you’re willing to pay the setup complexity and CPU cost.
For the market data normalisation pipeline at the trading firm, those conditions held. For the internal audit log, they didn’t — we used ABQ and it was fine.