After writing about the Disruptor’s design, the obvious question is: how much faster is it, really? “Faster” is not a useful answer. Let’s look at actual numbers under controlled conditions.
This is a benchmarking exercise, not a recommendation. The right data structure depends on your use case. The goal here is to understand the performance characteristics of each under different contention patterns.
The Test Setup
All benchmarks run on a dedicated box: Intel Core i7-3770 (4 cores, 8 threads), 16GB DDR3-1600, Ubuntu 12.04, Java 7u21. JMH 0.5 (pre-release at the time, but the methodology is sound). CPU governor set to performance, isolcpus=2,3 for test threads.
Benchmark: one producer writes a long sequence number. One consumer reads it and verifies the sequence. We measure throughput (ops/sec) and latency (ns/op) under:
- 1 producer → 1 consumer
- 3 producers → 1 consumer
- 1 producer → 3 consumers
Queue size: 65,536 (power of 2 — required for Disruptor, chosen for ABQ too for fairness).
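The measurement loop is easy to picture. Here is a minimal sketch of the 1P → 1C case for ABQ (illustrative names, not the actual JMH benchmark): one thread puts sequence numbers, one takes them and verifies ordering, and throughput falls out of the elapsed time.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of the 1P -> 1C measurement loop (names are illustrative, not
// the real JMH harness): producer puts sequence numbers, consumer takes
// them and verifies order, and we derive ops/sec from elapsed time.
public class SpscAbqBench {
    static final int QUEUE_SIZE = 65_536;

    public static long run(long ops) throws InterruptedException {
        ArrayBlockingQueue<Long> q = new ArrayBlockingQueue<>(QUEUE_SIZE);
        Thread producer = new Thread(() -> {
            try {
                for (long i = 0; i < ops; i++) q.put(i);   // blocks when full
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        long start = System.nanoTime();
        producer.start();
        long expected = 0;
        while (expected < ops) {
            long v = q.take();                             // blocks when empty
            if (v != expected) throw new IllegalStateException("out of order");
            expected++;
        }
        producer.join();
        long elapsed = System.nanoTime() - start;
        return ops * 1_000_000_000L / elapsed;             // ops per second
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(1_000_000) + " ops/s");
    }
}
```

A one-shot loop like this overstates nothing structural, but it lacks JMH's warmup and dead-code protection, which is why the numbers below come from JMH runs.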
1P → 1C: Single Producer, Single Consumer
                      Throughput     Latency p50   Latency p99
──────────────────────────────────────────────────────────────
ArrayBlockingQueue    4.8M ops/s     185 ns        1,400 ns
Disruptor (lock)      8.2M ops/s     98 ns         420 ns
Disruptor (wait)      24.1M ops/s    38 ns         62 ns
Disruptor (spin)      51.7M ops/s    18 ns         24 ns
Disruptor variants:
- lock: a lock-based wait strategy (comparable to the ABQ approach)
- wait: YieldingWaitStrategy, which yields to the OS scheduler after a spin count
- spin: BusySpinWaitStrategy, a pure spin that consumes a full CPU core
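The difference between the last two variants comes down to what the consumer does while the sequence it needs has not been published yet. A sketch of the two wait loops (illustrative names, not the Disruptor's actual classes):

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketches of the two wait strategies compared above. Both
// poll a published cursor; they differ only in what they do while the
// sequence they need isn't available yet.
public class WaitLoops {
    // Busy-spin: never gives up the core. Lowest latency, 100% CPU.
    static long busySpinWaitFor(long seq, AtomicLong cursor) {
        long available;
        while ((available = cursor.get()) < seq) {
            // spin: re-read the cursor as fast as possible
        }
        return available;
    }

    // Yielding: burn a bounded number of spins, then hand the core back
    // to the OS scheduler between retries.
    static long yieldingWaitFor(long seq, AtomicLong cursor) {
        int spins = 100;
        long available;
        while ((available = cursor.get()) < seq) {
            if (spins > 0) spins--;       // spin for a while first
            else Thread.yield();          // then let another thread run
        }
        return available;
    }

    public static void main(String[] args) {
        AtomicLong cursor = new AtomicLong(5);  // sequence 5 already published
        System.out.println(busySpinWaitFor(3, cursor));  // returns immediately: 5
        System.out.println(yieldingWaitFor(5, cursor));  // 5
    }
}
```

The trade-off is visible in the table: yielding costs roughly 20 ns of p50 latency versus busy-spin, in exchange for not pinning a core at 100%.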
At 1P/1C, even the Disruptor with yielding is 5× faster than ABQ. With busy-spin, 10×. The gain comes from:
- No lock acquisition on the fast path (single producer = no CAS needed)
- No condition variable signalling (spin instead of park/unpark)
- Better cache layout: the ring buffer's sequence is a single cache-line-aligned long
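The first point is worth making concrete. With exactly one writer, claiming the next slot is a plain increment of a producer-local counter; the only atomic operation on the fast path is the ordered store that publishes the cursor. A sketch (illustrative names, wrap/capacity check against the consumer elided):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the single-producer fast path (illustrative, not the
// Disruptor's actual classes): with one writer, claiming a slot needs
// no CAS and no lock -- only the publishing store is ordered.
public class SpscRing {
    final long[] buffer;
    final int mask;                                  // size is a power of 2
    final AtomicLong cursor = new AtomicLong(-1);    // last published slot
    long nextToClaim = 0;                            // producer-local, unshared

    SpscRing(int sizePowerOf2) {
        buffer = new long[sizePowerOf2];
        mask = sizePowerOf2 - 1;
    }

    void publish(long value) {
        long seq = nextToClaim++;         // claim: plain increment, no CAS
        // (check against the consumer's sequence for wrap elided here)
        buffer[(int) (seq & mask)] = value;
        cursor.lazySet(seq);              // ordered store publishes the slot
    }

    long read(long seq) {
        while (cursor.get() < seq) { }    // busy-spin until published
        return buffer[(int) (seq & mask)];
    }
}
```

Compare that with ABQ's put(), which takes a ReentrantLock and may signal a condition variable on every single element.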
3P → 1C: Multiple Producers, Single Consumer
                      Throughput     Latency p50   Latency p99
──────────────────────────────────────────────────────────────
ArrayBlockingQueue    3.1M ops/s     310 ns        4,200 ns
Disruptor (MP)        14.8M ops/s    61 ns         190 ns
ABQ degrades more under producer contention than the Disruptor. Why:
- ABQ uses a single ReentrantLock protecting both the head and the tail. Three producers all contend on the same lock, so only one can publish at a time.
- The Disruptor's multi-producer sequencer uses a CAS on the claim sequence. CAS throughput also degrades under contention, but more gracefully than a mutex, because there is no kernel transition on the uncontended path.
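The multi-producer claim is, at its core, a CAS retry loop over the claim sequence. A sketch (illustrative, capacity check elided): a losing producer simply retries in user space rather than parking in the kernel.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a multi-producer slot claim (illustrative, not the actual
// Disruptor sequencer): producers race on a CAS; the loser retries in
// user space instead of parking on a contended lock.
public class MpClaim {
    final AtomicLong claim = new AtomicLong(-1);   // highest claimed slot

    long next() {
        long current, next;
        do {
            current = claim.get();
            next = current + 1;
            // (capacity check against the slowest consumer elided)
        } while (!claim.compareAndSet(current, next));
        return next;   // this slot now belongs exclusively to the caller
    }
}
```

Once a producer owns its slot, it can write the event without further coordination; only publication ordering has to be handled after that.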
The p99 is the telling number: 4.2µs for ABQ vs. 190ns for Disruptor. That’s the lock contention tail — when all three producers arrive simultaneously, two wait while one publishes, and the wait involves a context switch.
1P → 3C: Single Producer, Multiple Consumers
This is the fan-out pattern — one event published to three independent consumers in parallel.
                      Throughput     Latency (producer to all consumers done)
──────────────────────────────────────────────────────────────────────────────
3× ABQ                1.6M ops/s     580 ns
Disruptor barrier     22.3M ops/s    43 ns
The “3× ABQ” approach means the producer puts the event into three separate queues. Three consumers each drain their own queue. This requires copying the reference three times and synchronising on three queues.
The Disruptor’s SequenceBarrier allows multiple consumers to each independently track their read position on the same ring buffer. The producer publishes once; each consumer reads independently. No copying, no duplication.
The throughput difference here (14×) is dominated by:
- One publish vs. three synchronized puts
- One ring buffer (stays in cache) vs. three ring buffers (evict each other)
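The fan-out mechanism is worth sketching (illustrative names, not the Disruptor's SequenceBarrier itself): the producer publishes a slot once, and each consumer advances only its own sequence, so consumers never contend with each other or copy the event.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of fan-out on a single ring buffer (illustrative): one publish,
// N independent reader sequences. Consumers only ever read slots the
// producer's cursor has passed, and never touch each other's state.
public class FanOut {
    final long[] buffer = new long[65_536];
    final int mask = buffer.length - 1;
    final AtomicLong cursor = new AtomicLong(-1);    // producer position
    final AtomicLong[] consumerSeqs;                 // one per consumer

    FanOut(int consumers) {
        consumerSeqs = new AtomicLong[consumers];
        for (int i = 0; i < consumers; i++) consumerSeqs[i] = new AtomicLong(-1);
    }

    void publish(long seq, long value) {
        buffer[(int) (seq & mask)] = value;
        cursor.lazySet(seq);                         // one publish for all readers
    }

    // Returns the next event for this consumer, or null if none published yet.
    Long poll(int consumer) {
        long next = consumerSeqs[consumer].get() + 1;
        if (cursor.get() < next) return null;        // nothing new yet
        long value = buffer[(int) (next & mask)];
        consumerSeqs[consumer].lazySet(next);        // advance only own sequence
        return value;
    }
}
```

In the real Disruptor the producer must also gate on the slowest consumer's sequence before wrapping; that check is elided here.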
Where ABQ Wins
This benchmarking exercise would be dishonest without noting where ABQ is the right choice:
Simplicity: ABQ is two lines to set up. The Disruptor requires a RingBuffer, a WaitStrategy, at least one EventFactory, and a BatchEventProcessor per consumer. The setup is verbose and the error surface is larger.
Blocking is acceptable: ABQ’s blocking semantics are often exactly what you want. A background worker that processes jobs when they arrive and sleeps otherwise should use ABQ. Using a busy-spin Disruptor for a job queue that processes 10 events/minute wastes a CPU core for no reason.
Backpressure: ABQ.put() blocks the producer when full, which is correct backpressure behaviour for many use cases. The Disruptor can drop events or spin if the ring buffer fills — you need to handle this explicitly.
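That backpressure behaviour takes no code at all with ABQ. A small illustration: put() blocks a fast producer until the consumer frees a slot, and offer(timeout) gives you a bounded-wait variant when blocking forever is not acceptable.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustration of ABQ's built-in backpressure: when the queue is full,
// put() blocks the producer, throttling it to the consumer's pace.
// offer(timeout) is the bounded-wait alternative shown here.
public class Backpressure {
    public static boolean tryPublish(ArrayBlockingQueue<Long> q, long v)
            throws InterruptedException {
        // Returns false instead of blocking forever if the queue stays full.
        return q.offer(v, 10, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ArrayBlockingQueue<Long> q = new ArrayBlockingQueue<>(2);
        q.put(1L);
        q.put(2L);                               // queue is now full
        System.out.println(tryPublish(q, 3L));   // false: producer pushed back
        q.take();                                // consumer frees a slot
        System.out.println(tryPublish(q, 3L));   // true
    }
}
```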
Short-lived queues: If the queue exists for the lifetime of a request (not a long-lived service pipeline), the setup overhead of the Disruptor is not amortised.
The Summary
Use case                                   Recommendation
──────────────────────────────────────────────────────────────────
Background work queue, occasional use      ArrayBlockingQueue
High-throughput, latency-sensitive         Disruptor
Multiple producers, shared queue           Disruptor (CAS vs. lock)
Fan-out to multiple consumers              Disruptor (shared ring buffer)
Simple task scheduling                     ArrayBlockingQueue
Sub-microsecond latency required           Disruptor + busy-spin + CPU affinity
The Disruptor is not “always better.” It’s better when throughput and latency under contention are the primary constraints and when you’re willing to pay the setup complexity and CPU cost.
For the market data normalisation pipeline at the trading firm, those conditions held. For the internal audit log, they didn’t — we used ABQ and it was fine.