When a thread is waiting for work — a new event, a lock to be released, a signal — it has two options. It can block (tell the OS “wake me up when there’s work”) or busy-spin (loop checking a condition without ever yielding the CPU). Both are correct; they have very different performance profiles.
What Happens When a Thread Blocks
Blocking in Java happens via Object.wait(), LockSupport.park(), BlockingQueue.take(), or any blocking I/O call. The OS kernel:
- Takes the thread off the run queue
- Context-switches to another thread (cost: ~1–5µs, varies by CPU and kernel)
- When the wake condition is met, puts the thread back on the run queue
- Schedules it to run (additional delay: up to one scheduler tick, typically 1ms on Linux with HZ=1000)
The critical problem: step 4. The thread doesn’t run the instant it’s unparked — it enters the run queue and waits for its turn. On a loaded system, this delay is unpredictable and can be 100µs or more.
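The park/unpark path can be observed directly with LockSupport. A minimal sketch (the class and field names are illustrative, not from any library):

```java
import java.util.concurrent.locks.LockSupport;

// Measure the gap between unpark() and the parked thread actually running.
// On an idle machine this is small; on a loaded one it exposes the
// run-queue and scheduler delays described above.
public class ParkLatency {
    static volatile long unparkedAt;

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            LockSupport.park();                       // block until unparked
            long wokeAt = System.nanoTime();
            System.out.printf("wake-up latency: %d ns%n", wokeAt - unparkedAt);
        });
        waiter.start();
        Thread.sleep(50);                             // let the waiter park
        unparkedAt = System.nanoTime();
        LockSupport.unpark(waiter);                   // thread re-enters run queue
        waiter.join();                                // ...and eventually runs
    }
}
```

Note that even if `unpark` races ahead of `park`, the permit is retained and `park` returns immediately, so the sketch never deadlocks.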
Timeline (blocking wake-up):

Event occurs        Thread unparked       Scheduler picks thread      Thread runs
     │                     │                         │                     │
─────┼─────────────────────┼─────────────────────────┼─────────────────────┼─────▶ time
     │                     │                         │                     │
     └────event latency────┴────run-queue latency────┴───scheduler delay───┘
           (1–5µs)              (0–1000µs)                (0–100µs)
The total wake-up latency is non-deterministic and heavy-tailed. For sub-millisecond SLAs, this is often unacceptable.
What Happens When a Thread Busy-Spins
A busy-spinning thread never yields the CPU. It continuously polls a condition in a tight loop, typically calling Thread.onSpinWait() on each iteration.
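A minimal sketch of such a polling loop (the flag and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Busy-spin wait: poll a flag until a producer sets it. The waiter never
// parks, so there is no OS wake-up path — only the loop itself.
public class BusySpin {
    private static final AtomicBoolean ready = new AtomicBoolean(false);

    static void spinUntilReady() {
        while (!ready.get()) {
            Thread.onSpinWait();    // hint to the CPU: this is a spin-wait loop
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(BusySpin::spinUntilReady);
        waiter.start();
        Thread.sleep(10);           // simulate time before the event
        ready.set(true);            // "event occurs" — waiter proceeds at once
        waiter.join();
    }
}
```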
Thread.onSpinWait() emits the x86 PAUSE instruction, which:
- Signals to the CPU that this is a spin-wait loop (allows hyper-thread to get more cycles)
- Prevents memory order violations that can cause pipeline flushes in tight CAS loops
- Reduces power consumption slightly vs a pure spin
The wake-up latency: near zero. The thread is already running; as soon as the condition is true, it proceeds. No OS involvement, no scheduler delay.
The cost: the CPU core is fully consumed by this thread even while waiting. You are paying for a core just to sit in a loop.
The Comparison
| Strategy | Wake-up latency | CPU cost while waiting | Jitter |
|---|---|---|---|
| BlockingWaitStrategy | 50µs–1ms+ | ~0% | High |
| SleepingWaitStrategy | 1µs–100µs | Low (yields intermittently) | Medium |
| YieldingWaitStrategy | 0–10µs | Medium (calls Thread.yield()) | Low–medium |
| BusySpinWaitStrategy | <1µs | 100% of one core | Very low |
The Disruptor exposes these as pluggable wait strategies precisely because the right choice depends on your latency/resource tradeoff.
The Thread.yield() Middle Ground
Thread.yield() hints to the scheduler that the current thread is willing to give up the CPU. If no other runnable thread exists, it returns almost immediately. If other threads are waiting, the current thread may not run again until the next scheduling decision — possibly a full scheduler tick later.
YieldingWaitStrategy in the Disruptor: spin for 100 iterations, then yield. The spin covers the sub-microsecond case; the yield lets other threads run when there’s nothing to do.
This works well when:
- You have multiple Disruptor event processors on the same machine
- The processors’ loads are complementary (at any given moment some are busy and some are idle, so yielded cycles go to useful work)
- You can tolerate occasional 10µs latency spikes
It works poorly when:
- Other processes on the same machine are competing for CPU
- The latency spikes from yield are unacceptable
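The spin-then-yield pattern can be sketched as follows (SPIN_TRIES mirrors the Disruptor’s default of 100, but the surrounding flag and method names are illustrative — the real YieldingWaitStrategy waits on ring-buffer sequences, not a boolean):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Spin briefly to cover the sub-microsecond case, then Thread.yield()
// on every further iteration so other runnable threads get the core.
public class SpinThenYield {
    private static final int SPIN_TRIES = 100;
    private static final AtomicBoolean ready = new AtomicBoolean(false);

    static void waitForReady() {
        int counter = SPIN_TRIES;
        while (!ready.get()) {
            if (counter > 0) {
                --counter;          // fast path: pure spin
            } else {
                Thread.yield();     // slow path: let other threads run
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(SpinThenYield::waitForReady);
        waiter.start();
        Thread.sleep(5);
        ready.set(true);
        waiter.join();
    }
}
```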
Thread Affinity and Isolation
Busy-spinning’s value is maximised when combined with:
CPU affinity: pin the spinning thread to a specific core with taskset (Linux) or Thread Affinity library (Java). The thread is never migrated; the data it accesses is always warm in that core’s cache.
CPU isolation: remove the core from the OS scheduler with isolcpus=N in the kernel boot parameters. The OS scheduler will not run other processes on core N; only explicitly pinned threads run there.
# /etc/default/grub — add isolcpus to GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX="isolcpus=2,3,4,5"
# Regenerate the grub config and reboot (Debian/Ubuntu: update-grub)
# Verify after reboot — the kernel reports the isolated cores:
$ cat /sys/devices/system/cpu/isolated
2-5
With isolated cores, a busy-spinning thread on core 2 will not be preempted by other user processes or by kernel threads the scheduler places there. (isolcpus alone does not eliminate the periodic timer interrupt; adding nohz_full=2-5 removes most of that too.) The latency profile becomes extremely consistent.
We measured the effect: with no isolation, our aggregator thread’s p999 latency was 420µs. With CPU isolation and affinity on the same hardware with the same code, it dropped to 78µs. The difference was entirely OS scheduler interference.
When to Block, When to Spin
A decision guide:
Latency target < 100µs?
    Yes → busy spin + CPU affinity
    No  → blocking or yielding

Spare CPU cores available for dedicated threads?
    Yes → busy spin is viable
    No  → blocking (can't afford to waste cores)

Predictable latency more important than average latency?
    Yes → busy spin (eliminates scheduler jitter)
    No  → blocking or yielding (better average, worse tail)

Is this a short-lived wait (event expected in < 1µs)?
    Yes → spin (context switch would cost more than the wait)
    No  → depends on target latency
For our feed handler and aggregator threads: busy-spin on isolated cores. For client distribution threads: yielding strategy (there were too many to isolate individually, and distribution latency tolerance was higher). For operations and monitoring threads: blocking (they were never on the critical path).
The mix is intentional. Not every thread in the system needs the same latency properties. Identifying which threads are on the critical path and applying the right wait strategy per thread is better than applying the lowest-latency strategy uniformly and wasting hardware resources.