The move from a small trading firm to a large financial institution meant working with codebases an order of magnitude larger, maintained by dozens of engineers across multiple teams. It also meant encountering the full spectrum of Java threading models in production — some appropriate, some inherited from a different era, and some that were actively causing problems.
This is a survey of what those models look like, what they’re good at, and how you tell which one a system needs.
## The Models
### 1. Thread-Per-Request (The Classic)

```
Request 1 → Thread A → [process] → response
Request 2 → Thread B → [process] → response
Request 3 → Thread C → blocking on DB → ...
Request 4 → Thread D → [process] → response (took from pool)
```
Each request gets a thread from a pool. The thread handles the full request lifecycle, including blocking I/O (database calls, downstream services). When the request completes, the thread returns to the pool.
Works well for:
- Services where blocking I/O is the primary bottleneck
- Code that’s clearest written sequentially
- Services with bounded concurrency requirements (the pool size itself caps concurrency)
Breaks down when:
- Concurrency is high (thousands of simultaneous requests) — thread-per-request becomes thread-per-client, stack memory grows, context-switching overhead rises
- Blocking includes long-held locks or high-latency external calls — threads pile up waiting, thread pool fills, requests queue up
This is still the right model for most backend services. Spring MVC, JAX-RS, and most Java web frameworks default to it for good reason.
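In its simplest form this is just a bounded `ExecutorService`; a minimal sketch, with the pool size and the simulated database call chosen for illustration:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPerRequest {
    // Bounded pool: at most 8 requests are in flight at once;
    // the rest queue until a thread frees up.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(8);

    // The whole request lifecycle runs on one thread, blocking I/O included.
    static Future<String> handle(int requestId) {
        return POOL.submit(() -> {
            String row = blockingDbCall(requestId); // the thread parks here, and that's fine
            return "response:" + row;
        });
    }

    // Stand-in for a JDBC query or downstream HTTP call.
    private static String blockingDbCall(int id) throws InterruptedException {
        Thread.sleep(10);
        return "row-" + id;
    }

    public static void main(String[] args) throws Exception {
        List<Future<String>> futures = List.of(handle(1), handle(2), handle(3));
        for (Future<String> f : futures) {
            System.out.println(f.get()); // sequential, easy-to-read call sites
        }
        POOL.shutdown();
    }
}
```

The appeal is visible in the code: the handler reads top to bottom, and the pool size doubles as the concurrency limit.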
### 2. Single-Threaded Event Loop

```
I/O events → Selector (epoll/kqueue) → Event queue → Single thread → handlers
                                                          │
                                                  No blocking allowed!
```
Netty, Vert.x, and the Node.js model applied to Java. A single thread processes a queue of events. All handlers must be non-blocking; long operations are dispatched to worker thread pools.
Works well for:
- Network I/O bound workloads with many concurrent connections
- Protocol servers (HTTP, WebSocket, FIX)
- Proxies, gateways, fan-out services
Breaks down when:
- CPU-bound work mixes with I/O (single thread becomes bottleneck)
- A blocking call sneaks into a handler (stalls all other requests)
- Developer ergonomics matter — callback chains are harder to reason about than sequential code
In practice, Netty and Vert.x use multiple event-loop threads (one per CPU core), which is a hybrid — still non-blocking within each loop, but parallel across loops.
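The discipline can be sketched without real sockets: one thread drains an event queue, and anything slow is shipped to a worker pool, with the continuation re-enqueued as a new event. This is a toy model, not Netty's API; a real server would feed the queue from a `Selector`:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class EventLoopSketch {
    final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    final ExecutorService workers = Executors.newFixedThreadPool(4);
    volatile boolean running = true;

    void submit(Runnable handler) { events.add(handler); }

    // A handler that needs blocking work: do it on a worker, then hop back.
    void handleRequest(String request, CompletableFuture<String> response) {
        submit(() ->
            CompletableFuture
                .supplyAsync(() -> slowLookup(request), workers)  // off the loop
                .thenAccept(result ->
                    submit(() -> response.complete("handled:" + result)))); // back on the loop
    }

    // Stand-in for a blocking call that must never run on the loop thread.
    private static String slowLookup(String request) {
        try { Thread.sleep(20); } catch (InterruptedException e) { /* ignore */ }
        return request.toUpperCase();
    }

    void runLoop() { // the single event-loop thread
        while (running) {
            try { events.take().run(); }
            catch (InterruptedException e) { running = false; }
        }
    }

    public static void main(String[] args) throws Exception {
        EventLoopSketch loop = new EventLoopSketch();
        Thread loopThread = new Thread(loop::runLoop, "event-loop");
        loopThread.start();
        CompletableFuture<String> response = new CompletableFuture<>();
        loop.handleRequest("ping", response);
        System.out.println(response.get()); // handled:PING
        loop.running = false;
        loopThread.interrupt();
        loop.workers.shutdown();
    }
}
```

Note the failure mode the model warns about: if `slowLookup` ran directly inside a submitted handler, every other event in the queue would wait behind it.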
### 3. Work-Stealing (ForkJoinPool / CompletableFuture)

```
Task queue → Worker thread 1 → subtask → subtask
           → Worker thread 2 → (stealing from thread 1's queue)
           → Worker thread 3
           → Worker thread 4
```
ForkJoinPool divides work into subtasks, each of which can subdivide further. Idle threads steal tasks from busy threads’ local deques. CompletableFuture chains compose async stages that, by default, execute on the common ForkJoinPool.
Works well for:
- CPU-bound computations that decompose naturally (tree traversal, recursive algorithms)
- Async pipelines where you want back-pressure to flow naturally
- Parallel stream processing
Breaks down when:
- Tasks block (blocking tasks prevent work-stealing, starve the pool)
- Task granularity is too fine (overhead of stealing exceeds work done)
- Exception handling across async stages gets complex
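The fork/join half can be sketched with the textbook example, a recursive array sum; the threshold value here is illustrative and would be tuned in practice:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Fork/join decomposition: split until the chunk is small enough that the
// work outweighs the stealing overhead, then compute sequentially.
public class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // too fine, and stealing overhead dominates
    private final long[] data;
    private final int from, to;

    ParallelSum(long[] data, int from, int to) {
        this.data = data; this.from = from; this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) { // base case: do the work directly
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                     // left half goes on this worker's deque (stealable)
        long rightSum = right.compute(); // this thread keeps the right half
        return left.join() + rightSum;
    }

    public static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ParallelSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(sum(data)); // 4999950000
    }
}
```

Forking one half and computing the other on the current thread is the idiomatic pattern: it keeps the worker busy instead of parking it on two joins.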
The CompletableFuture model is excellent for orchestrating async calls to downstream services:
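A sketch of such an orchestration, with `fetchPrice` and `fetchRisk` as hypothetical stand-ins for async downstream clients:

```java
import java.util.concurrent.CompletableFuture;

public class Orchestration {
    // Stand-ins for async clients to two downstream services.
    static CompletableFuture<Double> fetchPrice(String id) {
        return CompletableFuture.supplyAsync(() -> 101.5);
    }

    static CompletableFuture<Double> fetchRisk(String id) {
        return CompletableFuture.supplyAsync(() -> 0.2);
    }

    static CompletableFuture<Double> adjustedPrice(String id) {
        return fetchPrice(id)
            .thenCombine(fetchRisk(id), (price, risk) -> price * (1 - risk)) // both calls in flight at once
            .exceptionally(ex -> Double.NaN); // failures surface here, not on the caller's thread
    }

    public static void main(String[] args) {
        System.out.println(adjustedPrice("ACME").join());
    }
}
```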
Both upstream calls run in parallel, the result is composed, and the whole chain is non-blocking from the calling thread’s perspective.
### 4. Actor Model (Akka / Disruptor)

```
Actor A → mailbox → Actor A (processes one message at a time)
Actor B → mailbox → Actor B
Actor C → mailbox → Actor C
                      └→ sends message to Actor D's mailbox
```
Each actor has a mailbox (message queue) and processes one message at a time, so no synchronisation is needed within an actor. Concurrency comes from running many actors in parallel.
Works well for:
- Systems that naturally model as independent state machines
- High-throughput pipelines where stages communicate via queues (LMAX Disruptor)
- State isolation — actors own their state, nothing shares mutable state
Breaks down when:
- Message protocol becomes complex (too many message types, unclear flows)
- Debugging requires tracing across many actor handoffs
- Backpressure needs explicit design (messages accumulate in mailboxes under load)
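A minimal actor can be sketched with a single-threaded executor as the mailbox. This only demonstrates the isolation property; Akka's runtime adds supervision, routing, and shares a small thread pool across many actors rather than dedicating a thread per actor:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CounterActor {
    // The mailbox: messages are drained one at a time by a single thread,
    // so the actor's state needs no locks.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private long count = 0; // owned by the actor; only the mailbox thread touches it

    // Fire-and-forget message.
    public void increment(long by) {
        mailbox.execute(() -> count += by);
    }

    // Request/response message: the reply comes back via a future.
    public CompletableFuture<Long> get() {
        CompletableFuture<Long> reply = new CompletableFuture<>();
        mailbox.execute(() -> reply.complete(count));
        return reply;
    }

    public void shutdown() { mailbox.shutdown(); }

    public static void main(String[] args) {
        CounterActor actor = new CounterActor();
        for (int i = 0; i < 1_000; i++) actor.increment(1); // safe from any thread
        System.out.println(actor.get().join()); // 1000: messages are processed in FIFO order
        actor.shutdown();
    }
}
```

The unbounded queue inside the executor is also where the backpressure caveat shows up: under sustained overload, messages simply accumulate in the mailbox.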
## Comparison
| Model | Throughput | Latency | Memory | Debuggability | Blocking safe? |
|---|---|---|---|---|---|
| Thread-per-request | Medium | Medium | Higher (stacks) | Excellent | Yes |
| Event loop | High | Low | Low | Hard | No |
| Work-stealing | High | Medium | Medium | Medium | No |
| Actor/message-passing | High | Medium | Medium | Medium | Depends |
## How to Choose
The questions to answer before picking a model:
1. What’s the bottleneck?
- CPU bound → work-stealing or dedicated threads with affinity
- I/O bound (many connections, low compute per request) → event loop
- I/O bound (few connections, high compute) → thread-per-request
2. How complex is the state machine?
- Sequential request/response → thread-per-request (simplest code)
- Multiple async dependencies → `CompletableFuture` or reactive
- Persistent state machine with events → actor model
3. What are the latency requirements?
- Sub-millisecond → dedicated threads, no shared pool, no blocking
- Single-digit milliseconds → event loop or async
- Tens of milliseconds → thread-per-request is fine
4. What’s the team’s experience? This is not a cop-out. Async code is genuinely harder to debug and reason about. If the team is not experienced with non-blocking patterns, the event-loop or actor model will produce bugs that thread-per-request wouldn’t. The throughput gain needs to justify the added cognitive load.
## What I Found in Production
At the institution, the services fell into two categories:
Correctly modelled: batch processing pipelines (thread-per-request, throughput focus), protocol gateways (Netty event loop, latency focus), price calculation services (work-stealing, CPU-bound).
Incorrectly modelled: a set of services that had been “modernised” to use reactive programming (Project Reactor) without corresponding changes to the data access layer — the repositories still did blocking JDBC calls. The result was reactive pipelines that blocked the thread pool, negating the benefits of the reactive model while adding the debugging complexity. This is the worst of all worlds.
The principle that cuts through the complexity: the threading model should match where the bottleneck is, not what’s currently fashionable. Thread-per-request is not legacy code — it’s the right model for a large class of services. Don’t solve a problem you don’t have.