Java’s threading model has a fundamental scalability problem: OS threads are expensive. Creating thousands of them consumes gigabytes of stack memory and causes significant scheduling overhead. This is why reactive programming (Netty, Project Reactor, RxJava) became popular — it avoids the thread-per-request model by using event loops and async callbacks.
Project Loom, announced in 2017 with early previews arriving in 2018, proposed a different solution: make threads cheap. Virtual threads — JVM-managed threads that are not 1:1 with OS threads — could make the thread-per-request model scalable again.
The Problem with OS Threads
A Java OS thread (java.lang.Thread) maps to a native OS thread. Each has:
- A kernel scheduling structure (~64KB)
- A default stack of 512KB–1MB (configurable)
- Context-switch overhead when the scheduler switches between them
For a web server handling 10,000 concurrent requests with thread-per-request, you need 10,000 OS threads — 5–10GB of reserved stack memory and constant kernel scheduling overhead. This is why typical server deployments cap thread pool sizes at 200–500 threads and rely on connection queuing.
The workaround — reactive/async programming — solves the thread count problem but introduces complexity:
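The original listing did not survive; the contrast it drew can be sketched with plain JDK classes, using CompletableFuture chaining as a stand-in for a reactive library. The fetchUser and fetchOrders helpers are hypothetical, simulating blocking I/O:

```java
import java.util.concurrent.CompletableFuture;

public class StyleComparison {
    // Hypothetical helpers standing in for real blocking I/O (DB/HTTP calls).
    static String fetchUser(int id)        { return "user-" + id; }
    static String fetchOrders(String user) { return user + ":orders"; }

    // Async style: logic spread across callbacks; a failure's stack trace
    // shows executor frames rather than this code.
    static CompletableFuture<String> asyncStyle(int id) {
        return CompletableFuture
                .supplyAsync(() -> fetchUser(id))
                .thenApplyAsync(StyleComparison::fetchOrders);
    }

    // Blocking style: straight-line code, trivially debuggable.
    static String blockingStyle(int id) {
        return fetchOrders(fetchUser(id));
    }

    public static void main(String[] args) {
        System.out.println(asyncStyle(7).join());   // user-7:orders
        System.out.println(blockingStyle(7));       // user-7:orders
    }
}
```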
The reactive version is harder to write, harder to debug (stack traces show dispatcher frames, not your code), and harder for new engineers to understand. Project Loom proposed keeping the simple blocking style while achieving the reactive style’s scalability.
Virtual Threads: The Loom Approach
A virtual thread is a java.lang.Thread that the JVM schedules onto a pool of carrier OS threads. When a virtual thread blocks (waiting for I/O, sleeping, acquiring a lock), it is unmounted from its carrier — the carrier OS thread picks up another virtual thread:
Virtual threads V1...V10000
↓
JVM Scheduler
↓
Carrier threads (= # CPUs, e.g., 8)
↓
OS threads 1-8
10,000 virtual threads run on 8 OS threads. When V1 blocks on an HTTP call, its state is saved to heap (not stack!) and V2 runs on that carrier thread instead.
Stack size: virtual thread stacks start at a few hundred bytes and grow as needed. 10,000 virtual threads might use 100MB total stack space, compared to 5–10GB for 10,000 OS threads.
The early API:
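The listing is missing here; a hedged reconstruction using the preview-era Thread.ofVirtual() builder, with Thread.sleep() standing in for the blocking HTTP call described in the text:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class EarlyApi {
    static int runTasks(int taskCount) {
        AtomicInteger done = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            // Thread.ofVirtual() builds a virtual thread; start() launches it.
            threads.add(Thread.ofVirtual().start(() -> {
                try { Thread.sleep(Duration.ofMillis(50)); } // simulated HTTP call
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                done.incrementAndGet();
            }));
        }
        for (Thread t : threads) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(10_000)); // prints 10000
    }
}
```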
Each task gets its own virtual thread. The HTTP call blocks the virtual thread but not the OS carrier thread. All 10,000 tasks make progress concurrently without 10,000 OS threads.
What Doesn’t Change
CPU-bound work: virtual threads don’t help with CPU-bound parallelism. Parallelism is still limited by the number of carrier OS threads, which defaults to the number of available processors (the analogue of Go’s GOMAXPROCS). Virtual threads help with I/O-bound concurrency.
Synchronisation: virtual threads support both synchronized and ReentrantLock. However, blocking inside a synchronized block (unlike a java.util.concurrent Lock) pins the virtual thread to its carrier — the carrier OS thread stays occupied even while the virtual thread is blocked. Code that blocks heavily inside synchronized therefore benefits less from virtual threads. The Loom team worked on eliminating this pinning, but it remained a caveat in the early previews.
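A minimal sketch of the caveat, with Thread.sleep() standing in for blocking I/O inside a critical section. On JDKs where synchronized still pins, the first variant holds its carrier for the whole sleep and can be observed with -Djdk.tracePinnedThreads=full; the ReentrantLock variant lets the JVM unmount the virtual thread:

```java
import java.time.Duration;
import java.util.concurrent.locks.ReentrantLock;

public class PinningDemo {
    private static final Object monitor = new Object();
    private static final ReentrantLock lock = new ReentrantLock();

    static void sleep() {
        try { Thread.sleep(Duration.ofMillis(10)); } // simulated blocking I/O
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    // Pins: the carrier OS thread is held for the entire sleep.
    static void blockWhilePinned() {
        synchronized (monitor) {
            sleep();
        }
    }

    // Does not pin: the virtual thread unmounts while sleeping.
    static void blockWithoutPinning() {
        lock.lock();
        try {
            sleep();
        } finally {
            lock.unlock();
        }
    }

    static boolean runBoth() {
        Thread t1 = Thread.ofVirtual().start(PinningDemo::blockWhilePinned);
        Thread t2 = Thread.ofVirtual().start(PinningDemo::blockWithoutPinning);
        try { t1.join(); t2.join(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return !t1.isAlive() && !t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(runBoth()); // true
    }
}
```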
Memory per virtual thread: while stack is small, each virtual thread still has object overhead (~200 bytes per virtual thread for the thread object and a small initial stack segment). 1 million virtual threads ≈ 200MB. Fine for most use cases; not free.
Why This Matters for the Threading Model Decision
Recall the threading model comparison from earlier: thread-per-request vs. event loop vs. work-stealing. Virtual threads effectively make thread-per-request competitive with event loops for I/O-bound workloads:
Model                    I/O concurrency    Latency   Code complexity
──────────────────────────────────────────────────────────────────────
OS thread-per-request    Low (500 max)      Medium    Simple
Virtual thread-per-req   High (millions)    Medium    Simple
Event loop (Netty)       High               Low       High
Work-stealing (FJP)      Medium             Medium    Medium
For the large class of services that are I/O-bound (database calls, HTTP calls to other services), virtual threads eliminate the main reason to choose reactive programming. You get the concurrency of event loops with the readability of blocking code.
Reactive programming retains advantages for truly latency-critical paths (sub-millisecond response requirements) and for complex async pipelines with rich backpressure semantics. But for the average CRUD API or workflow service, virtual threads are a significant simplification.
The Trajectory
Loom entered preview in Java 19 (2022), and virtual threads were stabilised as a production feature in Java 21 LTS (2023). By 2024, Spring Boot and most major Java frameworks had virtual thread support built in.
The effect on the broader Java ecosystem: frameworks that were built around reactive programming (Spring WebFlux, Quarkus reactive, Vert.x) now had to justify their complexity against a simpler alternative. The answer — that virtual threads are excellent for I/O-bound blocking code but reactive remains better for complex async pipelines and very low latency — is nuanced but correct.
For server-side Java code written in 2024+: default to virtual threads for new services with I/O-bound concurrency. The thread-per-request model with Executors.newVirtualThreadPerTaskExecutor() is idiomatic, simple, and scalable.
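That idiom can be sketched as follows; the per-request work is simulated with Thread.sleep() standing in for a database or HTTP call:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualPerRequest {
    static int handleAll(int requests) {
        AtomicInteger handled = new AtomicInteger();
        // try-with-resources: close() waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < requests; i++) {
                executor.submit(() -> {
                    try { Thread.sleep(Duration.ofMillis(20)); } // simulated DB/HTTP call
                    catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    handled.incrementAndGet();
                });
            }
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(handleAll(1_000)); // 1000
    }
}
```

Each submitted task runs on a fresh virtual thread, so the pool never needs sizing — the executor creates one thread per task and the JVM multiplexes them onto carriers.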