Disk I/O and latency-sensitive systems don’t mix well. Or so the conventional wisdom goes. Memory-mapped files challenge that assumption: when the OS maps a file into virtual memory, writes and reads go through the OS page cache. If the working set fits in RAM (which it usually does for recent data), access times are in the hundreds of nanoseconds — comparable to normal memory access, not disk I/O.

Chronicle Queue is built entirely on this foundation. Understanding what’s happening underneath explains both why it’s fast and what its failure modes are.

How Memory Mapping Works

Normal file I/O:

Application → write(fd, buffer, n) → kernel copies to page cache → eventually flushes to disk
Application ← read(fd, buffer, n) ← kernel copies from page cache to user buffer

Two copies, two system calls per I/O operation. For high-frequency operations, the system call overhead alone is significant (300–1000ns per syscall on Linux).
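The two-syscall path is easy to see in plain JDK code. A minimal sketch (hypothetical class name, temp file for self-containment) — every write() and read() below crosses the user/kernel boundary and copies the buffer:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyscallPath {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("prices", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {

            // write(2): copies the buffer contents into the kernel page cache
            ByteBuffer out = ByteBuffer.allocate(16);
            out.putLong(42L).putDouble(1.2844).flip();
            ch.write(out, 0);

            // read(2): copies from the page cache back into our user buffer
            ByteBuffer in = ByteBuffer.allocate(16);
            ch.read(in, 0);
            in.flip();
            System.out.println(in.getLong() + " " + in.getDouble());
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Each call pays the syscall overhead plus one copy between user and kernel space; at millions of operations per second that cost dominates.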

Memory-mapped I/O:

Application maps file → OS creates virtual address range backed by file
Application writes to address → OS page cache is written directly
No system call in the write path
OS flushes modified pages to disk asynchronously

After the initial mmap() system call, reads and writes to the mapped region behave like regular memory accesses — they go through the CPU’s load/store instructions, not through read()/write() syscalls. The OS page cache acts as the backing store.

In Java:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

RandomAccessFile raf = new RandomAccessFile("prices.dat", "rw");
FileChannel channel = raf.getChannel();

// Map the first 1GB of the file into virtual memory:
MappedByteBuffer buffer = channel.map(
    FileChannel.MapMode.READ_WRITE,
    0,                    // file offset
    1024 * 1024 * 1024L   // length
);

int offset = 0;  // byte position of this record within the mapped region

// Write — no system call, no copy:
buffer.putLong(offset, sequence);
buffer.putDouble(offset + 8, bid);
buffer.putDouble(offset + 16, ask);

// Read — no system call, no copy:
long seq = buffer.getLong(offset);

Page Faults: The Hidden Cost

The first access to each 4KB page in the mapped region triggers a page fault: the OS must either load the page from disk (cold fault) or just map the virtual address to existing RAM (warm fault, much cheaper).

First write to a new page:
  Process → write to address → page fault → OS loads/creates page → write succeeds
  Cost: 1–10ms for a cold disk read, 1–10µs for a warm fault

Subsequent writes to same page:
  Process → write to address → cache line write
  Cost: ~5ns (L1/L2 cache hit)

Chronicle Queue pre-touches pages ahead of the write position so these faults never land on the hot path. The manual equivalent:

// Force all pages into the OS page cache (pre-fault):
MappedByteBuffer buf = channel.map(...);
int size = buf.capacity();
for (int offset = 0; offset < size; offset += 4096) {
    buf.get(offset);  // touches each 4KB page
}

After pre-touching, writes don’t trigger page faults. The latency is predictable.

Chronicle Queue: A Memory-Mapped Log

Chronicle Queue uses memory-mapped files to implement a persistent ordered log. The structure:

data-dir/
  yyyyMMdd.cq4          ← one file per day
  metadata.cq4t         ← index of entry positions

Each .cq4 file is a memory-mapped region containing:

[Length header (4 bytes)][Entry data (variable)][Length header]...
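The length header doubles as the commit flag: the writer fills in the entry body first, then publishes the length with a release store, so a reader that observes a non-zero length is guaranteed to see the payload bytes. A hedged sketch of that idea in plain JDK code — this is not Chronicle's actual wire format, just the publication pattern, using `byteBufferViewVarHandle` for the ordered accesses:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LengthPrefixedLog {
    // VarHandle viewing the buffer as ints; supports release/acquire access
    static final VarHandle INT = MethodHandles.byteBufferViewVarHandle(
            int[].class, ByteOrder.LITTLE_ENDIAN);

    static int append(MappedByteBuffer buf, int pos, byte[] payload) {
        for (int i = 0; i < payload.length; i++)    // 1. write the entry body
            buf.put(pos + 4 + i, payload[i]);
        INT.setRelease(buf, pos, payload.length);   // 2. publish the length header
        return (pos + 4 + payload.length + 3) & ~3; // next position, 4-byte aligned
    }

    static byte[] poll(MappedByteBuffer buf, int pos) {
        int len = (int) INT.getAcquire(buf, pos);   // 0 means "not committed yet"
        if (len == 0) return null;
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) out[i] = buf.get(pos + 4 + i);
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("log", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            int next = append(buf, 0, "EUR/USD 1.2844".getBytes());
            System.out.println(new String(poll(buf, 0)));  // the committed entry
            System.out.println(poll(buf, next) == null);   // nothing after it yet
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The 4-byte alignment on the next position keeps every length header aligned, which the atomic access modes require.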

Appending a message:

try (ChronicleQueue queue = SingleChronicleQueueBuilder
        .binary("/data/prices")
        .rollCycle(RollCycles.DAILY)
        .build();
     ExcerptAppender appender = queue.acquireAppender()) {

    try (DocumentContext dc = appender.writingDocument()) {
        Wire wire = dc.wire();
        wire.write("symbol").text("EUR/USD");
        wire.write("bid").float64(1.28443);
        wire.write("ask").float64(1.28453);
        wire.write("time").int64(System.nanoTime());
    }  // commit on close
}

The writingDocument() call returns a context with a pre-allocated write position in the mapped file. The wire.write() calls write directly into mapped memory, and closing the context commits the length header atomically, making the entry visible to readers.

Zero-copy, zero system call in the append path (after initial setup).

Reading:

try (ExcerptTailer tailer = queue.createTailer("consumer-1")) {
    while (true) {
        try (DocumentContext dc = tailer.readingDocument()) {
            if (!dc.isPresent()) {
                // Nothing to read — spin or yield
                continue;
            }
            Wire wire = dc.wire();
            String symbol = wire.read("symbol").text();
            double bid = wire.read("bid").float64();
            double ask = wire.read("ask").float64();
            processPrice(symbol, bid, ask);
        }
    }
}

The reader maintains its own position (saved to disk in metadata.cq4t). Multiple readers can independently tail the same queue at different positions — like Kafka consumers, but with sub-microsecond latency instead of milliseconds.
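The independence of readers falls straight out of the design: a reader's cursor is just an offset into the mapped file, so each consumer advances at its own pace with no coordination. A toy illustration using fixed-size long entries (hypothetical names, plain JDK):

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TwoTailers {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("log", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer log = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            for (int i = 0; i < 5; i++)
                log.putLong(i * 8, 100 + i);        // append five entries

            int fast = 3 * 8;                       // cursor: has consumed 3 entries
            int slow = 1 * 8;                       // cursor: has consumed 1 entry

            System.out.println(log.getLong(fast));  // next entry for the fast reader
            System.out.println(log.getLong(slow));  // next entry for the slow reader
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Both cursors read the same mapped pages; persisting a cursor (as Chronicle Queue does for named tailers) is just writing one long to disk.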

Throughput Numbers

On commodity hardware, a Chronicle Queue append benchmark:

Single-threaded append throughput:   8–12M messages/second
Append latency p50:                  ~50ns
Append latency p99:                  ~200ns
Append latency p99.9:                ~600ns
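Numbers in this range are easy to sanity-check. A rough micro-benchmark sketch (hypothetical names; no JMH, no warm-up discipline, and each sample includes System.nanoTime() overhead of roughly 20ns, so treat the results as indicative only):

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class AppendBench {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("bench", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            int n = 1_000_000;
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, n * 8L);
            for (int i = 0; i < buf.capacity(); i += 4096)
                buf.get(i);                          // pre-touch every page first

            long[] lat = new long[n];
            for (int i = 0; i < n; i++) {
                long t0 = System.nanoTime();
                buf.putLong(i * 8, t0);              // the "append": one mapped write
                lat[i] = System.nanoTime() - t0;
            }
            Arrays.sort(lat);
            System.out.printf("p50=%dns p99=%dns p99.9=%dns%n",
                    lat[n / 2], lat[(int) (n * 0.99)], lat[(int) (n * 0.999)]);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Without the pre-touch loop, the p99.9 column picks up page-fault spikes — a direct demonstration of the earlier point.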

Compare to Kafka:

Kafka producer throughput:            500K–1M messages/second
Produce latency p50:                  1–5ms (acks=1)
Produce latency p99:                  20–50ms

Chronicle Queue delivers 10–20× the throughput and roughly four orders of magnitude lower latency than Kafka for local-process appends (~50ns vs 1–5ms at p50). The comparison isn’t entirely fair: Kafka is distributed, replicated, and available across the network, while Chronicle Queue is single-host — though multiple processes on the same machine can share a queue through the mapped files, which is effectively shared-memory IPC.

When Files Don’t Fit in RAM

The OS page cache is limited to available RAM. When the mapped file is larger than RAM (for example, a multi-day Chronicle Queue scanned during historical analysis), the OS evicts old pages to make room for new ones. Reads to evicted pages trigger disk I/O.

OS Page Cache behaviour:
  Recent pages:   in RAM (fast, <1µs)
  Older pages:    evicted to disk (slow, 1–10ms on first access)

For the trading use case — reading recent price history — this is fine. Today’s data is in the page cache; last week’s data might require a disk read. For analysis that reads arbitrary historical pages, this can cause latency spikes.

The mitigation: schedule historical analysis (backtesting, reporting) outside trading hours, or run it on separate hardware, so that page cache pressure doesn’t affect the live trading path.


Memory-mapped files are one of the most underappreciated performance tools in the Linux/JVM ecosystem. When the data fits in RAM and you need sequential I/O with predictable latency, they’re consistently faster than any explicit buffering approach and simpler to reason about than complex async I/O chains.