By early 2015 we had both Chronicle Queue and Kafka in production — Chronicle for intra-day trade journaling, Kafka for end-of-day data pipelines. The question came up repeatedly: why not use one for both? The answer is that they solve different problems with incompatible design priorities.
Design Philosophy
| Dimension | Chronicle Queue | Apache Kafka |
|---|---|---|
| Primary goal | Ultra-low latency, single machine | High-throughput, distributed, durable |
| Deployment | Library in your JVM process | Separate cluster (broker + ZooKeeper) |
| Persistence | Memory-mapped files, local disk | Replicated log, multiple brokers |
| Consumers | Unlimited readers, zero coordination | Consumer groups, offset tracking |
| Replication | None (single machine) | Configurable (default 1, typically 3) |
| Ordering | Strict global order | Strict per-partition order |
| Latency | 20–200ns write, <1µs read | 1–10ms end-to-end (acks=all) |
| Throughput | ~100M msgs/s (local memory) | ~1M msgs/s (network-bound) |
| Operational cost | Zero (embedded) | High (cluster management) |
These are not competing products in the same category. They’re tools with overlapping capabilities and very different tradeoffs.
Chronicle Queue: The Local Fast Lane
Chronicle Queue is an embedded library — no network, no broker, no separate process. You write to it and read from it within the same machine (or across machines via shared NFS/NAS, though that’s uncommon).
The write path:
```java
// appender obtained via queue.acquireAppender() on an open ChronicleQueue
appender.writeDocument(w ->
    w.write("price").float64(1.08451)
     .write("qty").int64(1_000_000));
```
Internally: acquire the next position in the memory-mapped file, write the binary-encoded bytes directly into mapped memory via Unsafe.putLong(). No syscall. No serialisation intermediate buffer. The OS page cache absorbs the write; the data is immediately readable by any tailer on the same machine.
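Chronicle's internals aside, the core mechanism can be sketched with plain JDK memory-mapped buffers. This is a simplified illustration of the technique, not Chronicle's actual code: one `mmap` syscall up front, then every write is a plain memory store.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of the memory-mapped write path, reduced to plain JDK APIs:
 *  map a file region once, then write and read with no per-write syscall. */
public class MmapWriteSketch {
    // Writes a price/qty pair into the mapping and reads the qty back.
    static long writeAndReadQty() throws IOException {
        Path file = Files.createTempFile("journal", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // One mmap syscall here; every put/get below is a plain memory
            // store or load, absorbed by the OS page cache.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putDouble(0, 1.08451);   // price
            buf.putLong(8, 1_000_000L);  // qty
            // Any reader mapping the same file sees this data immediately.
            return buf.getLong(8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndReadQty());
    }
}
```

Chronicle adds binary encoding, append-position coordination, and lock-free multi-writer support on top of this primitive, but the zero-syscall write is the same idea.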
Write latency distribution (measured, Xeon E5-2687W, NVMe SSD):
```
p50:   45ns
p99:   180ns
p999:  2,100ns   ← occasional OS page fault or scheduler preemption
p9999: 18,000ns  ← rare, usually GC or OS flush
```
The p999 and p9999 spike from the OS occasionally needing to fault in a new memory-mapped page or flush the page cache. These are manageable — pre-touching pages and configuring vm.dirty_* kernel parameters reduces them significantly.
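Those `vm.dirty_*` knobs are standard Linux sysctls controlling page-cache writeback. The values below are only illustrative; tuning is workload- and hardware-dependent:

```
# /etc/sysctl.d/99-journal.conf — illustrative values, not a recommendation
vm.dirty_background_ratio = 5     # start background writeback earlier
vm.dirty_ratio = 10               # cap dirty pages before writers block
vm.dirty_expire_centisecs = 3000  # flush dirty pages after ~30s
```

Lower thresholds trade steadier write latency for more frequent (smaller) flushes, which is usually the right trade for a latency-sensitive journal.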
Chronicle Queue is the right choice when:
- Latency budget is sub-millisecond and the log is intra-process or intra-machine
- You need a durable ordered log without a network hop
- Single-machine reliability is acceptable (no broker replication)
- The data volume fits on local disk
For us: trade event journaling, order state persistence, price history for the current trading day. If the machine died, we’d recover from the exchange’s records. The local journal was for operational use — replaying the day’s events to debug an issue, feeding the risk engine’s position reconstruction.
Kafka: The Distributed Pipeline
Kafka is a distributed log: write to a broker, the broker replicates to followers, consumers pull from any broker in the cluster. It’s designed for scenarios where the producer and consumer don’t share a machine, where you need multiple independent consumers, where data must survive broker failure, and where throughput is more important than nanosecond latency.
Write latency with acks=all (all replicas must acknowledge):
```
p50:  4ms
p99:  18ms
p999: 95ms
```
That’s 4ms at median versus 45ns for Chronicle, nearly five orders of magnitude. For intra-day trade journaling with a sub-millisecond SLA, Kafka is not an option.
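For reference, the producer settings behind durability-first numbers like these look roughly as follows (illustrative values, not our production configuration):

```properties
# producer.properties — durability over latency (illustrative)
acks=all                  # wait for all in-sync replicas to acknowledge
enable.idempotence=true   # avoid duplicates on retry
linger.ms=5               # small batching window; raise for more throughput
# topic/broker side, assumed: replication.factor=3, min.insync.replicas=2
```

Relaxing `acks` to `1` cuts median latency but means an acknowledged write can be lost if the leader dies before replication; that trade is rarely acceptable for trade data.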
Kafka is the right choice when:
- Data must survive the producing machine dying (replication)
- Multiple teams/systems consume the same data independently
- Producer and consumer are on different machines (network is unavoidable)
- You need > 1 physical machine worth of storage
- Latency tolerance is milliseconds, not microseconds
For us: end-of-day risk reporting (trades from all systems → Kafka → risk analytics), market data archival (prices → Kafka → HDFS), regulatory reporting pipeline.
The Pattern We Used
```
Intra-day (latency-sensitive):

  Trade events → Chronicle Queue ──┬── Risk engine (tailer)
                                   ├── Debug replay (tailer)
                                   └── Order tracking (tailer)

End-of-day (throughput, durability):

  Chronicle Queue → bridge (async) → Kafka Producer → Kafka
```
At market close, a bridge process read the day’s Chronicle Queue and published everything to Kafka. The bridge ran asynchronously and didn’t affect the intra-day latency path. Kafka retained the data for 30 days for regulatory purposes and fed the downstream analytics systems.
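The bridge is essentially a resumable replay loop. Here is a stand-in sketch using a line-per-event file in place of the Chronicle Queue and a callback in place of the Kafka producer; the class and method names are illustrative, not from our codebase:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;

/** Simplified end-of-day bridge: replay a local journal into a publisher.
 *  A plain line-per-event file stands in for the Chronicle Queue, and
 *  `publish` stands in for a Kafka producer send. */
public class EodBridge {
    // Replays every event at index >= fromOffset, then returns the new
    // offset so a restarted bridge resumes where it left off instead of
    // re-publishing the whole day.
    static long replay(Path journal, long fromOffset, Consumer<String> publish)
            throws IOException {
        List<String> events = Files.readAllLines(journal);
        for (long i = fromOffset; i < events.size(); i++) {
            publish.accept(events.get((int) i));
        }
        return events.size();
    }
}
```

The real bridge streamed via a Chronicle tailer rather than reading the whole file, but the shape is the same: an offset-tracked loop that can crash and resume without duplicating or dropping events (given idempotent publishing downstream).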
What Chronicle Queue Can’t Do
- Multi-machine distribution: data lives on local disk. No broker, no replication. If the disk fails, data is gone.
- Multiple independent consumer offsets: Chronicle tailers share the physical file and each maintain their own read position. This works, but it’s not the managed consumer-group model Kafka provides.
- Topic-based routing: one Chronicle Queue is one ordered log. Kafka’s topic/partition model provides more flexible routing.
- Long-term retention at scale: Chronicle files grow until you delete them. At 100M messages/day, you need active management. Kafka’s retention policies handle this automatically with log compaction and segment deletion.
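The Kafka retention policies referred to above are per-topic settings. An illustrative 30-day configuration (matching the retention window described earlier; values are examples, not our exact config):

```properties
# topic-level retention (illustrative)
retention.ms=2592000000   # keep data for 30 days
cleanup.policy=delete     # drop expired segments (use 'compact' to keep latest per key)
segment.ms=86400000       # roll a new segment daily so expiry is granular
```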
What Kafka Can’t Do
- Sub-millisecond write latency: network round-trips and replication are irreducible.
- Zero-copy local read: Chronicle’s memory-mapped access is faster than anything that crosses a network.
- Embedded deployment: Kafka requires a cluster; Chronicle is a library dependency.
- Deterministic tail latency: Kafka’s p999 is dominated by network and replication jitter.
The tools are complementary. Using Kafka for everything sacrifices the latency properties that make the intra-day path work; using Chronicle for everything sacrifices the durability and distribution properties that make the end-of-day pipeline reliable. Running both, with a bridge, gives you the right tool for each context.