By early 2015 we had both Chronicle Queue and Kafka in production — Chronicle for intra-day trade journaling, Kafka for end-of-day data pipelines. The question came up repeatedly: why not use one for both? The answer is that they solve different problems with incompatible design priorities.

Design Philosophy

Dimension         Chronicle Queue                       Apache Kafka
Primary goal      Ultra-low latency, single machine     High-throughput, distributed, durable
Deployment        Library in your JVM process           Separate cluster (brokers + ZooKeeper)
Persistence       Memory-mapped files, local disk       Replicated log, multiple brokers
Consumers         Unlimited readers, zero coordination  Consumer groups, offset tracking
Replication       None (single machine)                 Configurable (typically 3 replicas)
Ordering          Strict global order                   Strict per-partition order
Latency           20–200ns write, <1µs read             1–10ms end-to-end (acks=all)
Throughput        ~100M msgs/s (local memory)           ~1M msgs/s (network-bound)
Operational cost  Zero (embedded)                       High (cluster management)

These are not competing products in the same category. They’re tools with overlapping capabilities and very different tradeoffs.

Chronicle Queue: The Local Fast Lane

Chronicle Queue is an embedded library — no network, no broker, no separate process. You write to it and read from it within the same machine (or across machines via shared NFS/NAS, though that’s uncommon).

The write path:

// appender is the queue's ExcerptAppender
appender.writeDocument(w ->
    w.write("price").float64(1.08451)
     .write("qty").int64(1_000_000));

Internally: acquire the next position in the memory-mapped file and write the binary-encoded bytes directly into mapped memory via Unsafe.putLong(). No syscall, no intermediate serialisation buffer. The OS page cache absorbs the write; the data is immediately readable by any tailer on the same machine.
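
Chronicle's actual write path goes through Unsafe; a simplified sketch of the same mechanism using only the standard MappedByteBuffer API (everything here — file name, field layout — is illustrative, not Chronicle internals):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWriteSketch {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("journal", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map a 4 KiB region; writes land in the OS page cache, no syscall per write.
            MappedByteBuffer writer = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

            // One "document": price and qty as raw binary, straight into mapped memory.
            writer.putDouble(1.08451);
            writer.putLong(1_000_000L);

            // A second mapping of the same file sees the data immediately —
            // this is how a tailer on the same machine reads with no coordination.
            MappedByteBuffer reader = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
            System.out.println(reader.getDouble() + " " + reader.getLong());
        }
        Files.deleteIfExists(file);
    }
}
```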

Write latency distribution (measured, Xeon E5-2687W, NVMe SSD):

p50:   45ns
p99:   180ns
p999:  2,100ns  ← occasional OS page fault or scheduler preemption
p9999: 18,000ns ← rare, usually GC or OS flush

The p999 and p9999 spikes come from the OS occasionally needing to fault in a new memory-mapped page or flush the page cache. They are manageable: pre-touching pages and tuning the vm.dirty_* kernel parameters reduces them significantly.
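
Pre-touching is simple to do by hand: write one byte per page at startup so the faults happen before the latency-critical path opens. A minimal sketch, assuming a 4 KiB page size (the typical x86 value — check your platform):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreTouch {
    static final int PAGE = 4096; // assumed page size; not queried from the OS here

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("journal", ".dat");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20);
            // Touch one byte per page: the page faults happen now, at startup,
            // instead of on the nanosecond-budget write path.
            for (int pos = 0; pos < buf.capacity(); pos += PAGE) {
                buf.put(pos, (byte) 0);
            }
            System.out.println("pre-touched " + (buf.capacity() / PAGE) + " pages");
        }
        Files.deleteIfExists(file);
    }
}
```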

Chronicle Queue is the right choice when:

  • Latency budget is sub-millisecond and the log is intra-process or intra-machine
  • You need a durable ordered log without a network hop
  • Single-machine reliability is acceptable (no broker replication)
  • The data volume fits on local disk

For us: trade event journaling, order state persistence, price history for the current trading day. If the machine died, we’d recover from the exchange’s records. The local journal was for operational use — replaying the day’s events to debug an issue, feeding the risk engine’s position reconstruction.

Kafka: The Distributed Pipeline

Kafka is a distributed log: write to a broker, the broker replicates to followers, consumers pull from any broker in the cluster. It’s designed for scenarios where the producer and consumer don’t share a machine, where you need multiple independent consumers, where data must survive broker failure, and where throughput is more important than nanosecond latency.

Write latency with acks=all (all replicas must acknowledge):

p50:   4ms
p99:   18ms
p999:  95ms

That’s 4ms at the median versus 45ns for Chronicle — roughly five orders of magnitude. For intra-day trade journaling with a sub-millisecond SLA, Kafka is not an option.
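
The durability-over-latency tradeoff is a deliberate configuration choice. A sketch of the relevant producer settings — these are the standard Kafka producer config keys, but the broker addresses are hypothetical and a real pipeline would pass the Properties to a KafkaProducer:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // hypothetical hosts
        props.put("acks", "all");    // wait for all in-sync replicas: durability over latency
        props.put("linger.ms", "5"); // small batching window: trades latency for throughput
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // In a real pipeline: new KafkaProducer<>(props)
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

Switching acks=all to acks=1 shaves a few milliseconds off the median but gives up the guarantee that an acknowledged write survives a broker failure — the wrong trade for a regulatory pipeline.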

Kafka is the right choice when:

  • Data must survive the producing machine dying (replication)
  • Multiple teams/systems consume the same data independently
  • Producer and consumer are on different machines (network is unavoidable)
  • You need > 1 physical machine worth of storage
  • Latency tolerance is milliseconds, not microseconds

For us: end-of-day risk reporting (trades from all systems → Kafka → risk analytics), market data archival (prices → Kafka → HDFS), regulatory reporting pipeline.

The Pattern We Used

Intra-day (latency-sensitive):           End-of-day (throughput, durability):

Trade events → Chronicle Queue           Chronicle Queue → Kafka Producer → Kafka
                     │                                        (async bridge)
              Risk engine (tailer)
              Debug replay (tailer)
              Order tracking (tailer)

At market close, a bridge process read the day’s Chronicle Queue and published everything to Kafka. The bridge ran asynchronously and didn’t affect the intra-day latency path. Kafka retained the data for 30 days for regulatory purposes and fed the downstream analytics systems.
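
The bridge's structure is straightforward; a sketch of the shape, with an in-memory list standing in for the day's Chronicle Queue and a queue standing in for the Kafka producer (both stand-ins are hypothetical, not the production code):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EndOfDayBridge {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the day's journal; in production this is a Chronicle tailer.
        List<String> journal = List.of("trade-1", "trade-2", "trade-3");
        // Stand-in for the Kafka producer's send path.
        BlockingQueue<String> kafkaSink = new LinkedBlockingQueue<>();

        // The bridge runs on its own thread after market close, so it never
        // competes with the intra-day latency path.
        ExecutorService bridge = Executors.newSingleThreadExecutor();
        bridge.submit(() -> journal.forEach(kafkaSink::add)); // replay → publish
        bridge.shutdown();
        bridge.awaitTermination(5, TimeUnit.SECONDS);

        System.out.println("published " + kafkaSink.size() + " records");
    }
}
```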

What Chronicle Queue Can’t Do

  • Multi-machine distribution: data lives on local disk. No broker, no replication. If the disk fails, data is gone.
  • Multiple independent consumer offsets: Chronicle tailers share the physical file, and each maintains its own read position. This works, but it’s not the managed consumer-group model Kafka provides.
  • Topic-based routing: one Chronicle Queue is one ordered log. Kafka’s topic/partition model provides more flexible routing.
  • Long-term retention at scale: Chronicle files grow until you delete them. At 100M messages/day, you need active management. Kafka’s retention policies handle this automatically with log compaction and segment deletion.
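
That last point means someone has to write the retention sweep themselves. A sketch of the idea — delete cycle files older than the retention window — using temp files as stand-ins (the .cq4 naming and 30-day window are illustrative, not a recommendation):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;

public class RetentionSweep {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cq");
        Path old = Files.createFile(dir.resolve("20150102.cq4"));    // stand-in cycle file
        Path recent = Files.createFile(dir.resolve("20150320.cq4")); // stand-in cycle file
        Files.setLastModifiedTime(old,
                FileTime.from(Instant.now().minus(Duration.ofDays(40))));

        // Keep 30 days — the sort of policy Kafka applies automatically.
        Instant cutoff = Instant.now().minus(Duration.ofDays(30));
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.cq4")) {
            for (Path p : files) {
                if (Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)) {
                    Files.delete(p); // reclaim disk: Chronicle won't do this for you
                }
            }
        }
        System.out.println(Files.exists(old) + " " + Files.exists(recent));
    }
}
```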

What Kafka Can’t Do

  • Sub-millisecond write latency: network round-trips and replication are irreducible.
  • Zero-copy local read: Chronicle’s memory-mapped access is faster than anything that crosses a network.
  • Embedded deployment: Kafka requires a cluster; Chronicle is a library dependency.
  • Deterministic tail latency: Kafka’s p999 is dominated by network and replication jitter.

The tools are complementary. Using Kafka for everything sacrifices the latency properties that make the intra-day path work; using Chronicle for everything sacrifices the durability and distribution properties that make the end-of-day pipeline reliable. Running both, with a bridge, gives you the right tool for each context.