By mid-2017, the institution had two competing proposals on the table for the next generation of real-time analytics infrastructure: one team advocating Kafka Streams, another advocating Apache Flink. Both solve the same problem. Both use Kafka as input and output. Both provide stateful stream processing with windowing and exactly-once semantics.
The evaluation took eight weeks. Here’s what we found.
The Use Case
Real-time trade enrichment and aggregation:
- Consume raw trade events from Kafka (~15,000 trades/minute during market hours)
- Enrich each trade with counterparty and instrument reference data (lookup)
- Compute per-counterparty and per-instrument aggregates (rolling 1h window)
- Publish enriched trades and aggregates back to Kafka for downstream consumers
Latency requirement: end-to-end (trade event → enriched output) under 500ms at p99. State size: ~2GB of rolling window state.
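To make the pipeline concrete, the event shapes involved can be sketched as plain Java records. This is a minimal sketch — the field names and types here are our illustrative assumptions, not the production schema:

```java
// Hypothetical event shapes for the enrichment pipeline.
// Field names are illustrative assumptions, not the production schema.
public class TradeShapes {
    // Raw trade as consumed from Kafka.
    record Trade(String tradeId, String counterpartyId, String instrumentId,
                 double quantity, double price, long eventTimeMillis) {}

    // Reference data looked up per trade.
    record Counterparty(String id, String legalName) {}
    record Instrument(String id, String isin) {}

    // Enriched trade published back to Kafka.
    record EnrichedTrade(Trade trade, Counterparty counterparty, Instrument instrument) {
        double notional() { return trade.quantity() * trade.price(); }
    }

    public static void main(String[] args) {
        Trade t = new Trade("T1", "CP1", "I1", 100, 12.5, 0L);
        EnrichedTrade e = new EnrichedTrade(t,
                new Counterparty("CP1", "Acme Bank"),
                new Instrument("I1", "US0000000000"));
        System.out.println(e.notional()); // 1250.0
    }
}
```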
Kafka Streams
Kafka Streams is a Java library — no cluster required. It runs inside your application JVM.
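A condensed sketch of what the topology looked like. This assumes kafka-streams on the classpath and reference data already materialised into a compacted topic; the topic names, serdes configuration (`props`), and the `Trade`/`EnrichedTrade`/`TradeAggregate` types are placeholders, not our production values:

```java
// Sketch only: topic names, types and serde configuration are placeholders.
StreamsBuilder builder = new StreamsBuilder();

// Reference data as a KTable, materialised into a local RocksDB store.
KTable<String, Counterparty> counterparties = builder.table("counterparty-refdata");

KStream<String, Trade> trades = builder.stream("raw-trades");

// Enrich: stream-table join (requires the two to be co-partitioned by key).
KStream<String, EnrichedTrade> enriched = trades.join(
        counterparties,
        (trade, cp) -> new EnrichedTrade(trade, cp));
enriched.to("enriched-trades");

// Rolling 1h aggregate per key, with a grace period for late events.
enriched
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofHours(1), Duration.ofMinutes(5)))
    .aggregate(TradeAggregate::new, (key, trade, agg) -> agg.add(trade))
    .toStream()
    .to("trade-aggregates");

new KafkaStreams(builder.build(), props).start();
```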
The topology is declarative. The library handles partitioning, state stores (backed by RocksDB by default), fault tolerance via Kafka changelog topics, and scaling via consumer group rebalancing.
What we liked:
- No external cluster — just a library, runs in our existing service
- Exactly-once semantics available (transactions across input + output + state)
- Kafka-native — state changelogs in Kafka, replay via offset reset
- Joins with KTables are straightforward (join stream with reference data table)
What was painful:
- Windowing semantics are non-trivial to reason about (late data handling, grace periods)
- Interactive queries (querying the state store directly from an HTTP endpoint) required custom implementation
- The DSL has gaps — some operations that look natural don’t compose cleanly
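The late-data behaviour that made windowing hard to reason about can be illustrated without Kafka at all. A minimal, heavily simplified model of the Kafka Streams semantics: a window keeps accepting events until observed stream time passes the window end plus the grace period, after which late events are silently dropped:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of tumbling windows with a grace period (Kafka Streams
// semantics, heavily simplified): an event is accepted as long as observed
// stream time has not yet passed its window's end plus the grace period.
public class GracePeriodDemo {
    static final long WINDOW_MS = 60_000;   // 1-minute windows for the demo
    static final long GRACE_MS = 10_000;    // 10s grace for late events

    final Map<Long, Long> counts = new HashMap<>(); // window start -> event count
    long streamTime = Long.MIN_VALUE;               // max event time seen so far

    /** Returns true if the event was accepted, false if dropped as too late. */
    boolean accept(long eventTime) {
        streamTime = Math.max(streamTime, eventTime);
        long windowStart = (eventTime / WINDOW_MS) * WINDOW_MS;
        long windowEnd = windowStart + WINDOW_MS;
        if (streamTime > windowEnd + GRACE_MS) {
            return false; // window already closed: late event is dropped
        }
        counts.merge(windowStart, 1L, Long::sum);
        return true;
    }

    public static void main(String[] args) {
        GracePeriodDemo w = new GracePeriodDemo();
        System.out.println(w.accept(5_000));   // in window [0, 60s): accepted -> true
        System.out.println(w.accept(65_000));  // advances stream time to 65s -> true
        System.out.println(w.accept(59_000));  // late, but 65s <= 60s + 10s grace -> true
        System.out.println(w.accept(130_000)); // stream time jumps to 130s -> true
        System.out.println(w.accept(59_000));  // 130s > 70s: window closed -> false
        System.out.println(w.counts.get(0L));  // two events landed in the first window
    }
}
```

The non-obvious part is that whether an event is "too late" depends on stream time (the maximum event time seen anywhere on the partition), not wall-clock time — one early-arriving event can close a window for everyone else.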
Apache Flink
Flink is a distributed processing engine. It runs as a separate cluster (JobManager + TaskManagers), and you submit jobs to it.
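A condensed sketch of the equivalent Flink job. This assumes flink-streaming-java and the Kafka connector on the classpath; `kafkaSource`/`kafkaSink`, `RefDataLookup` (an AsyncFunction), and `TradeAggregator` are hypothetical helpers standing in for our real ones, and the topic names and capacities are placeholders:

```java
// Sketch only: helper factories, types and topic names are placeholders.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10_000); // checkpoint every 10s for exactly-once

DataStream<Trade> trades = env.fromSource(
        kafkaSource("raw-trades"),
        WatermarkStrategy
            .<Trade>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            .withTimestampAssigner((t, ts) -> t.eventTimeMillis()),
        "raw-trades");

// Async enrichment against the reference data service:
// up to 100 in-flight lookups, 1s timeout each.
DataStream<EnrichedTrade> enriched = AsyncDataStream.unorderedWait(
        trades, new RefDataLookup(), 1, TimeUnit.SECONDS, 100);

enriched.sinkTo(kafkaSink("enriched-trades"));

// Rolling 1h event-time windows per counterparty.
enriched
    .keyBy(e -> e.counterpartyId())
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .aggregate(new TradeAggregator())
    .sinkTo(kafkaSink("trade-aggregates"));

env.execute("trade-enrichment");
```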
What we liked:
- AsyncDataStream for external lookups — much cleaner than Kafka Streams’ table-join approach for async enrichment
- Savepoints: you can snapshot the entire pipeline state, deploy a new version, and resume from the savepoint — zero-downtime upgrades
- Event time processing with watermarks is well-designed and predictable
- Richer windowing options (session windows, global windows)
- Better resource isolation — jobs share the cluster, but each runs in its own containers
What was painful:
- Separate cluster to manage (operations burden)
- Flink’s checkpoint storage must be configured separately (S3, HDFS) for production
- Local development setup is more complex
- Serialisation story is more complex (TypeInformation, Kryo fallback)
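For the checkpoint-storage point, the production setup ends up looking something like this in flink-conf.yaml (the keys are standard Flink configuration options; the bucket paths are placeholders):

```yaml
state.backend: rocksdb
state.checkpoints.dir: s3://some-bucket/flink/checkpoints
state.savepoints.dir: s3://some-bucket/flink/savepoints
execution.checkpointing.interval: 10s
```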
The Performance Comparison
Both systems, same workload, same hardware (4-node cluster for Flink, 4-node application for Streams):
Metric                      Kafka Streams           Apache Flink
────────────────────────────────────────────────────────────────
Throughput (max)            45K trades/min          82K trades/min
End-to-end p50              120ms                   95ms
End-to-end p99              380ms                   210ms
End-to-end p99.9            1,200ms                 440ms
State recovery time (1GB)   95s                     28s
Rolling upgrade support     No (requires restart)   Yes (savepoints)
CPU utilisation (peak)      68%                     71%
Memory (state)              ~2.1GB/node             ~1.8GB/node
Flink had higher throughput and better tail latency, largely because of better pipelining between stages and the async I/O API for enrichment lookups. Kafka Streams’ enrichment via KTable join requires the reference data to be co-partitioned and replicated into the local state store — a larger operational overhead for this use case.
The Decision
We chose Flink. The deciding factors:
Async enrichment: our reference data service was a separate system. Kafka Streams required materialising a full copy of reference data into a KTable (Kafka topic + local state store). Flink’s async I/O queried the service directly with up to 100 concurrent requests.
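The bounded-concurrency part of that pattern is easy to model with plain CompletableFutures. A minimal sketch, with a stand-in lookup function of our own invention — Flink's AsyncDataStream adds ordering, per-request timeouts, and checkpointing of in-flight requests on top of this idea:

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Function;
import java.util.stream.Collectors;

// Minimal model of bounded-concurrency async enrichment: at most `capacity`
// lookups run at once (the rest queue), echoing AsyncDataStream's capacity
// parameter. The lookup function itself is a stand-in for the real service call.
public class BoundedAsyncEnrich {
    static <T, R> List<R> enrichAll(List<T> events, Function<T, R> lookup,
                                    int capacity) {
        ExecutorService pool = Executors.newFixedThreadPool(capacity);
        try {
            List<CompletableFuture<R>> futures = events.stream()
                    .map(e -> CompletableFuture.supplyAsync(() -> lookup.apply(e), pool))
                    .collect(Collectors.toList());
            // Join in submission order, so results line up with inputs.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in "reference data service": uppercases the counterparty id.
        List<String> out = enrichAll(List.of("cp1", "cp2", "cp3"),
                String::toUpperCase, 100);
        System.out.println(out); // [CP1, CP2, CP3]
    }
}
```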
Savepoints: the regulatory reporting environment required zero-downtime deployments during market hours. Kafka Streams required consumer group rebalancing on each deployment; Flink’s savepoints allowed exact-state resumption.
Tail latency: our p99 requirement was 500ms. Kafka Streams’ 380ms p99 met it, but its 1,200ms p99.9 did not; Flink’s 210ms p99 gave comfortable headroom.
Where we’d choose Kafka Streams instead:
- Simpler topology with no external lookup dependencies
- Team unfamiliar with managing distributed systems
- All reference data already in Kafka topics (table join works cleanly)
- Prefer library over infrastructure (no cluster to manage)
Neither system is universally better. The choice should be driven by the topology, the operational model, and the team’s experience — not by headline throughput numbers.