By mid-2017, the institution had two competing proposals on the table for the next generation of real-time analytics infrastructure: one team advocating Kafka Streams, another advocating Apache Flink. Both solve the same problem. Both use Kafka as input and output. Both provide stateful stream processing with windowing and exactly-once semantics.

The evaluation took eight weeks. Here’s what we found.

The Use Case

Real-time trade enrichment and aggregation:

  1. Consume raw trade events from Kafka (~15,000 trades/minute during market hours)
  2. Enrich each trade with counterparty and instrument reference data (lookup)
  3. Compute per-counterparty and per-instrument aggregates (rolling 1h window)
  4. Publish enriched trades and aggregates back to Kafka for downstream consumers

Latency requirement: end-to-end (trade event → enriched output) under 500ms at p99. State size: ~2GB of rolling window state.
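For concreteness, the snippets that follow assume event and aggregate types along these lines. The field names are illustrative, not the production schema:

```java
import java.math.BigDecimal;

// Hypothetical shape of a raw trade event (illustrative fields only).
final class TradeEvent {
    private final String tradeId;
    private final String counterpartyId;
    private final String instrumentId;
    private final BigDecimal notional;

    TradeEvent(String tradeId, String counterpartyId,
               String instrumentId, BigDecimal notional) {
        this.tradeId = tradeId;
        this.counterpartyId = counterpartyId;
        this.instrumentId = instrumentId;
        this.notional = notional;
    }

    String counterpartyId() { return counterpartyId; }
    String instrumentId()   { return instrumentId; }
    BigDecimal notional()   { return notional; }
}

// Rolling-window aggregate: trade count plus summed notional.
// Immutable: update() returns a new instance, which suits both engines'
// aggregate APIs.
final class AggregateMetrics {
    final long tradeCount;
    final BigDecimal totalNotional;

    private AggregateMetrics(long tradeCount, BigDecimal totalNotional) {
        this.tradeCount = tradeCount;
        this.totalNotional = totalNotional;
    }

    static AggregateMetrics empty() {
        return new AggregateMetrics(0, BigDecimal.ZERO);
    }

    AggregateMetrics update(TradeEvent trade) {
        return new AggregateMetrics(tradeCount + 1,
                                    totalNotional.add(trade.notional()));
    }
}
```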

Kafka Streams

Kafka Streams is a Java library — no cluster required. It runs inside your application JVM:

StreamsBuilder builder = new StreamsBuilder();

KStream<String, TradeEvent> trades = builder
    .stream("raw-trades", Consumed.with(Serdes.String(), tradeSerde));

// Enrich with reference data (lookup join to a KTable):
KTable<String, InstrumentData> instruments = builder
    .table("instrument-reference-data", Consumed.with(Serdes.String(), instrumentSerde));

KStream<String, EnrichedTrade> enriched = trades
    .join(instruments,
          (trade, instrument) -> enrich(trade, instrument),
          Joined.with(Serdes.String(), tradeSerde, instrumentSerde));

// Windowed aggregation per counterparty:
enriched
    .groupBy((key, trade) -> trade.counterpartyId())
    .windowedBy(TimeWindows.of(Duration.ofHours(1)))
    .aggregate(
        AggregateMetrics::empty,
        (key, trade, agg) -> agg.update(trade),
        Materialized.as("counterparty-aggregates")
    )
    // Unwrap the windowed key so the output topic carries plain String keys
    // rather than Windowed<String>:
    .toStream((windowedKey, agg) -> windowedKey.key())
    .to("counterparty-metrics");

KafkaStreams app = new KafkaStreams(builder.build(), config);
app.start();

The topology is declarative. The library handles partitioning, state stores (backed by RocksDB by default), fault tolerance via Kafka changelog topics, and scaling via consumer group rebalancing.
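The `config` passed to the `KafkaStreams` constructor is where the guarantees get switched on. A minimal sketch, using the standard Streams property names (broker address and state directory are placeholders):

```java
import java.util.Properties;

class StreamsConfigSketch {
    static Properties streamsConfig() {
        Properties config = new Properties();
        // Consumer group ID and prefix for changelog/repartition topics.
        config.put("application.id", "trade-enrichment");
        config.put("bootstrap.servers", "kafka:9092");       // placeholder address
        // Enables transactions across input, output, and state changelogs.
        config.put("processing.guarantee", "exactly_once");
        // Local directory for the RocksDB-backed state stores.
        config.put("state.dir", "/var/lib/kafka-streams");   // placeholder path
        // One thread per partition is a common starting point.
        config.put("num.stream.threads", "4");
        return config;
    }
}
```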

What we liked:

  • No external cluster — just a library, runs in our existing service
  • Exactly-once semantics available (transactions across input + output + state)
  • Kafka-native — state changelogs in Kafka, replay via offset reset
  • Joins with KTables are straightforward (join stream with reference data table)

What was painful:

  • Windowing semantics are non-trivial to reason about (late data handling, grace periods)
  • Interactive queries (querying the state store directly from an HTTP endpoint) required a custom implementation
  • The DSL has gaps — some operations that look natural don’t compose cleanly
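To illustrate the interactive-query point: reading the window store is only a few lines, but everything around it is custom. A sketch of the store lookup behind an HTTP endpoint (`app` is the running KafkaStreams instance from above; the key is illustrative):

```java
// Query the local window store backing "counterparty-aggregates".
// In a multi-instance deployment you must first discover which instance
// hosts the key (app.metadataForKey) and proxy the HTTP request there —
// that routing layer is the custom part.
ReadOnlyWindowStore<String, AggregateMetrics> store =
    app.store("counterparty-aggregates", QueryableStoreTypes.windowStore());

long now = System.currentTimeMillis();
String counterpartyId = "CP-1234";  // illustrative key
try (WindowStoreIterator<AggregateMetrics> iter =
         store.fetch(counterpartyId, now - 3_600_000L, now)) {
    while (iter.hasNext()) {
        KeyValue<Long, AggregateMetrics> window = iter.next();
        // window.key is the window start (epoch millis);
        // serialise window.value into the HTTP response.
    }
}
```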

Apache Flink

Flink is a distributed processing engine. It runs as a separate cluster (a JobManager coordinating TaskManagers), and you submit jobs to it:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream<TradeEvent> trades = env.addSource(
    new FlinkKafkaConsumer<>("raw-trades", new TradeDeserializer(), kafkaProps));

// Enrichment via async I/O (lookup service):
DataStream<EnrichedTrade> enriched = AsyncDataStream.unorderedWait(
    trades,
    new ReferenceDataAsyncFunction(referenceDataClient),
    500, TimeUnit.MILLISECONDS,  // timeout
    100                          // max concurrent requests
);

// Windowed keyed aggregation:
enriched
    .keyBy(trade -> trade.counterpartyId())
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .aggregate(new TradeAggregator())
    .addSink(new FlinkKafkaProducer<>("counterparty-metrics", ...));

env.execute("Trade Enrichment Pipeline");
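`ReferenceDataAsyncFunction` above is our own class, not a Flink built-in. Its skeleton, assuming a hypothetical `ReferenceDataClient` whose `lookup()` returns a `CompletableFuture`, looks roughly like this:

```java
import java.util.Collections;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// Skeleton of the async enrichment function. ReferenceDataClient is our
// internal service client (hypothetical interface here).
class ReferenceDataAsyncFunction extends RichAsyncFunction<TradeEvent, EnrichedTrade> {
    private final ReferenceDataClient client;

    ReferenceDataAsyncFunction(ReferenceDataClient client) {
        this.client = client;
    }

    @Override
    public void asyncInvoke(TradeEvent trade, ResultFuture<EnrichedTrade> resultFuture) {
        // Non-blocking: complete the result when the future resolves.
        client.lookup(trade.instrumentId())
              .thenAccept(instrument -> resultFuture.complete(
                  Collections.singleton(enrich(trade, instrument))));
    }

    @Override
    public void timeout(TradeEvent trade, ResultFuture<EnrichedTrade> resultFuture) {
        // On the 500ms timeout configured above, emit the trade unenriched
        // rather than fail the job (our policy choice, not a Flink default).
        resultFuture.complete(Collections.singleton(EnrichedTrade.unenriched(trade)));
    }
}
```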

What we liked:

  • AsyncDataStream for external lookups — much cleaner than Kafka Streams’ table join approach for async enrichment
  • Savepoints: you can snapshot the entire pipeline state, deploy a new version, and resume from the savepoint — zero-downtime upgrades
  • Event time processing with watermarks is well-designed and predictable
  • Richer windowing options (session windows, global windows)
  • Better resource isolation — the cluster is shared, each job has its own container
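On the watermark point: for the event-time window above to fire, the source stream needs timestamps and watermarks assigned. With the extractor API of that era, a bounded-out-of-orderness assignment on the trade's exchange timestamp (field name assumed) looks like:

```java
DataStream<TradeEvent> withTimestamps = trades.assignTimestampsAndWatermarks(
    // Allow events up to 5 seconds out of order (a tuning choice based on
    // observed feed skew) before the watermark advances past them.
    new BoundedOutOfOrdernessTimestampExtractor<TradeEvent>(Time.seconds(5)) {
        @Override
        public long extractTimestamp(TradeEvent trade) {
            return trade.exchangeTimestampMillis();  // assumed field
        }
    });
```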

What was painful:

  • Separate cluster to manage (operations burden)
  • Flink’s checkpoint storage must be configured separately (S3, HDFS) for production
  • Local development setup is more complex
  • Serialisation story is more complex (TypeInformation, Kryo fallback)
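Concretely, the separate checkpoint storage amounts to a few lines of job setup. Intervals and the S3 path here are illustrative, and the RocksDB backend ships as a separate dependency:

```java
// Checkpoint every 30 seconds in exactly-once mode; keep at least 10s
// between checkpoints so they don't back up under load.
env.enableCheckpointing(30_000);
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(10_000);
// Durable state snapshots in S3 (bucket name illustrative).
env.setStateBackend(new RocksDBStateBackend("s3://our-flink-checkpoints/trade-enrichment"));
```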

The Performance Comparison

Both systems ran the same workload on equivalent hardware: a 4-node Flink cluster versus four instances of the Kafka Streams application:

Metric                        Kafka Streams           Apache Flink
──────────────────────────────────────────────────────────────────
Throughput (max)              45K trades/min          82K trades/min
End-to-end p50                120ms                   95ms
End-to-end p99                380ms                   210ms
End-to-end p99.9              1,200ms                 440ms
State recovery time (1GB)     95s                     28s
Rolling upgrade support       No (restart required)   Yes (savepoints)
CPU utilisation (peak)        68%                     71%
Memory (state)                ~2.1GB/node             ~1.8GB/node

Flink had higher throughput and better tail latency, largely because of better pipelining between stages and the async I/O API for enrichment lookups. Kafka Streams’ enrichment via KTable join requires the reference data to be co-partitioned and replicated into the local state store — a larger operational overhead for this use case.

The Decision

We chose Flink. The deciding factors:

  1. Async enrichment: our reference data service was a separate system. Kafka Streams required materialising a full copy of reference data into a KTable (Kafka topic + local state store). Flink’s async I/O queried the service directly with up to 100 concurrent requests.

  2. Savepoints: the regulatory reporting environment required zero-downtime deployments during market hours. Kafka Streams required consumer group rebalancing on each deployment; Flink’s savepoints allowed exact-state resumption.

  3. Tail latency: our p99 requirement was 500ms. Kafka Streams at 380ms was fine; at p99.9 it wasn’t. Flink’s 210ms p99 gave comfortable headroom.
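Operationally, the zero-downtime upgrade in point 2 is two CLI steps. Job ID and savepoint paths here are illustrative:

```shell
# Trigger a savepoint and cancel the running job (job ID is illustrative).
flink cancel --withSavepoint s3://our-flink-savepoints/trade-enrichment a1b2c3d4e5

# Deploy the new jar, resuming from the savepoint's exact state.
flink run -s s3://our-flink-savepoints/trade-enrichment/savepoint-a1b2c3 \
    trade-enrichment-2.0.jar
```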

Where we’d choose Kafka Streams instead:

  • Simpler topology with no external lookup dependencies
  • Team unfamiliar with managing distributed systems
  • All reference data already in Kafka topics (table join works cleanly)
  • Prefer library over infrastructure (no cluster to manage)

Neither system is universally better. The choice should be driven by the topology, the operational model, and the team’s experience — not by headline throughput numbers.