Kafka at Startup Scale

The fintech startup adopted Kafka early — we were processing market events at rates that would have overwhelmed any request-response queue. Two years in, with a five-broker cluster handling 200k messages/sec at peak, the operational experience was significantly different from what I’d expected based on the documentation and conference talks. ...

May 18, 2022 · 6 min · MW

Schema Evolution in Avro: The Hard Lessons from Production

Three months after deploying Kafka with Avro schemas and the Confluent Schema Registry, we had a production incident where a schema change caused a downstream consumer to silently produce incorrect output — wrong field values, no error thrown, no monitoring alert triggered. That incident rewired how the team thought about schema evolution. The tools don’t protect you from all the failure modes. Understanding the rules and building organisational processes around them is what does. ...

October 4, 2018 · 5 min · MW

Backpressure in Practice: Keeping Fast Producers from Killing Slow Consumers

The system that prompted this post was a trade enrichment pipeline. The input was a Kafka topic receiving ~50,000 trade events per minute during market hours. The enrichment step required a database lookup — pulling counterparty and instrument data — that averaged 2ms per trade. 50,000 trades/minute = ~833 trades/second. At 2ms per lookup, a single thread can handle 500 lookups/second. To keep up, we needed at least two threads and ideally a small pool. We had six threads and a queue in front of them. During a market event that pushed the rate to 200,000 trades/minute, the queue grew without bound, memory climbed, and the service eventually OOM’d. Classic backpressure failure. ...

June 14, 2018 · 6 min · MW

Building MiFID II Trade Reporting Infrastructure: An Engineer's View

MiFID II went live on January 3, 2018. The preparation started in 2016. Two years for a set of regulatory requirements that, from the outside, looked straightforward: report each trade to a trade repository within 15 minutes of execution. From the inside, “report each trade” requires answering: which trades? From which systems? In what format? To which trade repository? What constitutes a trade for the purposes of reporting vs. booking vs. settlement? What do you do when the reporting service is unavailable? What happens when the trade repository rejects a report? This is the engineering story of building a system to answer those questions. ...

October 3, 2017 · 6 min · MW

Stream Processing with Kafka Streams vs Flink: A Real Comparison

By mid-2017, the institution had two competing proposals on the table for the next generation of real-time analytics infrastructure: one team advocating Kafka Streams, another advocating Apache Flink. Both solve the same problem. Both use Kafka as input and output. Both provide stateful stream processing with windowing and exactly-once semantics. The evaluation took eight weeks. Here’s what we found. ...

September 27, 2017 · 4 min · MW

Kafka in Finance: What 'Exactly Once' Actually Costs You

Kafka 0.11 landed with exactly-once semantics and a lot of marketing. We were running trade event pipelines in a regulated environment and the promise was appealing: no duplicate trades in the downstream risk system, no idempotency logic sprinkled through consuming services. After three months with it in production, the honest summary is: EOS (exactly-once semantics) works as advertised within its scope, and that scope is narrower than it sounds. ...

January 10, 2017 · 4 min · MW

Chronicle Queue vs Kafka: Choosing a Persistent Journal at Nanosecond Scale

By early 2015 we had both Chronicle Queue and Kafka in production — Chronicle for intra-day trade journaling, Kafka for end-of-day data pipelines. The question came up repeatedly: why not use one for both? The answer is that they solve different problems with incompatible design priorities. ...

January 21, 2015 · 4 min · MW
Available for consulting Distributed systems · Low-latency architecture · Go · LLM integration & RAG · Technical leadership
hello@turboawesome.win