ClickHouse for Application Analytics: Fast Aggregations Without Spark

The requirement: an internal analytics dashboard showing trading activity metrics — volume, trade count, latency distributions, error rates — sliced by instrument, venue, time window, and a dozen other dimensions. Data volume: about 4 billion events per day, 90-day retention. Query pattern: ad-hoc OLAP — arbitrary group-bys, time ranges, filters. We evaluated TimescaleDB (Postgres extension), Apache Druid, ClickHouse, and “just use BigQuery.” We chose ClickHouse. After a year in production, I’d make the same choice. ...

May 17, 2023 · 5 min · MW

Kafka at Startup Scale

The fintech startup adopted Kafka early — we were processing market events at rates that would have overwhelmed any request-response queue. Two years in, with a five-broker cluster handling 200k messages/sec at peak, the operational experience was significantly different from what I’d expected based on the documentation and conference talks. ...

May 18, 2022 · 6 min · MW

Choosing a Time-Series Database in 2020

The fintech startup had three different time-series storage problems at the same time. After evaluating the options available in 2020, we ended up running two different systems. Here’s the decision framework and why the landscape fragmented the way it did. ...

November 18, 2020 · 5 min · MW

Schema Evolution in Avro: The Hard Lessons from Production

Three months after deploying Kafka with Avro schemas and the Confluent Schema Registry, we had a production incident where a schema change caused a downstream consumer to silently produce incorrect output — wrong field values, no error thrown, no monitoring alert triggered. That incident rewired how the team thought about schema evolution. The tools don’t protect you from all the failure modes. Understanding the rules and building organisational processes around them is what does. ...

October 4, 2018 · 5 min · MW

Stream Processing with Kafka Streams vs Flink: A Real Comparison

By mid-2017, the institution had two competing proposals on the table for the next generation of real-time analytics infrastructure: one team advocating Kafka Streams, another advocating Apache Flink. Both solve the same problem. Both use Kafka as input and output. Both provide stateful stream processing with windowing and exactly-once semantics. The evaluation took eight weeks. Here’s what we found. ...

September 27, 2017 · 4 min · MW

Column Stores for Analytics: Why Row-Based Is Wrong for This Problem

The analytics team’s query: “Give me total notional, average spread, and fill rate for every instrument over the last 90 days, broken down by hour.” On our Postgres trade history table with ~2 billion rows: 4 hours, 23 minutes. After the columnar rewrite: 8 seconds. This post is about why, not how to install Parquet. ...

April 5, 2017 · 4 min · MW

Kafka in Finance: What 'Exactly Once' Actually Costs You

Kafka 0.11 landed with exactly-once semantics and a lot of marketing. We were running trade event pipelines in a regulated environment and the promise was appealing: no duplicate trades in the downstream risk system, no idempotency logic sprinkled through consuming services. After three months with it in production, the honest summary is: EOS (exactly-once semantics) works as advertised within its scope, and that scope is narrower than it sounds. ...
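In practice that scope is a Kafka-in, Kafka-out read-process-write loop, and it has to be switched on explicitly. A minimal sketch of what enabling it looks like from Clojure, assuming the standard Java `kafka-clients` producer (0.11+) on the classpath; the broker address, topic name, and transactional id below are placeholders, not our production config:

```clojure
;; Sketch: turning on EOS for a produce path (hypothetical names throughout).
(import '(org.apache.kafka.clients.producer KafkaProducer ProducerRecord)
        '(java.util Properties))

(def props
  (doto (Properties.)
    (.put "bootstrap.servers" "localhost:9092")
    (.put "key.serializer"   "org.apache.kafka.common.serialization.StringSerializer")
    (.put "value.serializer" "org.apache.kafka.common.serialization.StringSerializer")
    ;; Transactions need an idempotent producer plus a stable transactional.id,
    ;; so restarted or zombie instances can be fenced by the broker.
    (.put "enable.idempotence" "true")
    (.put "transactional.id" "trade-events-writer-1")))

(def producer (KafkaProducer. props))
(.initTransactions producer)

;; Records sent between begin and commit become visible atomically to
;; read_committed consumers; on failure the whole batch is aborted.
(try
  (.beginTransaction producer)
  (.send producer (ProducerRecord. "risk-trade-events" "trade-42" "{...payload...}"))
  (.commitTransaction producer)
  (catch Exception e
    (.abortTransaction producer)
    (throw e)))
```

The transaction boundary stops at Kafka: a write to an external risk store made inside that block is not rolled back with it, which is one of the places the scope turns out to be narrower than it sounds.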

January 10, 2017 · 4 min · MW

Clojure Data Pipelines: Transducers in Production Risk Processing

The risk calculation pipeline processed end-of-day positions: take all the day’s trades, aggregate them to net positions, apply mark-to-market prices, and compute risk metrics. The input was ~800,000 trade records; the output was ~12,000 position records with P&L and Greeks. The initial implementation used standard Clojure sequence operations:

```clojure
(->> trades
     (filter open-trade?)
     (map enrich-with-market-data)
     (group-by :currency-pair)
     (map-vals aggregate-position)
     (map-vals compute-risk-metrics))
```

Clean. Readable. And it created four intermediate collections of 800,000 elements each before producing the final output. That’s 3.2M intermediate objects for a 12K result. Transducers changed this. ...
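Roughly what the rewrite looks like — a sketch only, assuming the same helpers as above (`open-trade?`, `enrich-with-market-data`, `aggregate-position`, `compute-risk-metrics`, `map-vals`); the grouping step is my reconstruction, not the exact production code:

```clojure
;; filter + map fuse into a single pass, so no intermediate lazy sequences
;; are realised; grouping happens inside the same reduction.
(def enrich-xf
  (comp (filter open-trade?)
        (map enrich-with-market-data)))

(defn positions-with-risk [trades]
  (->> trades
       (transduce enrich-xf
                  (completing
                    (fn [by-pair trade]
                      (update by-pair (:currency-pair trade) (fnil conj []) trade)))
                  {})
       (map-vals aggregate-position)
       (map-vals compute-risk-metrics)))
```

Same shape as the original pipeline, but one pass over the trades; the downstream `map-vals` calls only ever touch the grouped entries, not 800k-element sequences.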

November 23, 2016 · 5 min · MW

KDB+/Q for Java Developers: Reading the Matrix

KDB+ is used in risk analytics, trade surveillance, and market data storage across most tier-1 financial institutions. If you work in finance long enough, you will encounter it. Nothing in your Java background prepares you for it. ...

October 11, 2016 · 6 min · MW

Time-Series Data at a Bank: Why Relational Databases Break and What Comes Next

When I moved to the large financial institution, the team I joined managed the market data and trade data storage layer. The engineering problem was deceptively simple to state: store every price tick, every trade execution, and every risk calculation — billions of records per day — and answer analytical queries over them quickly. The existing system was PostgreSQL. It worked, technically. Queries that needed to run in seconds for trading decisions took minutes. Operational costs for storage were climbing. The database team was spending more time running VACUUM than building features. Understanding why required understanding what time-series data actually is and why it’s different. ...

July 6, 2016 · 5 min · MW