Bits, Trades & Systems

Project Loom Preview: Virtual Threads and What They Mean for Server Code

Java’s threading model has a fundamental scalability problem: OS threads are expensive. Creating thousands of them consumes gigabytes of stack memory and causes significant scheduling overhead. This is why reactive programming (Netty, Project Reactor, RxJava) became popular — it avoids the thread-per-request model by using event loops and async callbacks. Project Loom, announced in 2017 with early previews arriving in 2018, proposed a different solution: make threads cheap. Virtual threads — JVM-managed threads that are not 1:1 with OS threads — could make the thread-per-request model scalable again. ...
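As a rough illustration of "make threads cheap", the sketch below runs one virtual thread per simulated request. It assumes JDK 21 or later and uses the API that eventually shipped (`Executors.newVirtualThreadPerTaskExecutor`), not the early preview API from the period the post covers. The class name, the `handleAll` method, the request count, and the 10 ms sleep standing in for blocking I/O are all illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadDemo {

    // Runs `requests` blocking tasks, each on its own virtual thread,
    // and returns how many completed.
    static int handleAll(int requests) throws Exception {
        // Requires JDK 21+: each submitted task gets a fresh virtual thread.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                results.add(executor.submit(() -> {
                    Thread.sleep(10); // stands in for a blocking I/O call
                    return id;
                }));
            }
            int completed = 0;
            for (Future<Integer> result : results) {
                result.get(); // blocks until that task has finished
                completed++;
            }
            return completed;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(handleAll(10_000));
    }
}
```

The code is plain blocking, thread-per-request style. The intent is that the same shape on platform threads would need ten thousand OS threads, while here the JVM multiplexes the virtual threads onto a small pool of carrier threads.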

Two Years of Clojure in Production: Honest Retrospective

Two years. Long enough that the novelty is gone and what’s left is the actual experience of living with the decision. Here’s the retrospective I’d want to have read before starting. ...

Distributed Transactions Are a Lie (And What to Do Instead)

Every discussion of distributed systems eventually reaches the question: “can we just wrap this in a transaction?” The answer is technically yes and practically no. Understanding why — and what to do instead — is one of the more important shifts in distributed systems thinking. ...

From Java 8 to Java 11 in a Regulated Environment: What Actually Broke

Java 11 was the first long-term support release after Java 8. Oracle’s announcement that commercial Java 8 support would end pushed the bank’s architecture committee to approve a migration. In theory: update the JDK, update the build files, done. In practice: six months of discovery. This is a frank account of what broke. ...

Building MiFID II Trade Reporting Infrastructure: An Engineer's View

MiFID II went live on January 3, 2018. The preparation started in 2016. Two years for a set of regulatory requirements that, from the outside, looked straightforward: report each trade to a trade repository within 15 minutes of execution. From the inside, “report each trade” requires answering: which trades? From which systems? In what format? To which trade repository? What constitutes a trade for the purposes of reporting vs. booking vs. settlement? What do you do when the reporting service is unavailable? What happens when the trade repository rejects a report? This is the engineering story of building a system to answer those questions. ...

Stream Processing with Kafka Streams vs Flink: A Real Comparison

By mid-2017, the institution had two competing proposals on the table for the next generation of real-time analytics infrastructure: one team advocating Kafka Streams, another advocating Apache Flink. Both solve the same problem. Both use Kafka as input and output. Both provide stateful stream processing with windowing and exactly-once semantics. The evaluation took eight weeks. Here’s what we found. ...

Persistent Data Structures Are Not Just for Functional Purists

When I joined the bank’s risk team, Clojure was already in production for risk calculation. The code I inherited used Clojure’s persistent maps and vectors everywhere — not as a philosophical statement but because the team had found them practically useful in a specific way. The specific way: concurrent reads and occasional writes to a shared state snapshot, with no locks. ...
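The "shared snapshot, no locks" pattern can be sketched in Java with an `AtomicReference` to an immutable map. This is only an approximation: it copies the whole map on every write, whereas Clojure's persistent maps share structure between versions and make the write cheap. The `SnapshotState` class and its methods are hypothetical names for illustration, not the team's code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class SnapshotState {

    // Readers always see a complete, immutable map; it is never mutated in place.
    private final AtomicReference<Map<String, Double>> current =
            new AtomicReference<>(Collections.<String, Double>emptyMap());

    // Lock-free read: the returned snapshot will not change under the caller.
    public Map<String, Double> snapshot() {
        return current.get();
    }

    // Occasional write: build a new version and swap it in atomically.
    // Full copy here; a persistent map would reuse most of the old structure.
    public void put(String key, double value) {
        current.updateAndGet(old -> {
            Map<String, Double> next = new HashMap<>(old);
            next.put(key, value);
            return Collections.unmodifiableMap(next);
        });
    }

    public static void main(String[] args) {
        SnapshotState state = new SnapshotState();
        state.put("EURUSD", 1.0);
        Map<String, Double> before = state.snapshot();
        state.put("EURUSD", 2.0);
        // The earlier snapshot is unaffected by the later write.
        System.out.println(before.get("EURUSD") + " " + state.snapshot().get("EURUSD"));
    }
}
```

A reader that grabbed `before` keeps a consistent view for as long as it holds the reference, which is the property that makes lock-free reads safe.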

Reading GC Logs Like a Detective

GC logs are low-overhead diagnostic data that the JVM will produce for you once logging is enabled, cheap enough to leave on permanently in production. They tell you the timing, cause, duration, and effect of every collection, if you know how to read them. Most Java engineers can tell you what GC does. Far fewer can look at a GC log and immediately see why the p99 latency spiked at 14:37 last Tuesday. ...

Column Stores for Analytics: Why Row-Based Is Wrong for This Problem

The analytics team’s query: “Give me total notional, average spread, and fill rate for every instrument over the last 90 days, broken down by hour.” On our Postgres trade history table with ~2 billion rows: 4 hours, 23 minutes. After the columnar rewrite: 8 seconds. This post is about why, not how to install Parquet. ...

Spec-Driven Development in Clojure: Validating Financial Data at the Edge

Before clojure.spec, our FIX message parser had a test suite with 40 hand-written test cases. We’d been running it in production for 18 months without incident. After we added spec and ran the property-based tests overnight, they found 7 edge cases we hadn’t written tests for, including one where a negative zero value (-0.0) in a price field caused the downstream risk calculation to produce NaN, which propagated silently through the pipeline and ended up in the regulatory report as a blank field. That was the end of hand-written validation tests for external data. ...
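To see why negative zero slips past an ordinary check, here is a small Java sketch (not clojure.spec). The excerpt does not say which calculation produced the NaN, so `hypotheticalRiskFactor` is an invented stand-in that happens to show the effect. The validator names are also assumptions.

```java
public class NegativeZeroDemo {

    // Naive check: rejects negative prices, but -0.0 < 0.0 is false,
    // so negative zero passes.
    static boolean naiveIsValidPrice(double price) {
        return !(price < 0.0) && !Double.isNaN(price);
    }

    // Stricter check: Double.compare orders -0.0 before 0.0,
    // so negative zero is rejected along with NaN and infinities.
    static boolean strictIsValidPrice(double price) {
        return Double.isFinite(price) && Double.compare(price, 0.0) >= 0;
    }

    // Hypothetical downstream step (an assumption, not the post's actual
    // calculation): 1.0 / -0.0 is -Infinity, and sqrt(-Infinity) is NaN.
    static double hypotheticalRiskFactor(double price) {
        return Math.sqrt(1.0 / price);
    }

    public static void main(String[] args) {
        double price = -0.0;
        System.out.println(naiveIsValidPrice(price));      // prints "true"
        System.out.println(strictIsValidPrice(price));     // prints "false"
        System.out.println(hypotheticalRiskFactor(price)); // prints "NaN"
    }
}
```

Because `-0.0 == 0.0` is also true in Java, an equality test against zero does not separate the two values either. A hand-written test suite is unlikely to include this input, which is where generated inputs earn their keep.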