The job spec said “trading systems.” I assumed it would be similar to what I’d been doing — latency-focused, technically aggressive, small team, fast decisions. I was wrong on most counts, and right for reasons I didn’t expect.
Scale Changes Everything (And Not How You Think)
In my previous role, “scale” meant throughput: messages per second, ticks per second. We scaled by making the hot path faster.
At the new firm, scale meant something different: the number of systems, the number of teams, the volume of historical data, and the regulatory surface area. There was a trading system in New York, one in London, one in Singapore, all supposedly doing the same thing, all subtly different because they’d evolved independently over a decade.
The latency problem still existed, but it was dwarfed by the coordination problem.
Technology: More Conservative, More Justified
At the trading firm we’d upgrade to a new Java version within months of release. We ran early Disruptor versions in production before the v1.0 paper was published. Moving fast on technology was a competitive advantage.
At the institution, Java version upgrades required a formal testing cycle, sign-off from the risk technology team, and often coordination with regulators on changes to regulated systems. Some teams were still running Java 7 in 2016, with an upgrade case that had been open for two years.
This sounds like bureaucracy-as-pathology. It’s partly that, but it’s also rational risk management. A JVM upgrade that introduces a subtle GC behaviour change in a system processing $10 billion daily has a very different risk/reward profile than the same upgrade at a smaller firm. The conservatism is calibrated to the blast radius.
Correctness Over Latency
The mental model shift that took longest: optimising for correctness first, latency second.
Previously, a 10µs improvement in tick-to-quote latency was worth engineering weeks. Here, a 10µs improvement in order routing was irrelevant — the latency budget was dominated by compliance checks, risk aggregation, and multi-system coordination, all of which ran in milliseconds. Sub-millisecond optimisation was noise.
What mattered: Was the trade booked correctly in all three booking systems? Did the risk position update atomically with the fill? Did the regulatory reporting entry get generated for every trade without exception?
These are harder problems than latency. Latency problems are usually local — you find the slow code and fix it. Correctness problems in distributed systems depend on the interactions between systems, on network partitions, on the ordering of events across systems that don’t share a clock.
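One cheap defence against those interaction failures is idempotency: if every booking system records which trade IDs it has already applied, a retry after a partial failure or redelivered message cannot double-book. A minimal sketch, with illustrative names rather than any real system:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of idempotent trade booking: a downstream system remembers the
// trade IDs it has already applied, so redelivery after a partial
// failure is a no-op. Class and trade names are hypothetical.
public class IdempotentBooker {
    // tradeId -> booked quantity; putIfAbsent gives an atomic
    // "book only if not seen before" in a single map operation.
    private final Map<String, Long> booked = new ConcurrentHashMap<>();

    /** Returns true if this call actually booked the trade,
     *  false if the trade ID was a duplicate delivery. */
    public boolean book(String tradeId, long quantity) {
        return booked.putIfAbsent(tradeId, quantity) == null;
    }

    public static void main(String[] args) {
        IdempotentBooker newYorkBooks = new IdempotentBooker();
        System.out.println(newYorkBooks.book("T-1001", 500)); // true
        System.out.println(newYorkBooks.book("T-1001", 500)); // false: duplicate
    }
}
```

The hard part in practice is that the deduplication state itself must survive restarts and be consistent with the booking store, which is exactly the kind of cross-system invariant the paragraph above describes.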
The Data Problem
Previously, the data volume was months of tick data in a Chronicle Queue and a handful of relational tables for reference data. Manageable.
Now: years of trade history across multiple asset classes, position data that needed to be reconstructable to any point in time for regulatory purposes, risk data that fed into models updated hourly, market data subscriptions to dozens of venues.
This is where I first seriously encountered time-series databases, column stores, and the distinction between OLTP and OLAP. The pricing system ran on relational data. The risk analytics ran on KDB+. The trade reporting ran on a data warehouse (Sybase at that point, being slowly migrated to something less miserable).
Each technology existed for good reasons. Understanding why required understanding the access patterns — which was itself a week of archaeology for each system.
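The OLTP/OLAP split comes down to which direction you scan. An OLTP lookup wants every field of one trade; an OLAP aggregate wants one field of every trade, and a columnar layout keeps that field contiguous in memory. A toy sketch of the analytic side, with synthetic data:

```java
// Sketch of why access patterns drive storage choice: a columnar layout
// (one primitive array per field) lets an analytic query scan only the
// columns it needs, contiguously -- the core idea behind column stores
// and KDB+-style databases. Data below is synthetic.
public class AccessPatterns {
    /** OLAP-style aggregate: scans two contiguous column arrays and
     *  never touches any other field of the logical "row". */
    public static double totalNotional(long[] quantity, double[] price) {
        double notional = 0;
        for (int i = 0; i < quantity.length; i++) {
            notional += quantity[i] * price[i];
        }
        return notional;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long[] quantity = new long[n];
        double[] price = new double[n];
        for (int i = 0; i < n; i++) {
            quantity[i] = 100;  // synthetic fills
            price[i] = 10.0;
        }
        System.out.println(totalNotional(quantity, price)); // 1.0E9
    }
}
```

In a row store, the same scan would drag every other trade field through the cache along with the two it needs; that difference, multiplied by years of history, is why the risk analytics and the pricing system ended up on different engines.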
What Carried Over
The fundamentals of the previous four years weren’t wasted:
- JVM internals knowledge was immediately useful — the firm’s Java systems had often-ignored GC logs full of stories
- The habit of measuring rather than assuming was culturally alien to some teams, which made it a superpower there
- Low-level data structure knowledge (cache lines, allocation rates) still mattered for the latency-sensitive components that did exist
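"Measure rather than assume" can be as cheap as actually reading the GC logs those systems were already writing. A sketch of pulling pause times out of a Java 8-era `-XX:+PrintGCDetails` line; this is not a full log parser (Java 9+ unified logging uses a different format), and the sample line is fabricated for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the stop-the-world pause duration from a classic
// -XX:+PrintGCDetails log line. A sketch for one log format only;
// the sample line in main() is synthetic.
public class GcPauseExtractor {
    private static final Pattern PAUSE =
            Pattern.compile("([0-9]+\\.[0-9]+) secs\\]");

    /** Returns the pause in milliseconds, or -1 if the line has none. */
    public static double pauseMillis(String logLine) {
        Matcher m = PAUSE.matcher(logLine);
        return m.find() ? Double.parseDouble(m.group(1)) * 1000.0 : -1;
    }

    public static void main(String[] args) {
        String line = "12.345: [GC (Allocation Failure) "
                + "[PSYoungGen: 524288K->8192K(611840K)] "
                + "524288K->8200K(2010112K), 0.0123456 secs]";
        System.out.println(pauseMillis(line));
    }
}
```

Feeding a day of log lines through something like this and plotting the distribution routinely settled arguments that had previously run on folklore.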
The trading firm years had given me depth in one narrow dimension. The institution gave me breadth — and the context to understand why the depth mattered less than I’d assumed, and what the other dimensions looked like.