LLM Integration Patterns for Backend Engineers

LLM integration is a new category of external API call with specific failure modes that don’t exist in traditional services. The call is slow (100ms–5s), expensive, non-deterministic, and can fail softly — returning a plausible-looking wrong answer rather than an error code. Getting it right requires the same rigor you’d apply to any critical external dependency, plus some LLM-specific patterns. ...

July 10, 2024 · 6 min · MW

Observability at Scale: What 'Good' Looks Like When You Have Too Much Data

At a startup with a dozen services, the observability problem is getting enough signal. You don’t have enough logging, your traces are incomplete, and your metrics dashboards have gaps. You know when something is wrong because a user tells you. At scale, the problem inverts. You have petabytes of logs, hundreds of millions of traces per day, and metrics cardinality so high that naive approaches cause your time-series database to OOM. The engineering challenge is filtering signal from noise, not generating signal. Both problems are real. They require different solutions. ...

May 29, 2024 · 5 min · MW

Evaluating LLM Applications: Why 'It Looks Good' Is Not Enough

The first LLM feature I shipped was embarrassingly under-tested. I prompted the model, looked at a few outputs, thought “that looks right,” and deployed it. Users found failure modes within hours that I hadn’t imagined, much less tested for. This isn’t unusual. LLM applications have a testing problem that’s distinct from traditional software testing: the output space is too large to enumerate, the failure modes are semantic rather than syntactic, and “correctness” is often subjective. The standard response — “it’s hard, so test less” — produces unreliable products. Here’s what a functional evaluation framework looks like. ...

May 14, 2024 · 6 min · MW

Cache Design as a Reliability Practice, Not an Optimisation

At the large US tech company, I inherited a service that had a cache. The cache was fast — it served 98% of requests with <1ms latency. The 2% cache misses hit the database, which took 50–200ms. Then the cache cluster had a rolling restart during a traffic spike. For three minutes, the cache hit rate dropped to 30%. The 70% misses all hit the database simultaneously. The database became saturated, latency spiked to 10s, and the service effectively went down — not because the cache was unavailable, but because the system wasn’t designed for cache misses at that rate. This is a cache reliability failure, not a cache performance failure. ...
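The standard defence against that failure is request coalescing: when the cache misses, concurrent requests for the same key share one backend fetch instead of each hitting the database. A stdlib-only sketch of the pattern (the canonical library is golang.org/x/sync/singleflight; this hand-rolled version just illustrates the mechanism):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// coalescer collapses concurrent lookups for the same key into a single
// backend call, so a burst of cache misses becomes one database query per
// key rather than one per request.
type coalescer struct {
	mu       sync.Mutex
	inflight map[string]*call
}

type call struct {
	wg  sync.WaitGroup
	val string
}

func (c *coalescer) do(key string, fetch func() string) string {
	c.mu.Lock()
	if existing, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		existing.wg.Wait() // piggyback on the fetch already in flight
		return existing.val
	}
	cl := &call{}
	cl.wg.Add(1)
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val = fetch() // only one goroutine per key reaches here
	cl.wg.Done()

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	return cl.val
}

func main() {
	var dbCalls int64
	c := &coalescer{inflight: map[string]*call{}}
	fetch := func() string {
		atomic.AddInt64(&dbCalls, 1)       // count the expensive DB hit
		time.Sleep(50 * time.Millisecond)  // simulate 50–200ms DB latency
		return "row-42"
	}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // 100 simultaneous misses for one key
		wg.Add(1)
		go func() { defer wg.Done(); c.do("user:42", fetch) }()
	}
	wg.Wait()
	fmt.Println("db calls:", atomic.LoadInt64(&dbCalls)) // far fewer than 100
}
```

With coalescing in place, the hit rate dropping from 98% to 30% multiplies database load by the number of hot keys, not by the request rate — which is the difference between a latency blip and a three-minute outage.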

March 27, 2024 · 6 min · MW

Engineering at Enterprise Scale: What Changes When the System Is Actually Big

I’d worked at organisations ranging from twelve people to four hundred. The new role is at a company with tens of thousands of engineers. The systems are bigger, the coordination surface is larger, and some things I assumed were universal engineering truths turned out to be scale-specific. ...

February 14, 2024 · 4 min · MW

Eleven Years In: A Retrospective on Careers, Choices, and Compounding Knowledge

I started writing code professionally in 2012. This year marks eleven years. The milestone prompts a kind of stock-taking that I find useful to do in writing. This is not a career advice post. It’s a personal retrospective on what happened, what I learned, and what I’d change — useful mostly as a data point rather than a prescription. ...

November 15, 2023 · 5 min · MW

Go's Race Detector in CI: Catching Data Races Before They Catch You

A data race occurs when two goroutines access the same memory concurrently without synchronisation and at least one access is a write. The behaviour is undefined: you might get the old value, the new value, a torn read (part old, part new), or a crash. Reproducing the bug is usually impossible because it depends on precise CPU scheduling. Go’s race detector instruments memory accesses at compile time and reports races at runtime. It’s one of the most useful debugging tools in the Go ecosystem and one of the most underused. ...
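A minimal example of the kind of code the detector exists for — a shared counter incremented from several goroutines. With a plain `total++` this is exactly the race `go test -race` flags; `sync/atomic` makes the same access pattern well-defined:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counter increments a shared total n times from each of w goroutines.
// Written as `total++` on a plain int, this is a data race that the race
// detector (go test -race, go run -race) reports at runtime; atomic.AddInt64
// gives the same access pattern defined behaviour and a correct result.
func counter(w, n int) int64 {
	var total int64
	var wg sync.WaitGroup
	for i := 0; i < w; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < n; j++ {
				atomic.AddInt64(&total, 1) // the racy version: total++
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	// In CI, `go test -race ./...` instruments loads and stores like these.
	fmt.Println(counter(8, 1000)) // → 8000, deterministically, with atomics
}
```

The racy version doesn't just report wrong counts — it's undefined behaviour, and the detector catches it even on runs where the output happens to look right, which is what makes it worth the ~10x slowdown in CI.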

October 4, 2023 · 6 min · MW

Navigating Org Change as an Engineer

In the years at the European financial technology firm, the engineering organisation went through three significant restructuring events: a change in engineering leadership, a shift from functional to product-aligned teams, and a consolidation of two separate engineering groups. Each one produced the same kind of disruption and the same kind of opportunities. Org change is uncomfortable. It’s also underanalysed. Most engineers treat it as something happening to them, not something to navigate actively. ...

August 16, 2023 · 5 min · MW

Go's net/http: Building Production HTTP Servers Without a Framework

Go’s net/http is frequently underrated. The ecosystem has frameworks — Chi, Gin, Echo, Fiber — and they’re fine choices, but the standard library gets you remarkably far without additional dependencies. After building several production APIs that stayed on raw net/http, here’s the honest assessment of what you can and can’t do without a framework, and the patterns that make it work. ...

July 5, 2023 · 6 min · MW

ClickHouse for Application Analytics: Fast Aggregations Without Spark

The requirement: an internal analytics dashboard showing trading activity metrics — volume, trade count, latency distributions, error rates — sliced by instrument, venue, time window, and a dozen other dimensions. Data volume: about 4 billion events per day, 90-day retention. Query pattern: ad-hoc OLAP — arbitrary group-bys, time ranges, filters. We evaluated TimescaleDB (Postgres extension), Apache Druid, ClickHouse, and “just use BigQuery.” We chose ClickHouse. After a year in production, I’d make the same choice. ...
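For a workload like that, the schema decision that matters most in ClickHouse is the MergeTree partition and sort keys. A hypothetical sketch of the shape such a table might take — column names are illustrative, not the real ones:

```sql
-- Partition by day so 90-day retention is cheap partition drops via TTL;
-- ORDER BY puts the dimensions most queries filter and group on first,
-- so ad-hoc slices by instrument/venue/time scan a fraction of the data.
CREATE TABLE trade_events
(
    event_time   DateTime64(3),
    instrument   LowCardinality(String),
    venue        LowCardinality(String),
    latency_us   UInt32,
    is_error     UInt8,
    volume       Decimal(18, 4)
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (instrument, venue, event_time)
TTL toDateTime(event_time) + INTERVAL 90 DAY;
```

`LowCardinality(String)` dictionary-encodes the dimension columns, which is what keeps arbitrary group-bys over billions of rows fast without a Spark-sized cluster behind them.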

May 17, 2023 · 5 min · MW