RAG Systems in Production: What the Tutorials Don't Cover

RAG is architecturally simple: chunk documents, embed them, store them in a vector DB, retrieve the top-k chunks for each query, pass the retrieved context to an LLM, and return the answer. The demo takes an afternoon. The production system takes months, because “works on the demo documents” is nowhere near “answers correctly 95% of the time across the full document corpus.” This post is about the gap between those two states. ...

September 11, 2024 · 7 min · MW

Evaluating LLM Applications: Why 'It Looks Good' Is Not Enough

The first LLM feature I shipped was embarrassingly under-tested. I prompted the model, looked at a few outputs, thought “that looks right,” and deployed it. Users found failure modes within hours that I hadn’t imagined, much less tested for. This isn’t unusual. LLM applications have a testing problem that’s distinct from traditional software testing: the output space is too large to enumerate, the failure modes are semantic rather than syntactic, and “correctness” is often subjective. The standard response — “it’s hard, so test less” — produces unreliable products. Here’s what a functional evaluation framework looks like. ...

May 14, 2024 · 6 min · MW

Go's net/http: Building Production HTTP Servers Without a Framework

Go’s net/http is frequently underrated. The ecosystem has frameworks (Chi, Gin, Echo, Fiber), and they’re fine choices, but the standard library gets you remarkably far without additional dependencies. After building several production APIs that stayed on raw net/http, here’s an honest assessment of what you can and can’t do without a framework, and the patterns that make it work. ...

July 5, 2023 · 6 min · MW

Building the First Production Service at a Startup: Decisions Under Uncertainty

Three months into the startup, the prototype was working and investors were asking for a production timeline. We had a Postgres database, a Python script doing the core business logic, and no infrastructure to speak of. The decision: rewrite in Go and build proper infrastructure, or ship the Python and iterate? And if we rewrite, what does “proper infrastructure” mean when you have six engineers and four months of runway? ...

June 26, 2019 · 5 min · MW

Two Years of Clojure in Production: Honest Retrospective

Two years. Long enough that the novelty is gone and what’s left is the actual experience of living with the decision. Here’s the retrospective I’d want to have read before starting. ...

March 1, 2018 · 4 min · MW

From Zero to Production: My First Month in Electronic Trading

I joined the trading desk on a Monday. By Friday I had broken the USD/JPY price feed. Not catastrophically — it was a staging environment and the feed recovered in seconds — but the experience of watching a real-time market data stream go silent because of my code was unlike anything I’d encountered at university. It crystallised something immediately: in this environment, software failures have a price tag attached. ...

February 8, 2012 · 3 min · MW

Available for consulting: Distributed systems · Low-latency architecture · Go · LLM integration & RAG · Technical leadership
hello@turboawesome.win