Memory Layout in Go: Structs, Alignment, and Cache Performance

This is the JVM false-sharing problem in a different language. The rules differ slightly and the tooling differs, but the underlying hardware constraint is identical: cache lines are typically 64 bytes, and having goroutines on different cores write to the same line is expensive. ...

August 17, 2022 · 5 min · MW

Busy Spinning vs Blocking: Thread Strategies for Ultra-Low Latency

When a thread is waiting for work (a new event, a lock to be released, a signal), it has two options. It can block (tell the OS “wake me up when there’s work”) or busy-spin (loop checking a condition, never yielding the CPU). Both are correct, but they have very different performance profiles. ...

May 14, 2014 · 5 min · MW

Mechanical Sympathy: Writing Java That Respects the Hardware

Martin Thompson coined the term “mechanical sympathy” — the idea that to write fast software you need to understand the machine it runs on. Not at the assembly level necessarily, but well enough to reason about what the CPU, memory hierarchy, and OS are actually doing with your code. This post is what that looks like in practice, writing Java for a system where microseconds matter. ...

December 4, 2012 · 4 min · MW