Tail-Based Trace Sampling: Why Head Sampling Is Usually Wrong

The large US technology company runs at a scale where tracing every request is financially and computationally impractical. You have to sample. How you sample determines whether your traces are useful. Most teams implement head-based sampling — decide whether to trace a request when it starts. This is the easy implementation and produces useless traces for most debugging purposes. ...

October 9, 2024 · 5 min · MW

Observability at Scale: What 'Good' Looks Like When You Have Too Much Data

At a startup with a dozen services, the observability problem is getting enough signal. You don’t have enough logging, your traces are incomplete, and your metrics dashboards have gaps. You know when something is wrong because a user tells you. At scale, the problem inverts. You have petabytes of logs, hundreds of millions of traces per day, and metrics cardinality so high that naive approaches cause your time-series database to OOM. The engineering challenge is filtering signal from noise, not generating signal. Both problems are real. They require different solutions. ...

May 29, 2024 · 5 min · MW

OpenTelemetry in Go: Distributed Tracing That Doesn't Get in the Way

Before OpenTelemetry, distributed tracing was a vendor-specific integration. You chose Jaeger or Zipkin or Datadog, added their SDK, and traced against their API. Switching vendors meant rewriting instrumentation. Adding a library that used a different tracing SDK meant two tracing systems running in parallel. OpenTelemetry solved this with a vendor-neutral API and a pluggable exporter model. The API stays the same; you swap exporters. Most major observability vendors now accept OTel format natively. Here’s what a solid Go integration looks like. ...

May 31, 2022 · 5 min · MW

Context Propagation in Go: Deadlines, Cancellation, and Tracing

context.Context appears in the signature of almost every non-trivial Go function. It’s the idiomatic way to carry deadlines, cancellation signals, and request-scoped values across API boundaries and goroutines. After years of reading Go codebases, the misuse patterns are as common as the correct uses. ...

February 19, 2020 · 5 min · MW
Available for consulting Distributed systems · Low-latency architecture · Go · LLM integration & RAG · Technical leadership
hello@turboawesome.win