A data race occurs when two or more goroutines access the same memory location concurrently, at least one of the accesses is a write, and nothing synchronises them. The result is unpredictable: you might get the old value, the new value, a torn read (part old, part new), or a crash. Reproducing the bug is usually very hard because it depends on precise goroutine and CPU scheduling.
Go’s race detector is a compile-time instrumentation tool that detects these at runtime. It’s one of the most useful debugging tools in the Go ecosystem and one of the most underused.
How the Race Detector Works
The race detector is built on ThreadSanitizer (TSan), a race-detection runtime developed at Google. The compiler instruments every memory access so the runtime can track which goroutine last accessed each memory location and whether synchronisation happened between accesses.
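The bookkeeping described above can be illustrated with a toy model. This is not how ThreadSanitizer is implemented. It is a hypothetical, heavily simplified sketch (the names ToyDetector, Sync, Read and Write are invented for illustration) that keeps a vector clock per goroutine, remembers only the last write to each location, and flags an access that is not ordered after that write.

```go
package main

import "fmt"

// VectorClock maps a goroutine ID to the latest logical time that the
// owning goroutine is known to have observed for it.
type VectorClock map[int]int

// lastWrite is the "shadow state" kept for one memory location.
type lastWrite struct {
	gid  int // goroutine that performed the write
	time int // the writer's own logical time at the write
}

// ToyDetector is a hypothetical, simplified model. The real detector
// also tracks reads and uses far more compact shadow memory.
type ToyDetector struct {
	clocks map[int]VectorClock
	writes map[string]lastWrite
}

func NewToyDetector() *ToyDetector {
	return &ToyDetector{
		clocks: make(map[int]VectorClock),
		writes: make(map[string]lastWrite),
	}
}

// clockOf returns the vector clock for gid, creating it on first use.
func (d *ToyDetector) clockOf(gid int) VectorClock {
	if _, ok := d.clocks[gid]; !ok {
		d.clocks[gid] = VectorClock{gid: 1}
	}
	return d.clocks[gid]
}

// Sync models a synchronisation edge (for example an unlock in `from`
// followed by a lock in `to`): `to` learns everything `from` has seen.
func (d *ToyDetector) Sync(from, to int) {
	src, dst := d.clockOf(from), d.clockOf(to)
	for g, t := range src {
		if t > dst[g] {
			dst[g] = t
		}
	}
	src[from]++ // later events in `from` are not covered by this edge
}

// unordered reports whether an access by gid is not ordered after the
// last recorded write to loc by a different goroutine.
func (d *ToyDetector) unordered(gid int, loc string) bool {
	w, ok := d.writes[loc]
	if !ok || w.gid == gid {
		return false
	}
	return d.clockOf(gid)[w.gid] < w.time
}

// Read reports whether this read races with the last write to loc.
func (d *ToyDetector) Read(gid int, loc string) bool {
	return d.unordered(gid, loc)
}

// Write records the write and reports whether it races with the previous one.
func (d *ToyDetector) Write(gid int, loc string) bool {
	race := d.unordered(gid, loc)
	d.writes[loc] = lastWrite{gid: gid, time: d.clockOf(gid)[gid]}
	return race
}

func main() {
	d := NewToyDetector()
	d.Write(1, "x")
	fmt.Println("unsynchronised read races:", d.Read(2, "x")) // true
	d.Sync(1, 2)
	fmt.Println("read after Sync races:", d.Read(2, "x")) // false
}
```

The useful intuition is that a race is not "two accesses close together in time" but "two accesses with no synchronisation edge between them", which is why the detector can flag a race even when the run happened to produce the right answer.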
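When a race is detected, the detector prints a detailed report. By default the program keeps running afterwards (setting GORACE="halt_on_error=1" makes it exit at the first race), and a test that triggers a race under go test -race is marked as failed. A report looks like this: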
==================
WARNING: DATA RACE
Read at 0x00c0001e2020 by goroutine 7:
  github.com/company/cache.(*LRU).Get()
      /app/cache/lru.go:45 +0x84

Previous write at 0x00c0001e2020 by goroutine 6:
  github.com/company/cache.(*LRU).Set()
      /app/cache/lru.go:78 +0x124

Goroutine 7 (running) created at:
  github.com/company/handler.HandleRequest()
      /app/handler/handler.go:92 +0x1a4

Goroutine 6 (running) created at:
  github.com/company/cache.(*LRU).evictLoop()
      /app/cache/lru.go:31 +0x64
==================
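The report tells you exactly where the problem is: the race is between a Get call and a background Set call in the LRU cache. Goroutine 7 read the memory location that goroutine 6 had written, with no synchronisation between the two accesses. The "created at" stacks show where each goroutine was started.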
The Cost
The race detector typically increases execution time by 2–20× and memory usage by 5–10×. This makes it unsuitable for performance-sensitive production code, but it’s fine for:
- Unit tests and integration tests
- Load tests with moderate volume
- Development environments
The performance overhead is why it’s a CI tool, not a production tool.
Running It in CI
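go test -race -count=1 ./...
The -race flag builds the tests with the detector enabled. The -count=1 flag disables test caching, which is necessary because races are non-deterministic and a cached result might hide a race that only sometimes appears.
For services with multiple packages, the ./... pattern runs all of them.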
For flaky races (races that only appear under specific goroutine scheduling), increase the number of runs by raising -count, for example:
go test -race -count=10 ./...
Common Race Patterns in Go
Pattern 1: Map concurrent read/write
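A minimal sketch of this pattern and one possible fix. The function names racyCount and safeCount are hypothetical; the racy version is deliberately never called, because concurrent map writes can abort the program with "fatal error: concurrent map writes".

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount shows the broken pattern: several goroutines write to a
// plain map with no synchronisation. It is intentionally not called.
func racyCount(words []string) map[string]int {
	counts := make(map[string]int)
	var wg sync.WaitGroup
	for _, w := range words {
		wg.Add(1)
		go func(w string) {
			defer wg.Done()
			counts[w]++ // DATA RACE: unsynchronised map write
		}(w)
	}
	wg.Wait()
	return counts
}

// safeCount guards the same map with a mutex, so every access is ordered.
func safeCount(words []string) map[string]int {
	counts := make(map[string]int)
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	for _, w := range words {
		wg.Add(1)
		go func(w string) {
			defer wg.Done()
			mu.Lock()
			counts[w]++
			mu.Unlock()
		}(w)
	}
	wg.Wait()
	return counts
}

func main() {
	counts := safeCount([]string{"a", "b", "a", "c", "a"})
	fmt.Println(counts["a"], counts["b"], counts["c"]) // 3 1 1
}
```

A sync.RWMutex or sync.Map may be a better fit when reads heavily outnumber writes; a plain mutex is simply the smallest change that removes the race.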
Built-in maps in Go are not safe for concurrent use. This is the #1 race I see in production code.
Pattern 2: Shared variable in goroutine closure
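A minimal sketch, assuming the classic form of this bug: goroutines started in a loop capture the loop variable, which before Go 1.22 was a single variable shared by all iterations. The function name upperAll is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// upperAll upper-cases each item in its own goroutine.
func upperAll(items []string) []string {
	results := make([]string, len(items))
	var wg sync.WaitGroup
	for i, item := range items {
		// Explicit shadow: gives each goroutine its own copy. Before
		// Go 1.22, omitting this line meant every closure read the same
		// i and item while the loop was still writing to them.
		i, item := i, item
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each goroutine writes a distinct slice element, so these
			// writes do not race with one another.
			results[i] = strings.ToUpper(item)
		}()
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(upperAll([]string{"a", "b", "c"})) // [A B C]
}
```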
(Note: Go 1.22 fixed loop variable semantics — each iteration gets its own variable. But code targeting older versions still needs the explicit shadow.)
Pattern 3: Struct fields accessed without synchronisation
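A minimal sketch of the fix, assuming a hypothetical Stats struct that request handlers update while another goroutine reads it. Without the mutex, Record and Snapshot would race on both fields.

```go
package main

import (
	"fmt"
	"sync"
)

// Stats is a hypothetical struct shared between goroutines.
// mu guards every field below it.
type Stats struct {
	mu       sync.Mutex
	requests int
	lastPath string
}

// Record updates both fields under the lock.
func (s *Stats) Record(path string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.requests++
	s.lastPath = path
}

// Snapshot reads both fields under the same lock, so a reader never
// sees a half-finished update.
func (s *Stats) Snapshot() (int, string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.requests, s.lastPath
}

// recordConcurrently calls Record from n goroutines and waits for them.
func recordConcurrently(s *Stats, n int, path string) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Record(path)
		}()
	}
	wg.Wait()
}

func main() {
	var s Stats
	recordConcurrently(&s, 50, "/health")
	n, path := s.Snapshot()
	fmt.Println(n, path) // 50 /health
}
```

Keeping the mutex next to the fields it protects, and only touching those fields through methods, makes it much harder to add an unsynchronised access later.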
Pattern 4: sync.WaitGroup misuse
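One common form of misuse, assumed here for illustration, is calling wg.Add inside the goroutine instead of before starting it. Wait can then run before any Add has happened and return early, and the detector reports the Add racing with Wait. The sketch below shows the correct ordering; the function name sumSquares is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares adds up 1² + 2² + ... + n², one goroutine per term.
func sumSquares(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for i := 1; i <= n; i++ {
		// Correct: Add runs in the parent, before the goroutine starts,
		// so Wait cannot observe a zero counter too early.
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			// Misuse would be calling wg.Add(1) here instead: the
			// parent could reach wg.Wait() before this line executes.
			mu.Lock()
			total += v * v
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println(sumSquares(4)) // 30
}
```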
What the Race Detector Doesn’t Find
Deadlocks: the race detector detects concurrent unsynchronised access. It doesn’t detect two goroutines waiting on each other. For deadlocks, the goroutine profile (/debug/pprof/goroutine) shows goroutines stuck in blocking states.
Races that never occur during your tests: the race detector only reports races that actually happen during a run. If your tests don’t exercise the concurrent code path that has the race, it won’t be found.
Logic errors that are technically synchronised but still wrong: a mutex-protected counter that’s incremented twice where once was intended is not a race — it’s a logic error the race detector won’t catch.
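For the deadlock case, a minimal sketch of reading the goroutine profile mentioned above. It assumes the service registers net/http/pprof on http.DefaultServeMux; to stay self-contained it calls the handler in-process through httptest rather than over a real listener. The helper name goroutineDump is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
	"strings"
	"time"
)

// goroutineDump requests the goroutine profile in-process and returns
// the HTTP status code and the text dump. In a running service the same
// data is served by something like http.ListenAndServe("localhost:6060", nil).
func goroutineDump() (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/goroutine?debug=2", nil)
	rec := httptest.NewRecorder()
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	// A goroutine that blocks forever on a channel nobody sends to.
	block := make(chan struct{})
	go func() { <-block }()
	time.Sleep(50 * time.Millisecond) // give it time to park

	code, dump := goroutineDump()
	fmt.Println("status:", code)
	// With debug=2 each goroutine is listed with its blocking state.
	fmt.Println("blocked receiver visible:", strings.Contains(dump, "chan receive"))
}
```

In a real deadlock the dump typically shows several goroutines parked in states such as "chan receive" or "sync.Mutex.Lock", and their stacks show which locks or channels they are waiting on.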
Making Tests Trigger Concurrent Paths
The race detector only fires when the racy code actually executes concurrently. Tests need to exercise concurrent code paths:
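A minimal sketch of such a test body. HitCounter is a hypothetical stand-in for whatever shared structure is under test, and hammer is a hypothetical helper; in a real suite its body would sit inside a func TestXxx(t *testing.T) and be run with -race.

```go
package main

import (
	"fmt"
	"sync"
)

// HitCounter is a hypothetical concurrency-safe type under test.
type HitCounter struct {
	mu   sync.RWMutex
	hits map[string]int
}

func NewHitCounter() *HitCounter {
	return &HitCounter{hits: make(map[string]int)}
}

func (c *HitCounter) Inc(key string) {
	c.mu.Lock()
	c.hits[key]++
	c.mu.Unlock()
}

func (c *HitCounter) Get(key string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.hits[key]
}

// hammer starts `workers` writers and `workers` readers that overlap in
// time, so the race detector has concurrent accesses to observe.
func hammer(c *HitCounter, workers, opsPerWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			for i := 0; i < opsPerWorker; i++ {
				c.Inc("hits")
			}
		}()
		go func() {
			defer wg.Done()
			for i := 0; i < opsPerWorker; i++ {
				_ = c.Get("hits")
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := NewHitCounter()
	hammer(c, 8, 1000)
	fmt.Println(c.Get("hits")) // 8000
}
```

The final count is only a sanity check. The real assertion comes from the detector: if either method dropped its lock, the same run under -race would fail with a report.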
Tests like this are specifically designed to create conditions where races would manifest. They’re worth writing for every data structure or service that’s accessed concurrently.
The Return on Investment
Running tests with -race costs extra CI time — roughly 10× longer. For a test suite that takes 2 minutes, that’s 20 minutes. For most services, this is worth it:
- One data race in production causes hours of investigation
- Data races tend to manifest as intermittent, hard-to-reproduce failures
- The race detector pinpoints the exact lines of code involved
The services where the cost is genuinely too high (a 30-minute test suite becomes 5 hours) deserve a different strategy: run the race detector on the core packages (data structures, caches, shared state) and skip it for end-to-end tests.
Whatever the strategy: run it somewhere, consistently, before merging to main. The race you find in CI is the race you don’t debug in production at 2am.