A data race occurs when a program reads and writes the same memory location concurrently without synchronisation. The behaviour is undefined: you might get the old value, the new value, a torn read (part old, part new), or a crash. Reproducing the bug is often close to impossible because it depends on the precise scheduling of goroutines across CPUs.

Go’s race detector instruments your code at compile time and detects these races at runtime. It’s one of the most useful debugging tools in the Go ecosystem, and one of the most underused.

How the Race Detector Works

The race detector is based on the ThreadSanitizer (TSan) algorithm from Google. It instruments every memory access to track which goroutine last accessed each memory location and whether synchronisation happened between accesses.

# Enable race detection:
go test -race ./...
go run -race main.go
go build -race -o service .

When a race is detected, the program prints a detailed report and exits:

==================
WARNING: DATA RACE
Read at 0x00c0001e2020 by goroutine 7:
  github.com/company/cache.(*LRU).Get()
      /app/cache/lru.go:45 +0x84

Previous write at 0x00c0001e2020 by goroutine 6:
  github.com/company/cache.(*LRU).Set()
      /app/cache/lru.go:78 +0x124

Goroutine 7 (running) created at:
  github.com/company/handler.HandleRequest()
      /app/handler/handler.go:92 +0x1a4

Goroutine 6 (running) created at:
  github.com/company/cache.(*LRU).evictLoop()
      /app/cache/lru.go:31 +0x64
==================

This tells you exactly: the race is between a Get call and a background Set call in the LRU cache. Goroutine 7 read the location that goroutine 6 wrote without any synchronisation between them.

The Cost

The race detector adds roughly 5–20× runtime overhead and about doubles memory usage. That rules it out for performance-sensitive production traffic, but it’s fine for:

  • Unit tests and integration tests
  • Load tests with moderate volume
  • Development environments

The performance overhead is why it’s a CI tool, not a production tool.

Running It in CI

# GitHub Actions:
- name: Test with race detector
  run: go test -race -count=1 ./...

The -count=1 flag disables test caching — necessary because races are non-deterministic and a cached result might hide a race that sometimes appears.

For services with multiple packages, run all of them:

go test -race ./...

For flaky races (races that only appear under specific goroutine schedules), increase the number of runs:

# Run tests 5 times each to increase probability of catching intermittent races:
go test -race -count=5 ./...

Common Race Patterns in Go

Pattern 1: Map concurrent read/write

// RACY:
cache := make(map[string]int)
go func() { cache["key"] = 1 }()
go func() { _ = cache["key"] }()

// FIXED:
var mu sync.RWMutex
cache := make(map[string]int)
go func() {
    mu.Lock()
    cache["key"] = 1
    mu.Unlock()
}()
go func() {
    mu.RLock()
    _ = cache["key"]
    mu.RUnlock()
}()

Built-in maps in Go are not safe for concurrent use. This is the #1 race I see in production code.

Pattern 2: Shared variable in goroutine closure

// RACY: all goroutines share the same `i` variable
for i := 0; i < 5; i++ {
    go func() {
        fmt.Println(i)  // captures i by reference, not by value
    }()
}

// FIXED: capture by value
for i := 0; i < 5; i++ {
    i := i  // shadow with new variable scoped to this iteration
    go func() {
        fmt.Println(i)
    }()
}

(Note: Go 1.22 fixed loop variable semantics — each iteration gets its own variable. But code targeting older versions still needs the explicit shadow.)

Pattern 3: Struct fields accessed without synchronisation

type Stats struct {
    requests int64
    errors   int64
}

// RACY: multiple goroutines updating fields
stats.requests++

// FIXED with atomic:
import "sync/atomic"
atomic.AddInt64(&stats.requests, 1)

// Or with a mutex on the whole struct:
mu.Lock()
stats.requests++
mu.Unlock()

Pattern 4: sync.WaitGroup misuse

// RACY: Add called after Wait might have returned:
var wg sync.WaitGroup
for _, task := range tasks {
    go func(t Task) {
        wg.Add(1)   // ← called inside goroutine
        defer wg.Done()
        process(t)
    }(task)
}
wg.Wait()

// FIXED: Add before goroutine starts:
for _, task := range tasks {
    wg.Add(1)       // ← called in the spawning goroutine
    go func(t Task) {
        defer wg.Done()
        process(t)
    }(task)
}
wg.Wait()

What the Race Detector Doesn’t Find

Deadlocks: the race detector detects concurrent unsynchronised access. It doesn’t detect two goroutines waiting on each other. For deadlocks, the goroutine profile (/debug/pprof/goroutine) shows goroutines stuck in blocking states.

Races that never occur during your tests: the race detector only reports races that actually happen during a run. If your tests don’t exercise the concurrent code path that has the race, it won’t be found.

Logic errors that are technically synchronised but still wrong: a mutex-protected counter that’s incremented twice where once was intended is not a race — it’s a logic error the race detector won’t catch.

Making Tests Trigger Concurrent Paths

The race detector only fires when the racy code actually executes concurrently. Tests need to exercise concurrent code paths:

func TestCacheConcurrent(t *testing.T) {
    cache := NewCache(100)

    // Run concurrent reads and writes to trigger any races:
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(2)
        go func(i int) {
            defer wg.Done()
            cache.Set(fmt.Sprintf("key-%d", i), i)
        }(i)
        go func(i int) {
            defer wg.Done()
            cache.Get(fmt.Sprintf("key-%d", i))
        }(i)
    }
    wg.Wait()
}

Tests like this are specifically designed to create conditions where races would manifest. They’re worth writing for every data structure or service that’s accessed concurrently.

The Return on Investment

Running tests with -race costs extra CI time — roughly 10× longer. For a test suite that takes 2 minutes, that’s 20 minutes. For most services, this is worth it:

  • One data race in production causes hours of investigation
  • Data races tend to manifest as intermittent, hard-to-reproduce failures
  • The race detector pinpoints the exact lines of code involved

The services where the cost is genuinely too high (a 30-minute test suite becomes 5 hours) deserve a different strategy: run the race detector on the core packages (data structures, caches, shared state) and skip it for end-to-end tests.

Whatever the strategy: run it somewhere, consistently, before merging to main. The race you find in CI is the race you don’t debug in production at 2am.