Error Handling in Go: Patterns That Actually Work at Scale

When I started writing Go after years of Java, the error handling felt tedious. Every function returns an error. Every callsite checks if err != nil. There’s no try/catch, no exception hierarchy, no automatic stack traces. The verbosity was jarring.

A year into building services at the fintech startup, I’d changed my view. The verbosity is real and the boilerplate is real, but the explicitness surfaces things that exception-based languages hide. The question is how to handle errors well rather than just correctly.

The Basics (What Everybody Knows)

result, err := doSomething()
if err != nil {
    return nil, err  // or handle
}

This is Go’s fundamental error model. Errors are values. Functions that can fail return an error as their last return value. The caller checks it.

What’s less obvious is what to do in the // or handle branch, and how to structure error information so that logs and debugging are actually useful.

Wrapping Errors for Context

Returning a bare err loses context. By the time the error surfaces at the top of the call stack, you know what failed but not which call path led there or what the program was trying to do.

Go 1.13 added fmt.Errorf with %w for wrapping:

func loadUserPortfolio(ctx context.Context, userID string) (*Portfolio, error) {
    portfolio, err := db.QueryPortfolio(ctx, userID)
    if err != nil {
        return nil, fmt.Errorf("loadUserPortfolio for user %s: %w", userID, err)
    }
    return portfolio, nil
}

Now when this error surfaces, the message reads:

loadUserPortfolio for user usr_abc123: connection refused: dial tcp 10.0.1.5:5432

Instead of just:

connection refused: dial tcp 10.0.1.5:5432

The convention I settled on: functionName [relevant identifiers]: %w. Each layer adds its context prefix. The error message builds up a trace of what was happening.

errors.Is() and errors.As() walk the wrap chain, so a check against the underlying error still matches after several layers of %w:

var dbErr *pgconn.PgError
if errors.As(err, &dbErr) && dbErr.Code == "23505" {
    // Unique constraint violation — handle idempotently
    return existingRecord(ctx, id)
}
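
To see the whole chain end to end, here is a minimal, self-contained sketch. The names errConnRefused, queryPortfolio, and flattened are hypothetical stand-ins (the real db.QueryPortfolio isn't shown in this post); the point is only that %w keeps the cause inspectable while %v keeps just its text.

```go
package main

import (
	"errors"
	"fmt"
)

// errConnRefused is a hypothetical stand-in for a low-level driver error.
var errConnRefused = errors.New("connection refused")

// queryPortfolio simulates the database layer failing.
func queryPortfolio(userID string) error {
	return fmt.Errorf("queryPortfolio: %w", errConnRefused)
}

// loadUserPortfolio adds its own context and keeps the chain intact via %w.
func loadUserPortfolio(userID string) error {
	if err := queryPortfolio(userID); err != nil {
		return fmt.Errorf("loadUserPortfolio for user %s: %w", userID, err)
	}
	return nil
}

// flattened formats the cause with %v, which keeps the text but drops the chain.
func flattened(userID string) error {
	return fmt.Errorf("loadUserPortfolio for user %s: %v", userID, queryPortfolio(userID))
}

// stillMatches reports whether the root cause is reachable through the wraps.
func stillMatches(err error) bool {
	return errors.Is(err, errConnRefused)
}

func main() {
	err := loadUserPortfolio("usr_abc123")
	fmt.Println(err)                                  // loadUserPortfolio for user usr_abc123: queryPortfolio: connection refused
	fmt.Println(stillMatches(err))                    // true
	fmt.Println(stillMatches(flattened("usr_abc123"))) // false
}
```

Both versions print the same message; only the %w version lets a caller several layers up ask what actually went wrong.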

Sentinel Errors for Expected Failures

Not all errors are unexpected. “Record not found” is a normal condition. “Rate limit exceeded” is expected under load. These deserve their own sentinel values or error types so callers can react appropriately:

// Sentinel error:
var ErrNotFound = errors.New("not found")

// Custom error type with structured data:
type ValidationError struct {
    Field   string
    Message string
}

func (e *ValidationError) Error() string {
    return fmt.Sprintf("validation failed: %s: %s", e.Field, e.Message)
}

Callers distinguish them:

order, err := orderService.Get(ctx, orderID)
if errors.Is(err, ErrNotFound) {
    return http.StatusNotFound, nil
}
if err != nil {
    return http.StatusInternalServerError, err
}

The discipline: define sentinel errors and custom types at the package level, expose them publicly, and document when they’re returned. Callers should be able to inspect errors without string matching.
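
One way this pays off is a single place that turns an error chain into a response code. The sketch below repeats the two definitions from above so it runs on its own; statusFor and wrap are hypothetical helpers, and the particular status mapping is an assumption rather than a rule.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

var ErrNotFound = errors.New("not found")

type ValidationError struct {
	Field   string
	Message string
}

func (e *ValidationError) Error() string {
	return fmt.Sprintf("validation failed: %s: %s", e.Field, e.Message)
}

// wrap is a hypothetical helper that adds context the way interior layers would.
func wrap(op string, err error) error {
	return fmt.Errorf("%s: %w", op, err)
}

// statusFor is a hypothetical helper that maps an error chain to an HTTP status
// without looking at the message text.
func statusFor(err error) int {
	var vErr *ValidationError
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	case errors.As(err, &vErr):
		return http.StatusBadRequest
	default:
		return http.StatusInternalServerError
	}
}

func main() {
	missing := wrap("GetOrder ord_1", ErrNotFound)
	invalid := wrap("CreateOrder", &ValidationError{Field: "qty", Message: "must be positive"})
	other := wrap("GetOrder ord_2", errors.New("boom"))
	fmt.Println(statusFor(missing), statusFor(invalid), statusFor(other)) // 404 400 500
}
```

Because the checks go through errors.Is and errors.As, interior layers can keep adding context with %w without breaking the mapping.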

The Problem with Error Propagation in Concurrent Code

The simple if err != nil; return err pattern breaks down when you’re running concurrent operations:

// This is wrong — races on 'err' and goroutine leak if one fails:
var result1 []Order
var result2 []Trade
var err error

go func() { result1, err = fetchOrders(ctx) }()
go func() { result2, err = fetchTrades(ctx) }()
// ... where's the WaitGroup? Where's the error handling?

errgroup (from golang.org/x/sync) is the idiomatic solution:

g, ctx := errgroup.WithContext(ctx)

var result1 []Order
var result2 []Trade

g.Go(func() error {
    var err error
    result1, err = fetchOrders(ctx)
    return err
})

g.Go(func() error {
    var err error
    result2, err = fetchTrades(ctx)
    return err
})

if err := g.Wait(); err != nil {
    return nil, fmt.Errorf("fetching portfolio data: %w", err)
}

errgroup.WithContext creates a context that’s cancelled when any goroutine returns a non-nil error. This implements fail-fast: if fetchOrders fails, the context passed to fetchTrades is cancelled, and fetchTrades should respect context cancellation and return early. g.Wait() blocks until every goroutine has returned and yields only the first non-nil error; any later errors are dropped.
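
Fail-fast only works if the other goroutines actually watch the context. Here is a rough sketch of what that looks like, using only the standard library so it runs without extra modules. This fetchTrades is a hypothetical stand-in with a timer in place of real I/O, and abortedEarly cancels the context by hand to mimic what errgroup does after a sibling fails.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchTrades is a hypothetical slow call that gives up as soon as ctx is done.
// The timer stands in for real I/O such as a query or an HTTP request.
func fetchTrades(ctx context.Context, work time.Duration) error {
	timer := time.NewTimer(work)
	defer timer.Stop()

	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("fetchTrades: %w", ctx.Err())
	}
}

// abortedEarly cancels the context up front and reports whether fetchTrades
// stopped with context.Canceled instead of waiting out the full hour.
func abortedEarly() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	err := fetchTrades(ctx, time.Hour)
	return errors.Is(err, context.Canceled)
}

// completesNormally checks the happy path: no cancellation, short work.
func completesNormally() bool {
	return fetchTrades(context.Background(), time.Millisecond) == nil
}

func main() {
	fmt.Println(abortedEarly())      // true
	fmt.Println(completesNormally()) // true
}
```

Most libraries that accept a context (database drivers, net/http clients) do this select for you; the pattern matters mainly for your own loops and waits.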

Handling Partial Failures

For operations over a slice — enrich 100 trades, report 50 records — you often want to continue past individual failures rather than aborting on the first one:

type EnrichResult struct {
    Trade Trade
    Err   error
}

func enrichBatch(ctx context.Context, trades []Trade) []EnrichResult {
    results := make([]EnrichResult, len(trades))
    for i, trade := range trades {
        enriched, err := enrichTrade(ctx, trade)
        results[i] = EnrichResult{Trade: enriched, Err: err}
    }
    return results
}

// Caller aggregates:
results := enrichBatch(ctx, trades)
var succeeded []Trade
var failed []error

for _, r := range results {
    if r.Err != nil {
        failed = append(failed, r.Err)
    } else {
        succeeded = append(succeeded, r.Trade)
    }
}
log.Infof("Enriched %d/%d trades; %d failures", len(succeeded), len(trades), len(failed))

This pattern — return all results including errors, aggregate at the caller — works well when partial success is meaningful. Don’t use it when an individual failure should abort the whole operation (use errgroup for that).
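
If the caller only needs one error value that summarizes the batch, errors.Join (added in Go 1.20) is an alternative to carrying a slice around. A minimal sketch, assuming Go 1.20 or later; enrich, enrichAll, and errNoCounterparty are hypothetical names, and the odd-ID failure rule exists only to make the example deterministic.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoCounterparty is a hypothetical sentinel for a per-item failure.
var errNoCounterparty = errors.New("counterparty not found")

// enrich is a hypothetical per-item step that fails for odd IDs.
func enrich(id int) error {
	if id%2 == 1 {
		return fmt.Errorf("enrich trade %d: %w", id, errNoCounterparty)
	}
	return nil
}

// enrichAll keeps going past failures and returns them joined into one error.
// errors.Join returns nil when there is nothing to join.
func enrichAll(ids []int) error {
	var errs []error
	for _, id := range ids {
		if err := enrich(id); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// hasMissingCounterparty shows that errors.Is still sees inside a joined error.
func hasMissingCounterparty(err error) bool {
	return errors.Is(err, errNoCounterparty)
}

func main() {
	err := enrichAll([]int{1, 2, 3})
	fmt.Println(err)
	// enrich trade 1: counterparty not found
	// enrich trade 3: counterparty not found
	fmt.Println(hasMissingCounterparty(err)) // true
	fmt.Println(enrichAll([]int{2, 4}))      // <nil>
}
```

The trade-off is that the joined error no longer tells you which inputs succeeded, so the per-item result struct is still the better fit when the caller needs the successes too.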

Logging vs. Returning Errors

A common mistake that produces duplicate error logs:

// BAD: logs the error AND returns it — the caller logs it again
func processOrder(ctx context.Context, order Order) error {
    if err := validate(order); err != nil {
        log.Errorf("validation failed: %v", err)  // ← logged here
        return err                                  // ← caller logs again
    }
    return nil
}

The rule: either log or return, not both. Functions deep in the call stack return errors. Functions at the boundary (HTTP handler, Kafka consumer, cron job) log errors.

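// Handler (boundary): log and respond.
func (h *Handler) handleOrder(w http.ResponseWriter, r *http.Request) {
    // Assumes a JSON request body; malformed input is the client's fault, so no error log.
    var req OrderRequest
    if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }
    order, err := h.service.ProcessOrder(r.Context(), req)
    if err != nil {
        log.WithError(err).Error("order processing failed")
        http.Error(w, "internal error", http.StatusInternalServerError)
        return
    }
    // The boundary is also where an encoding failure gets logged.
    if err := json.NewEncoder(w).Encode(order); err != nil {
        log.WithError(err).Error("encoding order response failed")
    }
}

// Service (interior): just return.
func (s *Service) ProcessOrder(ctx context.Context, req OrderRequest) (*Order, error) {
    if err := s.validate(req); err != nil {
        return nil, fmt.Errorf("ProcessOrder validate: %w", err)
    }
    // ...
}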

Panic vs. Error

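Reserve panic for programmer errors that should never happen in correct code: violated invariants, impossible states, a failed type assertion on a type you control. The runtime already panics on nil pointer dereferences and out-of-bounds slice indexing for the same reason. Use error for expected failure conditions: network failures, invalid input, resource exhaustion.

In practice, the only place that should catch a panic is the boundary: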

func (s *Server) handler(w http.ResponseWriter, r *http.Request) {
    defer func() {
        if rec := recover(); rec != nil {
            log.Errorf("panic in handler: %v\n%s", rec, debug.Stack())
            http.Error(w, "internal error", http.StatusInternalServerError)
        }
    }()
    // ... handler code
}

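Top-level HTTP handlers and Kafka consumers should recover from panics so that one bad request can’t take down the whole service. net/http already recovers a panic raised on the handler’s own goroutine and drops the connection, but an explicit recover lets you log through your own logger and return a proper 500. Neither covers a panic in a goroutine the handler spawned, which will still crash the process. Either way, log panics with a stack trace and treat them as bugs to fix, not as a normal error handling path.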
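
Rather than repeating the defer in every handler, the same recovery can live in one wrapper. The following is a sketch using only the standard library: recoverPanics and statusAfterPanic are hypothetical names, the standard log package stands in for whatever logger the service uses, and httptest drives the handler so the example runs without a server.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"runtime/debug"
)

// recoverPanics wraps any handler so a panic is logged with its stack
// and turned into a 500 response.
func recoverPanics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				log.Printf("panic in handler: %v\n%s", rec, debug.Stack())
				http.Error(w, "internal error", http.StatusInternalServerError)
			}
		}()
		next.ServeHTTP(w, r)
	})
}

// statusAfterPanic runs a deliberately panicking handler through the wrapper
// and returns the status code the client would have seen.
func statusAfterPanic() int {
	buggy := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		panic("invariant violated")
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/orders", nil)
	recoverPanics(buggy).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusAfterPanic()) // 500
}
```

One caveat with this sketch: if the handler already wrote part of the response before panicking, the 500 can no longer replace the status that was sent, so the log line is the only reliable signal.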

The Pattern That Held Up

After a year of building Go services at the startup:

1. Wrap errors at each layer with context (function name + relevant IDs)
2. Define and use sentinel errors / custom types for expected failure cases
3. Use errgroup for concurrent operations that must all succeed
4. Return partial results with errors for batch operations
5. Log at boundaries, return at interior layers
6. Recover panics at service entry points (handlers, consumers)

The verbosity of if err != nil stops feeling tedious when the error messages are actually useful — when a production alert says processOrder: enrichTrade: lookupCounterparty for cpty_XYZ: connection refused, you know exactly where to look.
