When I started writing Go after years of Java, the error handling felt tedious. Every function that can fail returns an error. Every call site checks if err != nil. There’s no try/catch, no exception hierarchy, no automatic stack traces. The verbosity was jarring.
A year into building services at a fintech startup, I’d changed my view. The verbosity is real and the boilerplate is real, but the explicitness surfaces things that exception-based languages hide. The question is how to handle errors well rather than just correctly.
The Basics (What Everybody Knows)
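A minimal sketch of the pattern. The parsePort helper is hypothetical, made up for illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper. It returns the parsed port, or an
// error as its last return value when the input is not a usable port.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, err // or handle
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range", n)
	}
	return n, nil
}

func main() {
	port, err := parsePort("8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("port:", port)
}
```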
This is Go’s fundamental error model. Errors are values. Functions that can fail return an error as their last return value. The caller checks it.
What’s less obvious is what to do in the // or handle branch, and how to structure error information so that logs and debugging are actually useful.
Wrapping Errors for Context
A bare return err loses context. By the time the error surfaces at the top of the call stack, you know what failed but not where it failed or what the program was trying to do.
Go 1.13 added the %w verb to fmt.Errorf for wrapping:
Now when this error surfaces, the message reads:
loadUserPortfolio for user usr_abc123: connection refused: dial tcp 10.0.1.5:5432
Instead of just:
connection refused: dial tcp 10.0.1.5:5432
The convention I settled on: functionName [relevant identifiers]: %w. Each layer adds its context prefix. The error message builds up a trace of what was happening.
errors.Is() and errors.As() walk the wrapped chain, so callers can still match the underlying error:
Sentinel Errors for Expected Failures
Not all errors are unexpected. “Record not found” is a normal condition. “Rate limit exceeded” is expected under load. These deserve their own sentinel values or types so callers can react appropriately:
Callers distinguish them:
The discipline: define sentinel errors and custom types at the package level, expose them publicly, and document when they’re returned. Callers should be able to inspect errors without string matching.
The Problem with Error Propagation in Concurrent Code
The simple if err != nil { return err } pattern breaks down when you’re running concurrent operations, because a goroutine has no caller to return an error to:
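A sketch of the manual plumbing this forces on you, using only the standard library. Here fetchOrders and fetchTrades are hypothetical stand-ins for remote calls:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fetchOrders and fetchTrades are hypothetical stand-ins for remote calls.
func fetchOrders() error { return errors.New("fetchOrders: timeout") }
func fetchTrades() error { return nil }

// loadAll runs both fetches concurrently. Each goroutine needs its own slot
// to park an error in, plus a WaitGroup so the parent knows when to look.
func loadAll() error {
	var wg sync.WaitGroup
	errs := make([]error, 2)

	wg.Add(2)
	go func() { defer wg.Done(); errs[0] = fetchOrders() }()
	go func() { defer wg.Done(); errs[1] = fetchTrades() }()
	wg.Wait()

	// Nothing here cancels fetchTrades when fetchOrders fails: both always
	// run to completion before the first error is reported.
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(loadAll())
}
```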
errgroup (from golang.org/x/sync) is the idiomatic solution:
errgroup.WithContext creates a context that’s cancelled the first time a goroutine returns a non-nil error, and Wait returns that first error once every goroutine has finished. This implements fail-fast: if fetchOrders fails, the context passed to fetchTrades is cancelled, and fetchTrades should respect context cancellation and return early.
Handling Partial Failures
For operations over a slice — enrich 100 trades, report 50 records — you often want to continue past individual failures rather than aborting on the first one:
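A sketch of that shape. The Trade and EnrichResult types and the enrichment rule are invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// Trade and EnrichResult are hypothetical types for illustration.
type Trade struct {
	ID           string
	Counterparty string
}

type EnrichResult struct {
	Trade Trade
	Err   error // nil when enrichment succeeded
}

// enrichTrade is a hypothetical per-item operation that can fail.
func enrichTrade(t Trade) (Trade, error) {
	if t.Counterparty == "" {
		return t, errors.New("missing counterparty")
	}
	t.Counterparty = "cpty_" + t.Counterparty
	return t, nil
}

// enrichAll never aborts early: it returns one result per input, in order,
// and leaves it to the caller to decide what to do about failures.
func enrichAll(trades []Trade) []EnrichResult {
	results := make([]EnrichResult, 0, len(trades))
	for _, t := range trades {
		enriched, err := enrichTrade(t)
		if err != nil {
			err = fmt.Errorf("enrichTrade %s: %w", t.ID, err)
		}
		results = append(results, EnrichResult{Trade: enriched, Err: err})
	}
	return results
}

func main() {
	results := enrichAll([]Trade{{ID: "t1", Counterparty: "XYZ"}, {ID: "t2"}})
	failed := 0
	for _, r := range results {
		if r.Err != nil {
			failed++
			fmt.Println("skipped:", r.Err)
		}
	}
	fmt.Printf("%d of %d enriched\n", len(results)-failed, len(results))
}
```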
This pattern — return all results including errors, aggregate at the caller — works well when partial success is meaningful. Don’t use it when an individual failure should abort the whole operation (use errgroup for that).
Logging vs. Returning Errors
A common mistake that produces duplicate error logs:
The rule: either log or return, not both. Functions deep in the call stack return errors. Functions at the boundary (HTTP handler, Kafka consumer, cron job) log errors.
Panic vs. Error
Use panic for programmer errors that should never happen in correct code — nil pointer dereferences, slice out-of-bounds, type assertion failures on types you control. Use error for expected failure conditions — network failures, invalid input, resource exhaustion.
The distinction in practice:
Top-level HTTP handlers and Kafka consumers should recover from panics — a panic in one request handler shouldn’t crash the whole service. But panics should be logged with a stack trace and treated as bugs to fix, not normal error handling paths.
The Pattern That Held Up
After a year of building Go services at the startup:
1. Wrap errors at each layer with context (function name + relevant IDs)
2. Define and use sentinel errors / custom types for expected failure cases
3. Use errgroup for concurrent operations that must all succeed
4. Return partial results with errors for batch operations
5. Log at boundaries, return at interior layers
6. Recover panics at service entry points (handlers, consumers)
The verbosity of if err != nil stops feeling tedious when the error messages are actually useful — when a production alert says processOrder: enrichTrade: lookupCounterparty for cpty_XYZ: connection refused, you know exactly where to look.