We moved our internal service communication from REST+JSON to gRPC when the data pipeline scaled past a point where JSON parsing overhead was measurable in profiles. Two years later, the performance win was real but smaller than expected; the developer experience wins were larger; and the operational complexity was a genuine tax that we underpriced initially.

Why gRPC Over REST for Internal Services

The argument we made at the time:

  1. Protocol buffers are compact and fast to serialise. Measured in our pipeline: ~3x smaller payloads than equivalent JSON, serialisation ~5x faster. At 50k messages/second, this was real CPU and bandwidth.
  2. Strongly typed contracts. The .proto files are the contract; both client and server generate code from them. Schema drift that would silently break JSON consumers becomes a compile error.
  3. Streaming. gRPC has native bidirectional streaming, which maps cleanly onto our market data distribution pattern. Achieving the equivalent over REST required SSE or WebSockets with their own complexity.
  4. Load balancing with connection reuse. gRPC over HTTP/2 multiplexes many concurrent requests over a single connection, eliminating HTTP-level head-of-line blocking (TCP-level head-of-line blocking remains).

All of these held up. The performance gains were real, the schema contract was valuable, and streaming worked well.

What We Underestimated

Proto schema evolution is strict. You can add fields safely. You can never remove or reuse field numbers. In practice, this means your .proto files accumulate deprecated fields over time. After two years, some messages had four or five reserved field numbers and the corresponding code had to handle both old and new field names during transition periods. Manageable, but more discipline than JSON schema evolution requires.
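What that accumulation looks like in practice, as a sketch (the message and field names here are hypothetical, not our actual schema):

```protobuf
syntax = "proto3";

message OrderUpdate {
  // Field numbers 2, 3, and 5 once carried fields that were later removed.
  // Reserving them prevents accidental reuse, which would silently
  // reinterpret old wire data under a new field's type.
  reserved 2, 3, 5;
  reserved "venue_code", "legacy_status";

  string order_id = 1;
  int64 filled_qty = 4;
  string status = 6;  // replaced the removed legacy_status field
}
```

Reserving both the numbers and the old names means the compiler rejects any attempt to reintroduce them, which is exactly the discipline the prose above describes.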

Debugging is harder. With REST+JSON, you can curl an endpoint, paste the response into a JSON formatter, and understand what you’re looking at. gRPC is binary; you need either grpcurl, a proper gRPC proxy (like Envoy with transcoding), or client-side logging. We added a gRPC server reflection endpoint and configured our internal tooling to speak to it, but it was an investment.
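Once reflection is registered on the server (reflection.Register in grpc-go), grpcurl gets reasonably close to the curl workflow. The host and service names below are hypothetical:

```shell
# List services the server exposes via reflection
grpcurl -plaintext orders.internal:9090 list

# Describe a method's request and response types
grpcurl -plaintext orders.internal:9090 describe orders.OrderService.GetOrder

# Call a unary method with a JSON-encoded request body
grpcurl -plaintext -d '{"order_id": "abc-123"}' \
    orders.internal:9090 orders.OrderService/GetOrder
```

The -plaintext flag matters on internal networks without TLS; without it grpcurl attempts a TLS handshake and fails with a misleading error.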

Load balancing is a first-class concern, not automatic. gRPC over HTTP/2 opens long-lived connections. A naive L4 load balancer (like a stock AWS NLB) will concentrate all traffic on one server once the connection is established. You need either client-side load balancing (we used grpc.WithDefaultServiceConfig with round-robin), an L7 proxy (Envoy or Nginx), or gRPC’s own name resolver mechanism. We learned this when staging looked balanced and production did not.
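The service config passed to grpc.WithDefaultServiceConfig is a small JSON document; the minimal round-robin version looks like this:

```json
{
  "loadBalancingConfig": [ { "round_robin": {} } ]
}
```

On its own this is not enough: the client also needs a resolver that returns every backend address (for example the dns:/// target scheme), because round-robin over a single resolved IP balances nothing.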

Error handling semantics are different. gRPC status codes are not HTTP status codes, and the same code can mean different things depending on who emitted it: NOT_FOUND from the application is a legitimate domain state, while UNAVAILABLE from an L7 proxy means the upstream is unreachable. Middleware that maps gRPC errors to client-visible responses needs to distinguish application errors from infrastructure errors. We got this wrong initially and shipped opaque error messages to clients for two months.
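The distinction can be made mechanical. Below is a self-contained sketch of this kind of classification; real middleware would use google.golang.org/grpc/codes and status.FromError rather than the hand-rolled constants here, and the classification rules are illustrative, not our production policy:

```go
package main

import "fmt"

// Mirror the gRPC status code numbers relevant to the classification.
const (
	codeDeadlineExceeded = 4
	codeNotFound         = 5
	codeInternal         = 13
	codeUnavailable      = 14
)

// classify decides whether a status code is an application-level result
// or an infrastructure failure. The same numeric code means different
// things depending on whether a proxy or the application emitted it.
func classify(code int, fromProxy bool) string {
	switch {
	case fromProxy && (code == codeUnavailable || code == codeInternal):
		return "infrastructure" // proxy could not reach or use the upstream
	case code == codeNotFound:
		return "application" // legitimate domain state: entity absent
	case code == codeDeadlineExceeded:
		return "timeout" // could be either; surface it separately
	default:
		return "application"
	}
}

func main() {
	fmt.Println(classify(codeNotFound, false))   // application
	fmt.Println(classify(codeUnavailable, true)) // infrastructure
}
```

The key design point is that the classifier takes provenance (fromProxy) as an input; without it, the client genuinely cannot tell the two cases apart.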

The Streaming Patterns That Actually Work

Server streaming (server pushes a stream of responses to one client request) worked well for market data distribution. One connection, continuous updates, and gRPC’s flow control prevented fast producers from overwhelming slow consumers.

Bidirectional streaming was harder. We used it for an order management protocol (client sends orders, server sends fills and status updates on the same stream). The complexity: both sides are simultaneously reader and writer, flow control applies independently in each direction, and half-close semantics require care. We found that for most of our use cases, unidirectional server streaming + a separate command channel was easier to reason about and maintain than full bidirectional streaming.
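The shape we converged on, sketched as a service definition (names hypothetical): a unary command RPC for writes plus a server stream for updates, instead of one bidirectional stream carrying both.

```protobuf
service OrderService {
  // Command channel: one order in, an immediate ack out.
  rpc SubmitOrder(SubmitOrderRequest) returns (SubmitOrderAck);

  // Update channel: one subscription in, a stream of fills and
  // status updates out. Flow control applies in one direction only,
  // and stream teardown semantics stay simple.
  rpc StreamUpdates(SubscribeRequest) returns (stream OrderUpdate);
}
```

Correlating commands with their eventual updates then becomes an application concern (an order ID in both messages), which we found easier than managing half-close on a shared stream.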

Client-Side Interceptors Are the Right Pattern for Cross-Cutting Concerns

gRPC’s interceptor model (both unary and streaming) is the correct place for:

  • Tracing (inject/extract trace context from metadata)
  • Authentication (inject tokens, validate on server)
  • Retry logic with backoff
  • Metrics (latency histograms per method)

We built these once as a shared interceptor package. New services pick it up and get observability, auth, and retry for free. This compounds well — it’s the kind of investment that pays more as the service count grows.

conn, err := grpc.Dial(target,
    // Internal traffic; swap insecure.NewCredentials()
    // (google.golang.org/grpc/credentials/insecure) for mTLS in production.
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    // Chained in order: the first interceptor is the outermost wrapper.
    grpc.WithChainUnaryInterceptor(
        otelgrpc.UnaryClientInterceptor(), // tracing
        authInterceptor(tokenSource),      // token injection
        retryInterceptor(retryOpts),       // retry with backoff
    ),
)

The Net Assessment

gRPC was the right choice for this use case — internal, high-throughput, latency-sensitive service communication with stable contracts. If we were building an external API consumed by third parties with diverse tooling, REST+JSON would have remained the better choice for accessibility reasons.

The lessons: invest early in the debugging toolchain (grpcurl, reflection, Envoy transcoding if external access is needed), take proto schema governance seriously from day one, and treat load balancing as an explicit design concern rather than assuming it works.