The system that prompted this post was a trade enrichment pipeline. The input was a Kafka topic receiving ~50,000 trade events per minute during market hours. The enrichment step required a database lookup — pulling counterparty and instrument data — that averaged 2ms per trade.
50,000 trades/minute = ~833 trades/second. At 2ms per lookup, a single thread can handle 500 lookups/second. To keep up, we needed at least two threads and ideally a small pool. We had six threads and a queue in front of them.
During a market event that pushed the rate to 200,000 trades/minute, the queue grew without bound, memory climbed, and the service eventually OOM’d. Classic backpressure failure.
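A quick check of that arithmetic, including the spike rate (class and method names here are mine, not from the service):

```java
// Sketch of the capacity math: how many enrichment threads a given
// trade rate needs, at a fixed per-lookup latency.
public class CapacityMath {
    // Lookups/second a single thread sustains at the given latency.
    static double perThreadRate(double lookupMillis) {
        return 1000.0 / lookupMillis;
    }

    // Threads needed to keep up with an incoming rate in trades/minute.
    static int threadsNeeded(double tradesPerMinute, double lookupMillis) {
        double tradesPerSecond = tradesPerMinute / 60.0;
        return (int) Math.ceil(tradesPerSecond / perThreadRate(lookupMillis));
    }

    public static void main(String[] args) {
        System.out.println(threadsNeeded(50_000, 2.0));   // normal load: 2
        System.out.println(threadsNeeded(200_000, 2.0));  // spike: 7 — more than the 6 we had
    }
}
```

At 200,000 trades/minute the six-thread pool's ceiling of 3,000 lookups/second sits below the ~3,333/second arriving, so the queue grows at roughly 330 items/second until memory runs out.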
What Backpressure Is
Backpressure is signal flow in the opposite direction of data flow: the consumer signals the producer to slow down when the consumer can’t keep up.
Without backpressure:
Producer ──→ [unbounded queue] ──→ Consumer
                    ↑
             grows until OOM
With backpressure:
Producer ──→ [bounded queue] ──→ Consumer
   ↑                               │
   └────────── slow down ──────────┘
The mechanism varies by system, but the principle is the same: there must be a way for slowness to propagate upstream, so the source of data slows down rather than the queues between components growing without limit.
The Options
1. Blocking Bounded Queue (Simplest)
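A minimal sketch of the pattern (class and method names are illustrative, not from the original service):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedHandoff {
    public static void main(String[] args) throws InterruptedException {
        // Bounded: at most 1000 in-flight trades between producer and consumer.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String trade = queue.take();  // blocks when the queue is empty
                    enrich(trade);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        for (int i = 0; i < 10_000; i++) {
            queue.put("trade-" + i);  // blocks when the queue is full: backpressure
        }
    }

    static void enrich(String trade) { /* the 2ms lookup would go here */ }
}
```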
BlockingQueue.put() blocks the producer thread when the queue is full. The producer can’t put another item until the consumer takes one. This is the simplest form of backpressure — the producer just waits.
Works well for: thread-to-thread communication within a service.
Breaks down when: the producer is a network receive thread that shouldn’t block (blocking it means the network buffer fills up, and the upstream sender eventually gets TCP backpressure — which might be fine, or might cause timeouts).
2. Kafka Consumer Group Lag
When consuming from Kafka, backpressure is implicit: if your consumer can’t keep up, consumer group lag grows. The consumer naturally reads at its own pace. The producer (upstream Kafka producer or upstream service) doesn’t slow down — the data is just buffered in Kafka.
This is often described as “backpressure via Kafka buffering,” but it’s not true backpressure — it’s buffering with lag accumulation. The distinction matters:
True backpressure: producer slows ← consumer signals
Kafka model: producer doesn't slow, lag grows in Kafka
Kafka works when you can tolerate growing lag (batch analytics) or when your lag is bounded by a known maximum (Kafka’s retention period). It doesn’t work when you need to process events within a deadline — a trade that’s 4 hours old when processed may miss a regulatory reporting window.
Monitoring consumer lag:
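Kafka's kafka-consumer-groups.sh tool reports lag per partition. Its output looks roughly like this (group name and offsets here are illustrative):

```
$ kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --describe --group trade-enrichment

TOPIC    PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
trades   0          1829044         1829051         7
trades   1          1771205         1772065         860
trades   2          1830112         1830139         27
```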
Partition 1 has lag 860 — the consumer is behind. If this lag is growing, you need more consumer instances. If it’s stable, the consumer is keeping pace on average (even if bursty).
3. Reactive Streams (Project Reactor / RxJava)
Reactive Streams defines a protocol for backpressure between asynchronous stages. The consumer subscribes with a request(n) call indicating how many items it can handle. The producer sends at most n items, then waits for another request().
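A sketch of such a pipeline in Project Reactor — the trade source and enrich() stub stand in for the real Kafka source and database lookup:

```java
import reactor.core.publisher.BufferOverflowStrategy;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class ReactiveEnrichment {
    public static void main(String[] args) {
        Flux<String> trades = Flux.range(0, 1_000).map(i -> "trade-" + i);

        trades
            // For sources that can't slow down: bound memory at 10,000,
            // dropping the oldest event on overflow.
            .onBackpressureBuffer(10_000,
                dropped -> System.err.println("dropped " + dropped),
                BufferOverflowStrategy.DROP_OLDEST)
            // At most 6 enrichments in flight; when all 6 are busy,
            // flatMap stops requesting from upstream.
            .flatMap(t -> Mono.fromCallable(() -> enrich(t))
                              .subscribeOn(Schedulers.boundedElastic()),
                     6)
            .blockLast();
    }

    static String enrich(String trade) {
        try {
            Thread.sleep(2);  // stand-in for the 2ms counterparty/instrument lookup
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return trade + ":enriched";
    }
}
```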
The flatMap with concurrency limit creates natural backpressure: only 6 enrichments run in parallel. When all 6 are running, the flatMap stops requesting from the upstream flux. The flux stops emitting. The source slows down.
onBackpressureBuffer handles the case where the source can’t slow down — it buffers up to 10,000 events and drops the oldest when the buffer is full. This is not true backpressure (the source isn’t slowed), but it bounds memory usage.
Works well for: services built entirely in reactive style.
Breaks down when: one stage in the pipeline is blocking — a blocking call anywhere in a reactive chain defeats the purpose and can starve the thread pool.
4. Rate Limiting at Source
Sometimes the right answer is to limit how fast the producer can produce, rather than propagating pressure upstream:
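A sketch with Guava's RateLimiter (the rate matches the 50,000 trades/minute steady state from the text; the processing stub is illustrative):

```java
import com.google.common.util.concurrent.RateLimiter;

public class RateLimitedForwarder {
    public static void main(String[] args) {
        // ~833 permits/second = 50,000 trades/minute.
        RateLimiter limiter = RateLimiter.create(833.0);

        for (int i = 0; i < 100; i++) {
            limiter.acquire();       // blocks until a permit is available
            process("trade-" + i);   // never runs faster than the limit
        }
    }

    static void process(String trade) { /* enrichment + forwarding would go here */ }
}
```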
This is appropriate when the producer is consuming from an external source (exchange feed, ARM submissions) where you can’t actually slow the source, but you can control the rate at which you process and forward events.
Guava’s RateLimiter uses a token bucket with smooth rate limiting — it handles bursts up to the bucket capacity, then enforces the rate limit. For harder limits (no burst tolerance), use a leaky bucket.
What We Did
The trade enrichment service was refactored to:
- Bounded Kafka consumption: added a MAX_POLL_RECORDS=50 configuration (Kafka’s max.poll.records) — the consumer fetches at most 50 records per poll, preventing a single poll from flooding the processing queue.
- Fixed-size work queue: replaced the unbounded queue with a LinkedBlockingQueue of capacity 5000, so the producing thread blocks when the queue is full. But there is a catch: when the Kafka consumer thread blocks on queue.put(), it stops calling poll(), and once max.poll.interval.ms elapses the group coordinator assumes the consumer has died and rebalances the group — a failure mode, not backpressure.
- Separate thread for Kafka polling: to avoid the rebalance failure, we separated the Kafka poll loop from the queue put:
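One standard way to implement this separation, sketched here with Kafka's pause()/resume() (class and field names are assumed, not the service's actual code):

```java
import java.time.Duration;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Dedicated poll thread: poll() is always called within max.poll.interval.ms,
// even while the work queue is full, so the group never rebalances us out.
class PollLoop implements Runnable {
    private final KafkaConsumer<String, String> consumer;
    private final BlockingQueue<ConsumerRecord<String, String>> workQueue;

    PollLoop(KafkaConsumer<String, String> consumer,
             BlockingQueue<ConsumerRecord<String, String>> workQueue) {
        this.consumer = consumer;
        this.workQueue = workQueue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // Queue full: pause fetching, but keep calling poll() so
                    // the broker still considers this consumer alive.
                    while (!workQueue.offer(record, 100, TimeUnit.MILLISECONDS)) {
                        consumer.pause(consumer.assignment());
                        consumer.poll(Duration.ZERO);
                    }
                }
                consumer.resume(consumer.paused());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

While paused, poll() returns no records but still resets the max.poll.interval.ms timer, which is exactly the decoupling the refactor needed: the queue exerts backpressure on fetching without exerting it on group membership.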
- Monitoring: added metrics for queue depth, enrichment latency distribution, and drop rate. When queue depth exceeds 3000 (60% of capacity), we alert. When drop rate is non-zero, we page.
The result: during a 4× traffic spike, queue depth peaked at 4200, one consumer partition fell 90 seconds behind, and zero trades were dropped. Before: OOM in ~8 minutes.
The Backpressure Hierarchy
From weakest to strongest backpressure:
Logging and dropping → data loss, but service survives
Bounded buffer + drop oldest → data loss, bounded memory
Bounded buffer + block → no data loss, upstream propagation
Rate limiting at source → no data loss, predictable throughput
Full reactive streams → no data loss, propagates through pipeline
The right point on this spectrum depends on whether data loss is acceptable. For audit trails, regulatory reporting, and financial transactions: data loss is not acceptable, so you need blocking or reactive streams. For metrics, telemetry, and analytics: controlled dropping under extreme load is usually acceptable.
The failure mode to avoid at all costs is unbounded buffering — it’s how you turn a slow consumer into an OOM.