The requirement: an internal analytics dashboard showing trading activity metrics — volume, trade count, latency distributions, error rates — sliced by instrument, venue, time window, and a dozen other dimensions. Data volume: about 4 billion events per day, 90-day retention. Query pattern: ad-hoc OLAP — arbitrary group-bys, time ranges, filters.
We evaluated TimescaleDB (Postgres extension), Apache Druid, ClickHouse, and “just use BigQuery.” We chose ClickHouse. After a year in production, I’d make the same choice.
Why ClickHouse
Column-oriented storage. OLAP queries typically read a few columns across many rows. Row-oriented storage (Postgres, MySQL) reads entire rows even when you only need two columns. ClickHouse reads only the columns your query touches. For a table with 50 columns where your query touches 4, this is a ~12x reduction in I/O before any compression.
Aggressive compression. ClickHouse’s default LZ4 compression on columnar data is highly effective for financial time-series (repeated patterns, narrow value ranges). Our trade event data compressed to about 15% of raw size. Smaller data → more data fits in RAM → fewer disk reads.
Vectorised query execution. ClickHouse executes queries using SIMD instructions on batches of column data. For aggregation queries, this is significantly faster than row-by-row processing.
Operational simplicity. A single ClickHouse node handles more than most teams need. No cluster to manage, no separate query engine, no object storage to configure. We ran on a single 32-core server with 256GB RAM and 8TB NVMe for the first eight months. Druid required ZooKeeper, a broker, historical nodes, a coordinator, and an overlord. ClickHouse required `sudo apt install clickhouse-server`.
Data Ingestion from Go
We ingest via the native ClickHouse Go client (clickhouse-go/v2) using batch inserts. The critical detail: ClickHouse is not designed for high-frequency single-row inserts. Inserts are expensive because each one creates a new data part on disk; ClickHouse merges parts asynchronously in the background, but creating too many small parts causes write amplification and background merge pressure.
Rule: batch to at least 1000 rows, ideally 10,000–100,000 rows per insert.
We buffer events in memory (a channel drained by a goroutine) and flush when the buffer reaches 50,000 rows or when 5 seconds have elapsed, whichever comes first. Under normal load we hit the row limit; under low load the timer fires.
AppendStruct uses reflection to map Go struct fields to table columns. For maximum performance at very high ingestion rates, use the lower-level Append calls directly — but AppendStruct is fast enough for our 4B events/day.
Schema Design for Analytics Queries
The most important ClickHouse decision: the ORDER BY clause in the table definition. This determines:
- How data is sorted on disk (affects compression)
- Which queries can use primary key indexes efficiently
- Which filters avoid full scans
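A sketch of what such a table definition looks like — column names and types are illustrative, but the engine, partitioning, ORDER BY, and LowCardinality choices are the ones discussed below:

```sql
CREATE TABLE trade_events
(
    event_time    DateTime64(3),
    instrument_id LowCardinality(String),
    venue         LowCardinality(String),
    status        LowCardinality(String),
    quantity      Float64,
    latency_ms    Float64
)
ENGINE = MergeTree
-- Daily partitions make time-range pruning and retention-by-drop cheap.
PARTITION BY toYYYYMMDD(event_time)
-- Sort key doubles as the primary index: filters on instrument + venue
-- + time skip everything outside the range.
ORDER BY (instrument_id, venue, event_time);
```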
LowCardinality(String) for low-cardinality string columns (instrument IDs, venues, statuses) enables dictionary encoding — dramatically better compression and faster GROUP BY than plain String.
ORDER BY (instrument_id, venue, event_time) means queries that filter by instrument + venue + time range use the primary index and skip the granules outside the filter. Queries that filter only by a time range, without instrument or venue, fall back to scanning the matching partitions in full — acceptable in our case because a time range is almost always combined with at least one other dimension.
Query Performance in Practice
Representative queries on 90 days of data (~360 billion rows):
| Query | Rows scanned | Time |
|---|---|---|
| Volume by instrument, last hour | ~50M | 0.3s |
| Latency p99 by venue, last 7 days | ~2B | 1.2s |
| Error rate by instrument, 30 days | ~8B | 4.1s |
| Full 90-day aggregate, no filter | ~360B | 28s |
The first three are dashboard queries run interactively. Sub-5-second responses at multi-billion-row scale, on a single server, without materialised views, without pre-aggregation. The full-scan query takes 28 seconds but is only run for bulk exports, not interactively.
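For concreteness, the second dashboard query looks roughly like this (table and column names are assumptions carried over from the schema discussion; the shape — a quantile aggregate over a time-bounded scan — is the point):

```sql
SELECT
    venue,
    quantile(0.99)(latency_ms) AS latency_p99
FROM trade_events
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY venue
ORDER BY latency_p99 DESC;
```

Note that ClickHouse's quantile is an approximation; quantileExact trades memory for exactness, which we have not needed for dashboards.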
What ClickHouse Is Not
Not a transactional database. Updates and deletes exist (ALTER TABLE ... UPDATE/DELETE) but are expensive (they rewrite data parts). Design your schema to be append-only; use time-based partitions as the deletion mechanism (drop old partitions, not rows).
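Assuming daily partitions keyed on toYYYYMMDD(event_time), retention becomes a scheduled job that drops the oldest partition — a near-instant metadata operation, unlike a row-level DELETE:

```sql
-- Drops one whole day of data in one operation; the partition ID
-- follows the toYYYYMMDD key, so this value is illustrative.
ALTER TABLE trade_events DROP PARTITION 20250101;
```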
Not designed for high-concurrency point lookups. If you’re doing primary-key lookups of individual rows, Postgres is better. ClickHouse’s strength is sequential scans of large column ranges.
Not operationally free. It’s simpler than Druid, but it has its own failure modes — merge pressure from too-small inserts, memory spikes on complex aggregations, query queue management under high concurrency. Worth understanding before running at scale.
The Verdict After a Year
ClickHouse delivered on its promises. Interactive aggregation queries over billions of rows, simple operations, reasonable resource consumption. The main operational learning was insert batch sizing — we initially batched at 1000 rows (too small) and saw merge pressure that caused write latency spikes. Moving to 50,000-row batches fixed it.
For financial analytics workloads — time-series event data, OLAP query patterns, multi-billion-row scales, small team without a data engineering function — it’s the right tool.