The requirement: an internal analytics dashboard showing trading activity metrics — volume, trade count, latency distributions, error rates — sliced by instrument, venue, time window, and a dozen other dimensions. Data volume: about 4 billion events per day, 90-day retention. Query pattern: ad-hoc OLAP — arbitrary group-bys, time ranges, filters.
We evaluated TimescaleDB (Postgres extension), Apache Druid, ClickHouse, and “just use BigQuery.” We chose ClickHouse. After a year in production, I’d make the same choice.
Why ClickHouse
Column-oriented storage. OLAP queries typically read a few columns across many rows. Row-oriented storage (Postgres, MySQL) reads entire rows even when you only need two columns. ClickHouse reads only the columns your query touches. For a table with 50 columns where your query touches 4, this is a ~12x reduction in I/O before any compression.
Aggressive compression. ClickHouse’s default LZ4 compression on columnar data is highly effective for financial time-series (repeated patterns, narrow value ranges). Our trade event data compressed to about 15% of raw size. Smaller data → more data fits in RAM → fewer disk reads.
Vectorised query execution. ClickHouse executes queries using SIMD instructions on batches of column data. For aggregation queries, this is significantly faster than row-by-row processing.
Operational simplicity. A single ClickHouse node handles more than most teams need. No cluster to manage, no separate query engine, no object storage to configure. We ran on a single 32-core server with 256GB RAM and 8TB NVMe for the first eight months. Druid required ZooKeeper, a broker, historical nodes, a coordinator, and an overlord. ClickHouse required `sudo apt install clickhouse-server`.
Data Ingestion from Go
We ingest via the native ClickHouse Go client (clickhouse-go/v2) using batch inserts. The critical detail: ClickHouse is not designed for high-frequency single-row inserts. Inserts are expensive because each one creates a new data part on disk; ClickHouse merges parts asynchronously in the background, but creating too many small parts causes write amplification and background merge pressure.
Rule: batch to at least 1000 rows, ideally 10,000–100,000 rows per insert.
We buffer events in memory (a channel drained by a goroutine) and flush when the buffer reaches 50,000 rows or when 5 seconds have elapsed, whichever comes first. Under normal load we hit the row limit; under low load the timer fires.
AppendStruct uses reflection to map Go struct fields to table columns. For maximum performance at very high ingestion rates, use the lower-level Append calls directly — but AppendStruct is fast enough for our 4B events/day.
Schema Design for Analytics Queries
The most important ClickHouse decision: the ORDER BY clause in the table definition. This determines:
- How data is sorted on disk (affects compression)
- Which queries can use primary key indexes efficiently
- Which filters avoid full scans
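A sketch of what such a table definition looks like — column names and types are illustrative, but the engine, partitioning, ORDER BY, and LowCardinality choices are the ones discussed below:

```sql
CREATE TABLE trade_events
(
    event_time    DateTime64(3),
    instrument_id LowCardinality(String),
    venue         LowCardinality(String),
    status        LowCardinality(String),
    quantity      Float64,
    latency_ms    Float64
)
ENGINE = MergeTree
-- Daily partitions make time-range pruning and retention-by-drop cheap.
PARTITION BY toYYYYMMDD(event_time)
-- Sort key doubles as the primary index: filters on instrument + venue
-- + time skip everything outside the range.
ORDER BY (instrument_id, venue, event_time);
```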
LowCardinality(String) for low-cardinality string columns (instrument IDs, venues, statuses) enables dictionary encoding — dramatically better compression and faster GROUP BY than plain String.
ORDER BY (instrument_id, venue, event_time) means queries that filter by instrument + venue + time range use the primary index and skip the granules outside the filter. Queries that filter only by a time range, without instrument or venue, fall back to scanning the matching partitions in full — acceptable in our case because a time range is almost always combined with at least one other dimension.
Query Performance in Practice
Representative queries on 90 days of data (~360 billion rows):
| Query | Rows scanned | Time |
|---|---|---|
| Volume by instrument, last hour | ~50M | 0.3s |
| Latency p99 by venue, last 7 days | ~2B | 1.2s |
| Error rate by instrument, 30 days | ~8B | 4.1s |
| Full 90-day aggregate, no filter | ~360B | 28s |
The first three are dashboard queries run interactively. Sub-5-second responses at multi-billion-row scale, on a single server, without materialised views, without pre-aggregation. The full-scan query takes 28 seconds but is only run for bulk exports, not interactively.
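For concreteness, the second dashboard query looks roughly like this (table and column names are assumptions carried over from the schema discussion; the shape — a quantile aggregate over a time-bounded scan — is the point):

```sql
SELECT
    venue,
    quantile(0.99)(latency_ms) AS latency_p99
FROM trade_events
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY venue
ORDER BY latency_p99 DESC;
```

Note that ClickHouse's quantile is an approximation; quantileExact trades memory for exactness, which we have not needed for dashboards.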
What ClickHouse Is Not
Not a transactional database. Updates and deletes exist (ALTER TABLE ... UPDATE/DELETE) but are expensive (they rewrite data parts). Design your schema to be append-only; use time-based partitions as the deletion mechanism (drop old partitions, not rows).
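Assuming daily partitions keyed on toYYYYMMDD(event_time), retention becomes a scheduled job that drops the oldest partition — a near-instant metadata operation, unlike a row-level DELETE:

```sql
-- Drops one whole day of data in one operation; the partition ID
-- follows the toYYYYMMDD key, so this value is illustrative.
ALTER TABLE trade_events DROP PARTITION 20250101;
```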
Not designed for high-concurrency point lookups. If you’re doing primary-key lookups of individual rows, Postgres is better. ClickHouse’s strength is sequential scans of large column ranges.
Not operationally free. It’s simpler than Druid, but it has its own failure modes — merge pressure from too-small inserts, memory spikes on complex aggregations, query queue management under high concurrency. Worth understanding before running at scale.
The Verdict After a Year
ClickHouse delivered on its promises. Interactive aggregation queries over billions of rows, simple operations, reasonable resource consumption. The main operational learning was insert batch sizing — we initially batched at 1000 rows (too small) and saw merge pressure that caused write latency spikes. Moving to 50,000-row batches fixed it.
For financial analytics workloads — time-series event data, OLAP query patterns, multi-billion-row scales, small team without a data engineering function — it’s the right tool.