KDB+ is used in risk analytics, trade surveillance, and market data storage across most tier-1 financial institutions. If you work in finance long enough, you will encounter it. Nothing in your Java background prepares you for it.
What KDB+ Is
KDB+ is a column-oriented, in-memory database with a built-in query language called Q. It was designed from the ground up for time-series data, and for that use case it is faster than conventional row-oriented databases by a wide margin.
It’s not open source. It’s not cheap. It runs on a single thread (mostly). It has a syntax that looks like line noise to anyone coming from C-family languages. And it’s the tool of choice for intra-day analytics in finance because nothing else comes close on performance for the specific access patterns involved.
The Q Language: Mental Model Reset Required
Q is an array-oriented functional language. Every operator works on entire arrays, not individual elements. This is the fundamental shift:
/ Java mental model: loop over elements
for (int i = 0; i < prices.length; i++) {
    result[i] = prices[i] * quantities[i];
}
/ Q mental model: operate on entire vectors
result: prices * quantities / element-wise multiply, no loop
A Q function that takes 1ms to process 1 element usually takes 1.1ms to process 1 million elements. The fixed cost is function-call and interpreter setup; the per-element work is SIMD-vectorised, so it is not paid per loop iteration.
Basic types and their Java equivalents:
| Q type | Java equivalent | Notes |
|---|---|---|
| `int` | `int` | 32-bit; null is `0Ni` |
| `long` | `long` | 64-bit; null is `0Nj` |
| `float` / `real` | `double` / `float` | |
| `symbol` | interned `String` | Like an enum: stored as an integer ID |
| `timestamp` | `Instant` | Nanosecond precision; Q epoch is 2000-01-01 |
| `list` | `int[]` (homogeneous) | |
| `table` | column-oriented DataFrame | Key difference from Java |
| `dictionary` | `Map<K,V>` | |
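One conversion in the table above trips up most Java integrations: Q timestamps count nanoseconds from 2000-01-01, not from the Unix epoch. A minimal sketch of the mapping to `Instant` (the class and method names here are mine, not part of any KDB+ API):

```java
import java.time.Instant;

// Q timestamps are nanoseconds since the Q epoch, 2000-01-01T00:00:00Z,
// not the Unix epoch. Convert at the boundary, in both directions.
class QTime {
    static final Instant Q_EPOCH = Instant.parse("2000-01-01T00:00:00Z");

    // Q timestamp (nanos since 2000-01-01) -> java.time.Instant
    static Instant fromQ(long qNanos) {
        return Q_EPOCH.plusNanos(qNanos);
    }

    // Instant -> Q timestamp
    static long toQ(Instant t) {
        long seconds = t.getEpochSecond() - Q_EPOCH.getEpochSecond();
        return seconds * 1_000_000_000L + t.getNano();
    }

    public static void main(String[] args) {
        System.out.println(fromQ(0)); // 2000-01-01T00:00:00Z
        System.out.println(toQ(Instant.parse("2000-01-02T00:00:00Z"))); // 86400000000000
    }
}
```

Forgetting the 30-year offset produces timestamps that are silently wrong, so the conversion belongs in exactly one place at the wire boundary.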
Tables Are Columnar Dictionaries
The Q table is the central abstraction. Unlike SQL tables (rows of records), Q tables are column-oriented by default:
/ A table is a dictionary from column names to column vectors
trades: ([]
time: 09:30:01 09:30:03 09:30:07 09:30:15;
sym: `EURUSD`GBPUSD`EURUSD`EURUSD;
side: `BUY`SELL`BUY`SELL;
price: 1.0842 1.2731 1.0843 1.0841;
qty: 1000000 500000 2000000 750000
)
This is stored as 5 parallel arrays, not as a list of row objects. Querying ``select sum qty from trades where sym=`EURUSD`` reads only the `sym` and `qty` columns; the other 3 are never touched. Exactly the columnar access pattern that makes analytics fast.
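To make the storage model concrete, here is a sketch of the same table as parallel column arrays in Java, with an aggregate that touches only the two columns it needs. All names are mine, and the symbols are pre-interned as ints (0 = EURUSD, 1 = GBPUSD); KDB+ does this in C over SIMD-friendly memory, not in a JVM loop:

```java
class ColumnarScan {
    // Two columns of the trades table as parallel arrays. The time, side,
    // and price columns would be three more arrays, never read below.
    static final int[] sym  = {0, 1, 0, 0};
    static final long[] qty = {1_000_000, 500_000, 2_000_000, 750_000};

    // Equivalent of: select sum qty from trades where sym=`EURUSD
    // Sequential scan of exactly two contiguous arrays.
    static long sumQtyWhereSym(int target) {
        long total = 0;
        for (int i = 0; i < sym.length; i++)
            if (sym[i] == target) total += qty[i];
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumQtyWhereSym(0)); // 3750000
    }
}
```

The row-oriented alternative would pull every field of every trade object through the cache to answer the same question.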
Q-SQL looks familiar but executes differently from SQL:
/ Q-SQL: select with aggregation and grouping
select sum qty, avg price by sym, side from trades
/ With time bucketing (10-minute bars)
select sum qty, last price by sym, 10 xbar time.minute from trades
/ Joining to reference data (symbol table)
select sum qty, first fullName by sym from trades lj `sym xkey ref
The by clause is grouping. xbar is bucketing (round down to nearest multiple). lj is left join on a keyed table. The syntax is terse — Q is designed to fit complex queries on one line.
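xbar is simple enough to pin down exactly: `n xbar x` rounds x down to the nearest multiple of n, which is what turns raw minutes into bar boundaries. A Java equivalent (my own sketch, not a KDB+ API):

```java
class XBar {
    // n xbar x: round x down to the nearest multiple of n.
    // Math.floorMod keeps the round-down behaviour for negative x too.
    static long xbar(long n, long x) {
        return x - Math.floorMod(x, n);
    }

    public static void main(String[] args) {
        System.out.println(xbar(10, 37)); // 30 -- minute 37 falls in the 30-39 bar
        System.out.println(xbar(10, 40)); // 40 -- exact multiples map to themselves
    }
}
```

Grouping by `10 xbar time.minute` is therefore just grouping by this rounded-down key.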
Time-Series Operations
Where Q genuinely has no peer: time-series joins and as-of lookups.
As-of join (aj): join each trade to the most recent quote before that trade’s timestamp:
/ quotes table: timestamp, sym, bid, offer
/ trades table: timestamp, sym, price, qty
/ For each trade, find the quote that was current at trade time
aj[`sym`time; trades; quotes]
/ Result: trades with bid and offer columns added from the most recent matching quote
In SQL this would be a correlated subquery or a LATERAL join — complex, slow, and often requiring a self-join. In Q it’s a primitive operation, implemented in C, operating on sorted column arrays. At billions of rows, it’s still fast.
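Conceptually, once quotes are sorted by time within each symbol, the per-trade lookup that aj performs reduces to a binary search for the last quote at or before the trade's timestamp. A sketch of that core step in Java (names are mine; the real implementation is C operating on whole column arrays):

```java
import java.util.Arrays;

class AsOfLookup {
    // Index of the most recent quote at or before tradeTime, given quote
    // timestamps sorted ascending; -1 if every quote is later than the trade.
    static int asOfIndex(long[] quoteTimes, long tradeTime) {
        int i = Arrays.binarySearch(quoteTimes, tradeTime);
        if (i >= 0) {
            // Exact match: advance past duplicates to the latest equal entry.
            while (i + 1 < quoteTimes.length && quoteTimes[i + 1] == tradeTime) i++;
            return i;
        }
        // Not found: binarySearch returns -(insertionPoint) - 1, so the last
        // quote strictly before tradeTime sits at insertionPoint - 1.
        return -i - 2;
    }

    public static void main(String[] args) {
        long[] quotes = {10, 20, 30};
        System.out.println(asOfIndex(quotes, 25)); // 1 -- quote at t=20 was current
        System.out.println(asOfIndex(quotes, 5));  // -1 -- no quote yet
    }
}
```

The SQL LATERAL-join formulation re-derives this lookup per row from scratch; aj gets it essentially for free from the sort order.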
TWAP / VWAP computation:
/ Volume-Weighted Average Price in 1-minute bars
vwap: select vwap: sum[price*qty]%sum qty by sym, 1 xbar time.minute from trades
The `%` operator is division in Q (modulo is `mod`). This query processes the entire trade history in one pass, with no subqueries and no window-function machinery.
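The same sum[price*qty]%sum qty computation, written out as a single pass over the column arrays in Java (a sketch with my own names, grouping on interned symbol IDs as before):

```java
import java.util.HashMap;
import java.util.Map;

class VwapBySym {
    // VWAP per symbol: sum(price*qty) / sum(qty), accumulated in one pass.
    static Map<Integer, Double> vwap(int[] sym, double[] price, long[] qty) {
        Map<Integer, double[]> acc = new HashMap<>(); // symId -> {notional, volume}
        for (int i = 0; i < sym.length; i++) {
            double[] a = acc.computeIfAbsent(sym[i], k -> new double[2]);
            a[0] += price[i] * qty[i]; // running notional
            a[1] += qty[i];            // running volume
        }
        Map<Integer, Double> out = new HashMap<>();
        acc.forEach((id, a) -> out.put(id, a[0] / a[1]));
        return out;
    }

    public static void main(String[] args) {
        int[] sym = {0, 0, 1};
        double[] price = {100.0, 102.0, 50.0};
        long[] qty = {1, 3, 2};
        System.out.println(vwap(sym, price, qty)); // 0 -> 101.5, 1 -> 50.0
    }
}
```

Adding the time-bucket dimension just means the map key becomes (symbol, bar) instead of symbol alone.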
The Performance Reality
Why KDB+ is fast:
- Column vectors in contiguous memory: cache-friendly sequential reads, SIMD-vectorisable
- Sorted data assumptions: time-series data is typically sorted by time; Q assumes this and uses binary search for time-range filters
- Interns symbols: string comparisons become integer comparisons
- Minimal overhead: the Q interpreter is written in C and K (a more primitive language), with no JVM, no GC, minimal allocation
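The symbol-interning point deserves a concrete picture. A minimal Java sketch of the idea (the class is mine, not anything KDB+ exposes): each distinct string is assigned a small integer ID exactly once, and every later comparison, filter, or group-by works on ints.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Symbol interning: map each distinct string to a dense integer ID so that
// equality checks and group-by keys are integer operations, not char scans.
class SymbolPool {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Return the existing ID for s, or assign the next one.
    int intern(String s) {
        Integer id = ids.get(s);
        if (id != null) return id;
        int next = names.size();
        ids.put(s, next);
        names.add(s);
        return next;
    }

    // Recover the string for display.
    String name(int id) { return names.get(id); }

    public static void main(String[] args) {
        SymbolPool pool = new SymbolPool();
        int a = pool.intern("EURUSD");
        pool.intern("GBPUSD");
        int c = pool.intern("EURUSD");     // same ID as the first call
        System.out.println(a == c);        // true: one int compare, no char scan
    }
}
```

A `where sym=`EURUSD` filter in Q is therefore a scan comparing one int against a column of ints, which is why symbol columns group and filter so cheaply.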
Typical performance on modern hardware:
| Operation | Row count | Time |
|---|---|---|
| Sum of a long column | 1 billion | ~1.5s |
| VWAP by instrument, 1-min bars | 500M trades | ~3s |
| As-of join trades to quotes | 100M trades × 1B quotes | ~8s |
| Time-range filter + aggregate | 10M rows | ~50ms |
These numbers assume data fits in RAM (KDB+ is primarily an in-memory database, with a separate disk-based HDB for historical data). For our risk analytics use case, the intra-day data fit in 64GB RAM; the 5-year historical data lived on the HDB (Historical Database) and queries against it were significantly slower.
Connecting from Java
The Java KDB+ client (kx.c) connects via TCP and allows you to execute Q expressions remotely and receive results:
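A sketch of what such a query looks like from Java. This assumes the KX-published Java client (the `c` class from kx's javakdb) on the classpath and a KDB+ process listening on port 5001; treat it as an illustration under those assumptions, not a drop-in listing:

```java
import kx.c;

class KdbQuery {
    public static void main(String[] args) throws Exception {
        // One TCP connection plus the KDB+ IPC handshake.
        c conn = new c("localhost", 5001);
        try {
            // Execute a Q expression remotely; an unkeyed table comes back
            // as a c.Flip: column names plus one array per column.
            Object res = conn.k("select time, sym, price, qty from trades where sym=`EURUSD");
            c.Flip table = (c.Flip) res;
            String[] cols = table.x;    // column names
            Object[] vectors = table.y; // column vectors, one primitive array each
            for (String col : cols) System.out.println(col);
        } finally {
            conn.close();
        }
    }
}
```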
A c.Flip is Q's table on the Java side: parallel arrays of column vectors, with column names in its x field and column data in its y field. No ORM, no JDBC driver with row-at-a-time fetching. You get back the raw column arrays and work with them directly.
For high-frequency queries, keep the connection open and reuse it. Each new c(...) is a TCP connection with session setup overhead. Connection pools work but add complexity; for a single analytics process a long-lived connection is fine.
When Not to Use KDB+
KDB+ is specialised. It’s wrong for:
- Transactional workloads: no multi-statement transactions, optimistic locking, or row-level updates in the traditional sense
- Arbitrary write patterns: Q tables are effectively append-only in normal operation; in-place updates exist but are expensive
- Teams without Q expertise: the learning curve is steep and the talent pool is small
- Data that doesn’t fit the time-series model: it’s a specialised tool, not a general-purpose database
For everything that doesn’t fit — current positions, reference data, operational state — we used Postgres. KDB+ handled analytics; Postgres handled transactions. The right tool for the right access pattern, a theme that repeats through every data architecture I’ve worked on.