When I joined the bank’s risk team, Clojure was already in production for risk calculation. The code I inherited used Clojure’s persistent maps and vectors everywhere — not as a philosophical statement but because the team had found them practically useful in a specific way.

The specific way: concurrent reads and occasional writes to a shared state snapshot, with no locks.

What “Persistent” Actually Means

“Persistent” in data structures terminology doesn’t mean “saved to disk.” It means all previous versions are preserved when a new version is created.

(def m1 {:a 1 :b 2 :c 3})
(def m2 (assoc m1 :d 4))

;; m1 still exists and is unchanged:
m1  ;=> {:a 1, :b 2, :c 3}
m2  ;=> {:a 1, :b 2, :c 3, :d 4}

assoc doesn’t modify m1. It creates m2, which shares structure with m1 (the unchanged parts) and adds the new entry. This is structural sharing — not a full copy.

For a map with 1 million entries, (assoc m1 :d 4) doesn’t copy 1 million entries. It creates a new root node pointing to mostly the same subtree, with one new leaf. The operation is O(log n).
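The sharing is directly observable: identical? tests JVM reference equality, and the value under an unchanged key is the very same object in both versions. A minimal sketch (map and key names are illustrative):

```clojure
;; identical? tests reference equality on the JVM.
(def big-m1 {:a [1 2 3] :b [4 5 6]})
(def big-m2 (assoc big-m1 :c [7 8 9]))

;; The vector under :a was never copied; both maps point at it:
(identical? (:a big-m1) (:a big-m2))  ;=> true
```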

The Performance Reality

Persistent data structures are slower per operation than mutable equivalents. A Clojure persistent map lookup is slower than a Java HashMap lookup because of path traversal through the tree:

Java HashMap.get():         O(1) amortised, ~15ns
Clojure persistent-map get: O(log₃₂ n) ≈ O(1) practical, ~40ns

Clojure’s implementation (Hash Array Mapped Trie) has a branching factor of 32, so even a map with 1 billion entries has depth ≤ 7. The O(log n) is so flat it behaves like O(1) in practice.
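The depth bound follows directly from the branching factor: a 32-way trie over n entries is at most ⌈log₃₂ n⌉ levels deep. A quick back-of-the-envelope check (not Clojure's actual implementation):

```clojure
;; Depth of a 32-way trie holding n entries: ceil(log32 n).
(defn trie-depth [n]
  (long (Math/ceil (/ (Math/log n) (Math/log 32)))))

(trie-depth 1000000000)      ;=> 6  (1 billion entries)
(trie-depth (Math/pow 2 32)) ;=> 7  (full 32-bit hash space)
```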

The cost of immutability: every “modification” creates a new version. For a hot path creating millions of updates per second, this allocation pressure matters. For the risk system — where snapshots update tens to hundreds of times per second — it doesn’t.

The Concurrency Model

The risk system maintained a global state: current positions, risk limits, and market data, all combined into a single map. Multiple readers accessed this concurrently; a single writer updated it when positions changed.

With mutable state and locks:

// Every reader must acquire lock:
ReadWriteLock lock = new ReentrantReadWriteLock();

RiskState getState() {
    lock.readLock().lock();
    try { return state; } finally { lock.readLock().unlock(); }
}

void updatePosition(String symbol, Position pos) {
    lock.writeLock().lock();
    try { state = state.withPosition(symbol, pos); }
    finally { lock.writeLock().unlock(); }
}

With Clojure’s atom (a reference to a persistent value):

(def risk-state
  (atom {:positions {}
         :limits    {}
         :prices    {}}))

;; Read — no lock, no coordination:
(defn get-position [symbol]
  (get-in @risk-state [:positions symbol]))

;; Write — atomic CAS:
(defn update-position [symbol position]
  (swap! risk-state assoc-in [:positions symbol] position))

@risk-state dereferences the atom, returning the current value. This is a lock-free read. Multiple threads can read concurrently with zero coordination.

swap! does a compare-and-swap: read the current value, apply the function to it, and store the result only if the value hasn't changed since the read. If it has changed (another writer updated concurrently), retry. With a single writer there is no contention and no retry.
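The retry behaviour has a practical consequence: the function passed to swap! may run more than once, so it must be side-effect-free. Even under heavy contention the result is correct. A sketch with hypothetical writer threads:

```clojure
(def counter (atom 0))

;; 8 writer threads, 1000 swap! calls each; some calls will retry
;; under contention, but every increment lands exactly once.
(let [threads (doall (for [_ (range 8)]
                       (Thread. #(dotimes [_ 1000] (swap! counter inc)))))]
  (doseq [^Thread t threads] (.start t))
  (doseq [^Thread t threads] (.join t)))

@counter  ;=> 8000
```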

The critical property: a reader that dereferences the atom gets a consistent snapshot — the entire state at a single point in time. Even if a writer updates the atom a microsecond later, the reader’s snapshot is unaffected. There’s no intermediate state, no partial update, no torn read.
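This is easy to demonstrate: a dereference captures a value, and later writes to the atom cannot reach it. A minimal sketch (atom and key names are illustrative):

```clojure
(def demo-state (atom {:positions {:AAPL 100} :limits {:AAPL 500}}))

;; Reader captures a snapshot:
(def snapshot @demo-state)

;; Writer updates afterwards:
(swap! demo-state assoc-in [:positions :AAPL] 250)

;; The snapshot is a value, not a view; it never changes:
(get-in snapshot    [:positions :AAPL])  ;=> 100
(get-in @demo-state [:positions :AAPL])  ;=> 250
```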

The Actual Use Case: Risk Snapshot Distribution

The risk system needed to distribute a consistent view of risk state to ~20 downstream consumers (calculators, monitors, reporters). Each consumer might read the state at any time; the state updated as fills arrived.

With locks: consumers must hold a lock while reading, blocking writers. Writers must hold the lock while updating, blocking readers. Under moderate load, this becomes a bottleneck.

With persistent data:

;; Writer (single thread):
(defn process-fill [fill]
  (let [new-state (update-risk-state @risk-state fill)]
    ;; reset! is safe here only because a single thread writes;
    ;; with multiple writers, use swap! instead.
    (reset! risk-state new-state)
    ;; Broadcast the new snapshot reference to all consumers:
    (doseq [consumer @consumers]
      (>!! (:channel consumer) new-state))))

;; Consumer (each in its own go block):
(defn consume [consumer]
  (go-loop []
    (when-let [snapshot (<! (:channel consumer))]
      (process-snapshot snapshot)
      (recur))))

The writer creates a new snapshot value and sends the reference to each consumer via a core.async channel. No consumer holds a lock. Each consumer works on its own snapshot — which is not the “current” snapshot but is completely consistent. If the state changes while a consumer is processing, the consumer’s snapshot is unaffected.

The memory overhead: each snapshot shares structure with the previous one. For a state with 10,000 positions, a single fill update creates a new snapshot that differs from the previous one by about 30–50 nodes in the trie (the path from root to the changed leaf). The other 9,999 positions are structurally shared.

When This Works

Single writer, many readers. Persistent data structures shine when writes are rare and reads are frequent. Multiple writers cause CAS retries in swap! — under high write contention, you may want a different approach.

Moderate update rate. At very high update rates — hundreds of thousands to millions of updates per second — the allocation pressure from creating new versions matters. At hundreds of updates per second, it doesn't.

Snapshot-consistent reads. If readers need to see a consistent view of the entire state (not just individual fields), persistent data structures provide this naturally. With mutable state, you’d need a read lock that covers the entire state.

Audit history. If you keep references to previous snapshots (in a log, for example), you get automatic immutable history. Each state version is a value you can compare, serialize, or replay.
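A minimal sketch of such a history log, using a watch to record every version and clojure.data/diff to compare two of them (the atom and key names here are illustrative, not the risk system's actual code):

```clojure
(require '[clojure.data :as data])

(def state   (atom {:positions {}}))
(def history (atom [@state]))  ; log starts with the initial version

;; A watch fires synchronously on every successful update:
(add-watch state :audit
  (fn [_key _ref _old new-state]
    (swap! history conj new-state)))

(swap! state assoc-in [:positions :AAPL] 100)
(swap! state assoc-in [:positions :AAPL] 250)

(count @history)  ;=> 3

;; What changed between versions 1 and 2:
(data/diff (nth @history 1) (nth @history 2))
;;=> [{:positions {:AAPL 100}} {:positions {:AAPL 250}} nil]
```

Because each version shares structure with its neighbours, keeping the whole log is far cheaper than it sounds.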


Persistent data structures are a tradeoff. They’re slower per operation, allocate more, and require understanding the structural sharing model. In return they give you lock-free concurrent reads, snapshot consistency, and freedom from the class of bugs that come from shared mutable state. For the risk system — where correctness under concurrency was more important than raw throughput — they were the right choice.