MiFID II went live on January 3, 2018. The preparation started in 2016. Two years for a set of regulatory requirements that, from the outside, looked straightforward: report each trade to a trade repository within 15 minutes of execution.

From the inside, “report each trade” requires answering: which trades? From which systems? In what format? To which trade repository? What constitutes a trade for the purposes of reporting vs. booking vs. settlement? What do you do when the reporting service is unavailable? What happens when the trade repository rejects a report?

This is the engineering story of building a system to answer those questions.

What MiFID II Actually Required

The regulation required reporting of over-the-counter (OTC) derivative trades to an Approved Reporting Mechanism (ARM) within 15 minutes of execution, with fields including counterparty identifiers, instrument details, price, quantity, timestamps, and a transaction reference number (TRN) unique across all reporting entities.

Required fields (subset):
  - Transaction reference number (globally unique)
  - Trading date/time (UTC, microsecond precision)
  - Instrument identifier (ISIN)
  - Price / price currency
  - Quantity
  - Counterparty LEI (Legal Entity Identifier)
  - Trader ID (if applicable)
  - Venue of execution
  - Capacity (principal, agent, matched principal)
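
The field subset above can be sketched as a canonical trade model. This is a minimal illustration, not the regulatory field names; the record and enum names are assumptions.

```java
import java.math.BigDecimal;
import java.time.Instant;

// Illustrative canonical trade model covering the required-field subset.
// Field names are assumptions, not the official MiFID II field names.
record CanonicalTrade(
        String trn,                 // transaction reference number, globally unique
        Instant tradingDateTime,    // UTC; Instant carries nanosecond precision
        String isin,                // instrument identifier
        BigDecimal price,
        String priceCurrency,
        BigDecimal quantity,
        String counterpartyLei,     // Legal Entity Identifier
        String traderId,            // nullable: only if applicable
        String venue,               // venue of execution
        Capacity capacity) {

    enum Capacity { PRINCIPAL, AGENT, MATCHED_PRINCIPAL }
}
```

Using `Instant` (nanosecond resolution) rather than epoch milliseconds is what makes the microsecond-precision requirement representable at all.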

The precision requirements were the first surprise: microsecond timestamps on trade events, when most of our internal systems stored timestamps with millisecond or second precision. This was not a reporting problem — it required changes to the trade capture systems upstream.

The Architecture We Landed On

Trade sources (10+ systems)
    │
    ▼
Kafka topics (one per source system)
    │
    ▼
Trade normalisation service
  - Maps source-specific formats to canonical trade model
  - Enriches with static data (ISIN lookup, LEI lookup)
  - Validates completeness and field constraints
    │
    ├── VALID trades ──────────────────────────────────────────┐
    │                                                          ▼
    │                                              Reporting gateway
    │                                                - Rate limiting
    │                                                - Retry with backoff
    │                                                - ARM submission
    │                                                          │
    └── INVALID trades ──────────────────→ Dead letter queue   │
        (missing fields, lookup failures)   + alerting         │
                                                               ▼
                                                       ARM response
                                                   (accepted / rejected)
                                                               │
                                              ┌────────────────┘
                                              │
                                    Rejection handling service
                                    - Parse rejection reason
                                    - Route to correction workflow
                                    - Track SLA compliance

Kafka was the right backbone: it decoupled the trade sources (which we didn’t control) from our processing pipeline, gave us replay capability for reprocessing after bugs, and made it natural to add new consumers (the compliance team added a separate analytics consumer without touching the reporting pipeline).
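
The validation step in the normalisation service can be sketched as a completeness check; trades with missing fields route to the dead letter queue. The field names and map-based representation are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative completeness check from the normalisation service's
// validation step. Field names are assumptions.
final class TradeValidator {

    private static final List<String> REQUIRED =
            List.of("trn", "tradingDateTime", "isin", "price",
                    "priceCurrency", "quantity", "counterpartyLei",
                    "venue", "capacity");

    // Returns the missing or blank fields; an empty list means VALID,
    // anything else routes the trade to the dead letter queue.
    static List<String> missingFields(Map<String, String> trade) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED) {
            String value = trade.get(field);
            if (value == null || value.isBlank()) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```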

The 15-Minute SLA

The 15-minute window sounds generous. In practice:

Trade execution                            0:00
Internal trade event published to Kafka    0:00 + 50ms
Trade consumed by normalisation service    0:00 + 100ms
Static data lookup (ISIN, LEI)             0:00 + 150ms
Validation                                 0:00 + 200ms
Submitted to ARM                           0:00 + 500ms
ARM acknowledgment received                0:00 + 2s
ARM rejection (if any) received            0:00 + 10s

Under normal conditions, we were reporting within 2–3 seconds. The 15-minute window was there to handle retries, ARM unavailability, and correction workflows when reports were rejected.

The cases that consumed that window:

1. ARM unavailability: The ARM was an external service with its own SLA. On days with high market activity, the ARM would throttle or temporarily reject submissions. We needed a retry queue that tracked submission time and ensured retried reports still met the 15-minute deadline.

2. Rejection handling: ARM rejections required human review for some categories (incorrect instrument identifier, ambiguous counterparty), or automated correction for others (missing but derivable fields). The correction workflow needed to complete and resubmit within the window.

3. Static data delays: ISIN lookups against the reference data service occasionally timed out when the data wasn’t cached. The normalisation service retried with exponential backoff, which could take minutes if the reference data system was slow.
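
The retry queue's deadline tracking (case 1 above) can be sketched as a backoff calculator that refuses to schedule a retry past execution time plus 15 minutes. The base delay and cap are assumed values, not the production configuration.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative deadline-aware backoff: exponential delays, but never
// schedule a retry past the regulatory deadline (execution + 15 min).
final class DeadlineAwareBackoff {

    static final Duration DEADLINE = Duration.ofMinutes(15);
    static final Duration BASE = Duration.ofSeconds(1); // assumed base delay

    // Returns the delay before the next attempt, or null if the deadline
    // would be breached and the trade must be escalated instead.
    static Duration nextDelay(Instant executionTime, Instant now, int attempt) {
        // 1s, 2s, 4s, ... capped at 512s to avoid overflow
        Duration delay = BASE.multipliedBy(1L << Math.min(attempt, 9));
        Instant deadline = executionTime.plus(DEADLINE);
        if (now.plus(delay).isAfter(deadline)) {
            return null; // no room left: alert a human, don't silently drop
        }
        return delay;
    }
}
```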

The Transaction Reference Number Problem

Each MiFID II report required a Transaction Reference Number (TRN) — a globally unique identifier for the transaction, consistent across all amendments and cancellations of the same trade.

The constraint: if we reported a trade, then amended it, then cancelled it, all three reports needed the same TRN. The TRN had to be stable across our entire processing pipeline and survive service restarts.

We couldn’t use database auto-increment (we had no single database that saw all trades). We couldn’t use UUIDs naïvely (they weren’t stable across reprocessing). We needed deterministic generation: given the same trade inputs, always produce the same TRN.

Solution: SHA-256(source_system + source_trade_id + version) truncated to fit the field width, base36-encoded. The same trade ID from the same source system always produced the same TRN. Amendments used version=1, version=2, etc. Cancellations used version=CANCEL.

This had one failure mode we discovered in testing: two different source systems could have overlapping trade ID spaces. We added the source system prefix to the hash input after finding a collision in the test environment.
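
A sketch of the generation scheme, with two illustrative assumptions: the `|` delimiter between hash inputs (which prevents `"ab" + "c"` colliding with `"a" + "bc"`) and a 30-character field width for the truncation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Deterministic TRN generation as described: hash source system, source
// trade ID, and version, then base36-encode and truncate. The "|"
// delimiter and the 30-character field width are assumptions.
final class TrnGenerator {

    static String trn(String sourceSystem, String sourceTradeId, String version) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(
                    (sourceSystem + "|" + sourceTradeId + "|" + version)
                            .getBytes(StandardCharsets.UTF_8));
            // BigInteger(1, ...) treats the digest as an unsigned number
            String base36 = new BigInteger(1, digest).toString(36).toUpperCase();
            return base36.substring(0, Math.min(base36.length(), 30));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The property that matters: the same `(sourceSystem, sourceTradeId, version)` triple always yields the same TRN, so reprocessing a Kafka partition regenerates identical identifiers.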

Exactly-Once Processing and Idempotency

The reporting pipeline needed to process each trade exactly once — submitting duplicate reports to the ARM was a regulatory violation, not just a nuisance. Kafka’s at-least-once delivery meant duplicates were possible.

The idempotency layer:

// Before submitting to ARM: atomically claim the TRN in the
// Redis-backed reporting state. markReported returns true if the TRN
// was already claimed by an earlier submission.
boolean alreadyReported = reportingState.markReported(trn);
if (alreadyReported) {
    log.warn("Duplicate submission attempt for TRN {}, skipping", trn);
    return;
}
try {
    arm.submit(report);
} catch (Exception e) {
    reportingState.unmarkReported(trn);  // release the claim to allow retry
    throw e;
}

reportingState was backed by Redis with a TTL of 30 days — long enough to prevent duplicates from replays, short enough not to grow unboundedly. The TRN was the deduplication key.
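
For illustration, here is an in-memory sketch of the `reportingState` contract; production used Redis, with `putIfAbsent` standing in for an atomic SET-if-not-exists with a 30-day TTL.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

// In-memory sketch of the reportingState contract. Production was
// Redis-backed; putIfAbsent stands in for the atomic check-and-set.
final class ReportingState {

    private final ConcurrentHashMap<String, Instant> reported = new ConcurrentHashMap<>();
    private final Duration ttl = Duration.ofDays(30);

    // Atomically marks the TRN; returns true if it was ALREADY reported
    // within the TTL window.
    boolean markReported(String trn) {
        Instant now = Instant.now();
        Instant prev = reported.putIfAbsent(trn, now);
        if (prev == null) {
            return false;                       // first submission: claimed
        }
        if (prev.plus(ttl).isAfter(now)) {
            return true;                        // duplicate inside TTL window
        }
        reported.put(trn, now);                 // entry expired: re-claim
        return false;
    }

    // Rolls back the claim so a failed ARM submission can be retried.
    void unmarkReported(String trn) {
        reported.remove(trn);
    }
}
```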

This gave us at-least-once ARM submission semantics with duplicate detection at the application level — functionally equivalent to exactly-once for our purposes, since the ARM itself was idempotent on the TRN field.

What Actually Broke

Reference data service availability: Our ISIN and LEI lookups depended on an internal reference data service that had different SLAs from our reporting pipeline. When that service was slow, our 2-second reporting path became a 90-second reporting path. Fix: add a local cache with stale-while-revalidate semantics and fallback to a direct database query on cache miss.
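
A minimal sketch of that stale-while-revalidate cache: serve whatever is cached immediately, and refresh expired entries off the hot path. The freshness window and single-threaded refresher are simplifying assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a stale-while-revalidate cache for ISIN/LEI lookups: serve
// the cached value even when stale, refreshing in the background so the
// reporting path never blocks on a slow reference data service.
final class SwrCache<K, V> {

    private record Entry<T>(T value, Instant loadedAt) {}

    private final ConcurrentHashMap<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final Duration freshFor;
    private final ExecutorService refresher =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "swr-refresh");
                t.setDaemon(true);
                return t;
            });

    SwrCache(Function<K, V> loader, Duration freshFor) {
        this.loader = loader;
        this.freshFor = freshFor;
    }

    V get(K key) {
        Entry<V> e = cache.get(key);
        if (e == null) {
            // True miss: load synchronously (production fell back to a
            // direct database query here).
            V v = loader.apply(key);
            cache.put(key, new Entry<>(v, Instant.now()));
            return v;
        }
        if (e.loadedAt().plus(freshFor).isBefore(Instant.now())) {
            // Stale: return it anyway, refresh off the hot path.
            refresher.submit(() ->
                    cache.put(key, new Entry<>(loader.apply(key), Instant.now())));
        }
        return e.value();
    }
}
```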

Clock synchronisation: MiFID II required UTC timestamps with microsecond precision. Several source systems used local server time without NTP, meaning trade timestamps were up to 15 seconds off. We couldn’t correct the timestamps (that would be falsifying the report) but we had to flag them. Added a clock skew detector that compared trade timestamps against the reporting service’s NTP-synchronised clock.
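
The skew check itself is small; the 1-second threshold below is an assumed value, not the production setting.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the clock skew detector: compare the trade's claimed
// timestamp against the reporting service's NTP-synchronised clock and
// flag (never rewrite) trades whose skew exceeds a threshold.
final class ClockSkewDetector {

    static final Duration MAX_SKEW = Duration.ofSeconds(1); // assumed threshold

    static boolean isSuspicious(Instant tradeTimestamp, Instant ntpNow) {
        Duration skew = Duration.between(tradeTimestamp, ntpNow).abs();
        return skew.compareTo(MAX_SKEW) > 0;
    }
}
```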

Trade event schema drift: Source systems occasionally changed their Kafka message format without notifying our team. The normalisation service failed silently on unknown fields. Added schema versioning via Confluent Schema Registry — format changes required schema evolution that we approved before deployment.
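
The consumer side of that change amounts to pointing the deserializer at the Schema Registry; a hedged example, with placeholder host names:

```properties
# Illustrative consumer settings for schema-validated trade events.
# Host names are placeholders.
bootstrap.servers=kafka-1:9092
group.id=trade-normalisation
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
schema.registry.url=http://schema-registry:8081
# Deserialize into generated classes rather than GenericRecord
specific.avro.reader=true
```

With this in place, an unregistered format change fails loudly at deserialization instead of silently dropping fields.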

ARM capacity during market open: The first 30 minutes after market open generated more trade events than the rest of the day combined. The ARM’s rate limit was expressed in reports/minute, and we hit it on several mornings. Added adaptive rate limiting with a token bucket refilled at the ARM’s published rate limit.
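
A sketch of that token bucket, sized from a reports-per-minute limit; time is injected as a parameter so the behaviour is testable, and the limit value shown is illustrative.

```java
// Sketch of the rate limiter: a token bucket whose capacity and refill
// rate come from the ARM's published reports-per-minute limit.
final class TokenBucket {

    private final double capacity;
    private final double refillPerMillis;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double reportsPerMinute, long nowMillis) {
        this.capacity = reportsPerMinute;
        this.refillPerMillis = reportsPerMinute / 60_000.0;
        this.tokens = reportsPerMinute;   // start full
        this.lastRefillMillis = nowMillis;
    }

    // Returns true if a report may be submitted now; otherwise the
    // caller queues the report and tries again later.
    synchronized boolean tryAcquire(long nowMillis) {
        tokens = Math.min(capacity,
                tokens + (nowMillis - lastRefillMillis) * refillPerMillis);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```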


The go-live in January 2018 was uneventful from a regulatory perspective. The architecture held. What I took away: regulatory requirements produce systems that are a lot more like distributed databases than they appear — you’re essentially building a reliable, exactly-once event pipeline with audit trails, SLA tracking, and external API integration. The “just report the trade” framing hides most of the real complexity.