Startup

Kafka at Startup Scale

The fintech startup adopted Kafka early — we were processing market events at rates that would have overwhelmed any request-response queue. Two years in, with a five-broker cluster handling 200k messages/sec at peak, the operational experience was significantly different from what I’d expected based on the documentation and conference talks. ...

Engineering Roadmaps: Planning for Uncertainty

Every engineering team has a roadmap. Most engineering roadmaps are wrong. Not in a surprising way — in a predictable way that reflects structural problems in how they’re created. After doing roadmap planning at two companies at different scales, here’s what I think actually works and why most roadmaps fail. ...

Engineering Velocity at a Startup: What Actually Made Us Fast

The standard startup narrative is that small teams move fast because they cut process. No PRD approval chains, no design committee sign-off, no six-week delivery timelines. Just engineers and a product idea, shipping. That narrative is true as far as it goes, and incomplete in important ways. The startup I worked at from 2019 to 2021 was fast for reasons that went beyond “we skipped the bureaucracy.” Understanding those reasons changed how I think about engineering productivity in any context. ...

The Platform vs Product Tension in a Growing Startup

The fintech startup hit the platform question about eighteen months in. We had product-market fit, we were growing, and the engineering team was doubling every six months. The systems that had worked at ten engineers were showing strain at twenty-five. The question became: dedicate engineering time to platform work, or keep all capacity on product features? This is a hard question. The people who get it right aren’t smarter — they’re clearer about what they’re actually trading off. ...

Building the First Production Service at a Startup: Decisions Under Uncertainty

Three months into the startup, the prototype was working and investors were asking for a production timeline. We had a Postgres database, a Python script doing the core business logic, and no infrastructure to speak of. The decision: rewrite in Go, build proper infrastructure, or ship the Python and iterate? And if we rewrite, what does “proper infrastructure” mean when you have six engineers and four months of runway? ...

Why I Left: On Risk, Pace, and Ownership

I left a well-paying, intellectually interesting job at a large financial institution to join a company with fewer than twenty people and less than a year of runway. My colleagues thought I’d lost perspective. My family was diplomatically concerned. Here’s the honest version of the reasoning. ...

Why Your Java App Is Slow Before It Even Starts: Classloading Deep Dive

The service started in 3 seconds in development, but the first live trade after deployment took 800ms instead of the expected sub-10ms. The second trade was fine. We couldn’t reproduce it in load tests. The culprit was classloading. The trade execution path touched 47 classes that had never been loaded before. Loading them, verifying the bytecode, and running static initialisers took 800ms — once, at first use. Understanding exactly how that happens is worth the time. ...
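The lazy behaviour described above is easy to observe directly. What follows is a minimal sketch, not necessarily the fix we shipped. It assumes the hot-path classes are known by name ahead of time. `PricingEngine` and `RiskCheck` are hypothetical stand-ins for trade-path classes, and `warmUp` is an illustrative helper. A class's static initialiser runs on its first active use, such as the first static method call, and not at JVM startup. `Class.forName(name, true, loader)` forces loading, linking (including bytecode verification) and initialisation immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: observe lazy class initialisation, then force it early with Class.forName.
// PricingEngine, RiskCheck and warmUp are hypothetical names used for illustration only.
final class ClassInitDemo {
    // Records the order in which static initialisers actually run (single-threaded demo).
    static final List<String> INIT_LOG = new ArrayList<>();

    // Hypothetical stand-in for a class on the trade execution path.
    static final class PricingEngine {
        static {
            // Runs once, on the first active use of the class, not when the JVM starts.
            INIT_LOG.add("PricingEngine");
        }

        static long price(long qty) {
            return qty * 2;
        }
    }

    // Another hypothetical stand-in, used to show initialisation on first call.
    static final class RiskCheck {
        static {
            INIT_LOG.add("RiskCheck");
        }

        static boolean allow(long qty) {
            return qty < 1_000;
        }
    }

    private ClassInitDemo() {}

    // Loads, links (verifying bytecode) and initialises each named class now,
    // so that one-off cost is paid at startup instead of on the first request.
    static void warmUp(String... binaryNames) {
        ClassLoader loader = ClassInitDemo.class.getClassLoader();
        for (String name : binaryNames) {
            try {
                Class.forName(name, true, loader); // true = run static initialisers now
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException("warm-up class missing: " + name, e);
            }
        }
    }
}
```

Note that a class literal such as `PricingEngine.class` does not itself trigger initialisation, which is why it can be used to obtain the binary name passed to `warmUp`. To see which classes are loaded late in a real service, run it with `-verbose:class` (or `-Xlog:class+load=info` on JDK 9+). A warm-up like this only covers loading, verification and static initialisation. JIT compilation of the hot path is a separate first-use cost.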