The fintech startup hit the platform question about eighteen months in. We had product-market fit, we were growing, and the engineering team was doubling every six months. The systems that had worked at ten engineers were showing strain at twenty-five. The question became: dedicate engineering time to platform work, or keep all capacity on product features?
This is a hard question. The people who get it right aren’t smarter — they’re clearer about what they’re actually trading off.
What “Platform” Actually Means
Platform work is any work that makes the next feature faster and safer to build. It includes:
- Shared services (auth, notification, billing) that product teams use instead of building their own
- Infrastructure tooling (deployment pipelines, monitoring, local dev setup)
- Developer experience improvements (test infrastructure, documentation, onboarding)
- Technical debt reduction that was blocking velocity
Product work is work that directly delivers value to customers: new features, improvements to existing features, bug fixes that affect users.
The line is blurry. A better alerting system is platform work, but if it prevents a customer-facing outage, it’s also product work. A data pipeline is platform work until you realise it’s also the thing that makes the analytics feature possible.
Why Startups Under-invest in Platform
The incentive structure pushes against platform investment:
Visibility is asymmetric. A new feature ships and you can point at it. A deployment pipeline improvement means the next feature ships 20% faster — but that counterfactual is invisible.
The pain isn’t distributed equally. When platform work is deferred, senior engineers absorb the cost. They know the shortcuts, they understand the fragility, they debug the failures. They’re slower than they could be, but they manage. Junior engineers can’t; the system looks fine until it’s time to hire and onboard.
Product managers are measured on product. A product manager measured on shipped features has no incentive to prioritise work that makes the engineering team faster. The incentive alignment problem is structural.
The problem is future-tense. “If we don’t fix the deployment pipeline, future features will be slower” doesn’t have the urgency of “customers can’t do X right now.”
Why Startups Over-invest in Platform
It happens too, especially after a bad incident or when a strong platform engineer joins.
Premature abstraction. A service that handles one use case doesn’t need to be a platform. Building the shared auth service before you have three services that need it is a waste. Build the abstraction when you have the third use case, not the first.
Yak shaving. Platform work can expand indefinitely. Improving the deployment pipeline is fine; rewriting the deployment pipeline because the current one has imperfect local dev support when local dev is working is yak shaving. Know when to stop.
Platform as a destination rather than an enabler. Some engineers find platform work intrinsically interesting and can lose track of why it exists. The goal is to ship better products faster, not to have an impressive platform.
The “big migration” that never ends. Migrating all services to a new framework, or all state to a new database, sounds like productive platform work and is sometimes necessary. But it often runs years past estimate, consuming capacity that could have shipped product, while the target architecture shifts under the team.
The Framework That Helped
We used a simple heuristic: platform investment is justified when the cost of deferral compounds faster than the cost of the investment.
Concretely:
- The deployment pipeline took 20 minutes and was flaky. Every engineer lost 20–40 minutes per day waiting for it or debugging it. At 20 engineers, that was 7–14 engineer-hours/day. A 3-week investment to fix it would pay back in 3–6 weeks.
- The auth service was inconsistent across three services. Each service had slightly different session handling. This created 2–3 bugs per quarter and security review friction. A 4-week consolidation would eliminate a class of bugs and reduce audit time.
- The service mesh migration was an architectural preference. No service was meaningfully constrained by its absence. Indefinitely deferred.
The calculation isn’t always that clean, but the forcing question is: what is this costing us today, and what is the trajectory? Costs that compound (slower onboarding, growing bug class, escalating incident rate) justify investment. Costs that are static (“this is annoying”) don’t.
The Team Structure That Followed
At around 25 engineers, we formalised a small platform team — three engineers who owned shared infrastructure, the deployment pipeline, observability, and developer experience. Their mandate was explicitly tied to product team velocity metrics: time from code to production, time from incident to resolution, time to onboard a new engineer to productivity.
This resolved the incentive problem. Platform engineers were measured on whether product engineers were faster, not on how sophisticated the platform was. The product teams were their customers.
Two things that made this work:
Embedded rotations. Platform engineers spent one week per quarter embedded in a product team. This kept platform work grounded in actual pain. Without this, platform engineers build things they think product teams want; with it, they build things product teams actually need.
Explicit SLAs for platform services. Shared services had defined reliability targets and on-call ownership. Product teams could depend on them. Before explicit ownership, “shared” usually meant “nobody’s problem.”
The Right Balance
At the fintech startup, we eventually settled at roughly 15–20% of engineering capacity on platform/infrastructure work, ramping to 30% when a specific pain was severe enough to justify it.
That number is not universal. A startup at 5 engineers should have almost zero dedicated platform capacity — everyone builds product, platforms are whatever you can do in spare moments. A team at 100 engineers probably needs 20–25% capacity on platform to keep velocity from degrading.
The number matters less than the clarity about what you’re trading off. “We’re not fixing the deployment pipeline this quarter because the feature roadmap has higher expected value” is a valid decision. “We’re not fixing the deployment pipeline because nobody’s pointed at it” is not — it’s just deferred cost accumulating interest.
The platform vs product tension doesn’t resolve. It’s a continuous calibration between building for today and building for the next six months. The calibration gets harder as the team grows. The best teams aren’t the ones with the best platforms — they’re the ones that have the clearest view of what their platform work is buying them.