Platform Engineering Is a Product Problem, Not a Technology Problem

The team I joined was called “platform engineering.” The mandate was loosely: make other engineering teams faster. The first three months were spent understanding what “faster” meant, which was more contentious than expected.

The Classic Platform Engineering Trap

Platform teams typically form because someone notices that every product team is solving the same infrastructure problems independently — logging, tracing, CI pipelines, deployment, secrets management. The natural response: centralise. Build it once, share it.

The trap: building something you’d want to use rather than something the product teams need. I’ve seen this in almost every platform organisation I’ve observed. The platform team is technically strong, has opinions about the Right Way to do things, and builds to those opinions. The product teams get something that technically solves their problem but requires them to change their workflows significantly to adopt it. Adoption is low. Platform team wonders why.

The fix is treating the internal developer platform as a product with real users, measured by adoption and time-to-productivity, not by whether it exists.

Applying Product Thinking to Internal Tooling

What does “product thinking” mean concretely for an internal platform?

User research. We spent four weeks interviewing teams about their actual friction points before writing any code. The answers were surprising: the biggest friction was not the CI pipeline or the deployment tooling (which we’d assumed). It was local development environment setup (taking 2–4 days for a new service) and service-to-service authentication (every team had reinvented it differently).

Adoption as the success metric. The standard infrastructure metric is uptime/reliability. Necessary, but not sufficient. We added: what percentage of new services use the platform’s auth library? What’s the p90 time from “new engineer joins” to “first commit deployed”? These metrics are uncomfortable because they expose whether anyone is actually using what you build.

Reducing the cost of the first win. A platform feature that requires a week of migration work before paying off will be deprioritised by every product team against their feature backlog. Features that provide value on day one — “add this library, get distributed tracing automatically” — compound much faster.

The Internal Developer Portal

After the user research, we prioritised:

A service template with auth, logging, tracing, and health checks pre-configured — new services start from the template, not from scratch. (Time to deploy first new service: 4 hours instead of 2 days)
A self-service secret rotation UI instead of a ticket-based process — previously 2–5 day turnaround, became self-serve in minutes
Standardised environment variable names and a local development harness that replicated the production environment locally

The template was the highest-leverage item. Every new service got observability and security defaults correct from day one, without each team reinventing or forgetting them. Six months after launching it, 100% of new services used it (because it was the path of least resistance), and the auth library adoption had propagated backwards into most existing services via incremental migrations.

The Governance Balance

Platform teams face a genuine tension: standardisation improves reliability and reduces cognitive load, but excessive standardisation stifles the autonomy that makes product teams effective.

The resolution we landed on: strong defaults, weak mandates. The platform provides high-quality defaults. Teams can deviate with a written rationale. Deviations are tracked but not blocked. In practice, deviation is rare when the defaults are good — people deviate when they’re blocked, not when they’re opinionated.

The things we made mandatory (not just default): security controls (auth, encryption), audit logging for regulated data, incident runbooks in a standard format. These were non-negotiable because their failure modes affected the firm, not just the individual team.

What I’d Do Differently

Start the user research earlier and more formally. We had intuitions about what teams needed, and those intuitions were wrong in ways that a few structured interviews would have revealed in week one.

Invest in documentation before features. Our best-adopted features had clear “getting started in five minutes” docs. Our least-adopted features had comprehensive reference documentation and no narrative guide. The audience for a platform is engineers who are already busy — they optimise for understanding quickly, not for completeness.

Measure adoption before declaring success. Features that exist but aren’t used are not features; they’re maintenance burden. We killed two features in the second year that had low adoption despite being technically solid. The engineering time saved by killing them (no more support, no more documentation maintenance) was non-trivial.

The Classic Platform Engineering Trap#

Applying Product Thinking to Internal Tooling#

The Internal Developer Portal#

The Governance Balance#

What I’d Do Differently#

The Classic Platform Engineering Trap

Applying Product Thinking to Internal Tooling

The Internal Developer Portal

The Governance Balance

What I’d Do Differently