I’d worked at organisations ranging from twelve people to four hundred. The new role is at a company with tens of thousands of engineers. The systems are bigger, the coordination surface is larger, and some things I assumed were universal engineering truths turned out to be scale-specific.
The Technical Delta
At every previous employer, a “large dataset” was in the tens of billions of rows and fit on a well-specced server with some careful engineering. The datasets I work with now are larger by multiple orders of magnitude. This isn’t just a quantitative difference — it’s qualitative. Engineering patterns that work at billions of rows are sometimes actively wrong at trillions.
Some specific things that change:
Index design becomes a cross-team negotiation. At a startup, you add an index and deploy. At this scale, index adds on active tables require coordinated maintenance windows or a careful online-DDL strategy, affect storage budgets that are tracked across teams, and may shift query plans on tables shared by other services. The technical decision becomes a process.
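The query-plan side of this is easy to demonstrate even in miniature. The sketch below uses an in-memory SQLite database and a hypothetical shared `events` table (both my assumptions, not a real system): one team’s new index silently changes the access path for a query another team relies on. At scale the same change would go through online-DDL tooling and review rather than a bare `CREATE INDEX`.

```python
import sqlite3

# Toy stand-in for a table shared by several services (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, ts INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports the access path SQLite picks for a query.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

shared_query = "SELECT * FROM events WHERE user_id = 1 AND kind = 'click'"
before = plan(shared_query)  # full table scan

# "Your" index, added for a different access pattern entirely.
con.execute("CREATE INDEX idx_events_kind ON events (kind)")

after = plan(shared_query)   # the shared query is now routed through the new index
print(before)
print(after)
```

In a shared-table world, `after` is the line someone else’s on-call engineer notices, which is why the add goes through review rather than a deploy.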
Caching isn’t an optimisation, it’s load management. At startup scale, caching is nice to have. At this scale, without aggressive caching, many underlying data stores can’t absorb the read traffic from all the services hitting them. Cache design — TTLs, invalidation strategy, cache coherency guarantees — becomes a first-class architectural concern with reliability implications.
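A concrete way to see why TTLs and invalidation are design decisions rather than knobs: even a minimal cache forces you to choose how stale a read may be and who is responsible for evicting. A sketch (the class name and injectable clock are mine, for illustration):

```python
import time

class TTLCache:
    """Minimal TTL cache sketch; real systems layer invalidation and coherency on top."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry is testable
        self._store = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        # Explicit invalidation: the hard part is deciding *when* to call this.
        self._store.pop(key, None)
```

The interesting questions live outside this class: which writes trigger `invalidate`, and what the system guarantees during the window between a write landing and the eviction reaching every cache node.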
Distributed tracing is oxygen, not a luxury. A request at this scale touches fifteen to forty services before returning. Without distributed tracing, debugging even simple latency regressions is archaeology. At the startup, tracing was useful. Here, the absence of tracing for a new service is a production incident waiting to happen.
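In practice this means a shared trace context that every hop propagates. Real systems use something like OpenTelemetry, with the context carried in request headers, but the core idea fits in a few lines. A toy in-process version (the service names and the span list are invented for illustration):

```python
import uuid
import contextvars

# Trace context carried across "service" boundaries (here, plain function calls).
current_trace = contextvars.ContextVar("trace_id", default=None)
spans = []  # a real system ships these to a collector, not a list

def start_span(name):
    trace_id = current_trace.get()
    if trace_id is None:          # first hop: mint a new trace id
        trace_id = uuid.uuid4().hex
        current_trace.set(trace_id)
    spans.append((trace_id, name))

def gateway():
    start_span("gateway")
    checkout()

def checkout():
    start_span("checkout")        # same trace id: the hops are stitchable

def inventory():
    start_span("inventory")

def checkout():
    start_span("checkout")
    inventory()

gateway()
# All three spans share one trace id, so a latency regression in "inventory"
# can be tied back to the request that entered at "gateway".
```

Without that shared id, each service’s logs are internally consistent and mutually useless, which is the archaeology problem in one sentence.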
The Organisational Delta
The coordination surface is different in kind, not just degree.
At four hundred engineers, you can know most people in your domain by name. You share enough context that a Slack message and a thirty-minute conversation can align two teams on a technical approach. Informal coordination works.
At this scale, informal coordination doesn’t reach. Technical decisions that affect multiple teams require written RFCs, formal review, and stakeholder lists that span multiple organisations. The writing quality matters because the audience is people who don’t share your context and can’t ask follow-up questions.
This is not bureaucracy for its own sake. It’s the minimum overhead required to make a decision that survives contact with the people it affects. I’ve watched carefully crafted technical designs fail in deployment because a downstream team’s constraint wasn’t considered — a constraint that would have been obvious in a smaller organisation and was invisible in a larger one without explicit stakeholder analysis.
The Pace Paradox
Large organisations have a reputation for slowness. In my experience, it’s more complicated:
Individual feature work moves at similar speed. The code review → test → deploy cycle, for work within a well-defined service, is comparable to that of a startup with reasonable process.
Cross-team work moves much slower. Anything requiring coordination — API changes, shared library upgrades, infrastructure migrations — has a coordination overhead that compounds with the number of teams involved. A migration affecting ten teams has ten independent quarterly roadmaps to navigate.
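One way to quantify “compounds”: if every pair of involved teams is a channel that must be kept in sync, the channel count follows the classic n(n−1)/2 from Brooks, so a ten-team migration carries forty-five pairwise relationships where a two-team one carries one. As arithmetic:

```python
def coordination_pairs(n_teams):
    # Pairwise channels between n teams: n * (n - 1) / 2.
    return n_teams * (n_teams - 1) // 2

print(coordination_pairs(2))   # 1 channel: a Slack thread suffices
print(coordination_pairs(10))  # 45 channels: you need a working group
```

The quarterly-roadmap problem stacks on top of this: each of those ten teams also has its own independent schedule to negotiate against.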
The work that exists at this scale doesn’t exist at smaller ones. There are entire engineering domains — internal tooling, infrastructure reliability, developer productivity — where the scale of the problem is only created by operating at this size. The interesting work is different, not absent.
What Carries From Smaller Organisations
The skills that transfer directly: systems thinking, understanding performance from first principles, the ability to write clearly about technical decisions, debugging instincts built from years of production incidents.
The skills that need recalibration: scope estimation (larger systems have more surface area for unexpected interactions), stakeholder management (more of it, more formally), and the willingness to slow down before acting in domains where the blast radius is significant.
The mental model I’ve found useful: treat the organisation itself as a distributed system. It has the same failure modes — lack of shared state, coordination overhead, inconsistent views of the world — and benefits from the same approaches: explicit interfaces between components, documented invariants, mechanisms for propagating state changes. An RFC is essentially a protocol spec for a decision. A cross-team working group is a consensus mechanism.
It’s not a perfect analogy. But it makes the environment easier to navigate than approaching it as “a startup, but bureaucratic.”