Over two years running engineering hiring at the firm, we made about twenty senior engineering hires. Some were transformative — engineers who raised the quality of everything they touched. Some were neutral — competent contributors who did what was asked and not much more. A few were net negatives.

Looking back, the correlation between interview performance and outcome was weaker than I’d expected. The things that predicted good hires were visible in the interview, but we weren’t consistently measuring them.

What Our Interviews Measured Well

Technical fundamentals: systems design questions, coding exercises, architecture discussions. These correlated reasonably well with whether someone could do the core technical work.

Communication under pressure: a technical interview is a stressful situation where you have to think and explain simultaneously. Candidates who could communicate clearly under that pressure generally communicated well at work.

Speed of understanding: how quickly a candidate grasped a new concept or problem during the interview was a reasonable proxy for how quickly they’d pick up new contexts at work.

What Our Interviews Measured Poorly

Ownership and initiative: the most common thing that distinguished the transformative hires from the neutral ones wasn’t technical skill — it was whether they sought out problems to solve or waited to be assigned them. This is genuinely hard to measure in an interview. Candidates self-report high initiative; their behaviour at work may differ.

The best signal we found: asking for specific examples of past projects they initiated (not managed, not contributed to — initiated). “Tell me about a project you started because you saw a problem nobody else was prioritising.” The quality and specificity of the answer predicted ownership orientation better than any other question.

Collaboration style under disagreement: technical disagreements are common. The question isn’t whether someone is right; it’s whether they can disagree without it becoming personal or political. This is almost impossible to measure in an interview, where everyone is on their best behaviour.

Partial signal: how a candidate responds when you push back on something they said. Do they engage with the pushback, update their view, or dig in defensively? A 10-minute design discussion where you deliberately challenge their choices tells you something.

Judgment on scope and complexity: senior engineers who build systems that are bigger and more complex than the problem requires are a specific failure mode that’s hard to spot in interviews. The systems design interview rewards comprehensive, sophisticated designs. Production systems reward simple, maintainable ones.

Partial signal: ask “what would you cut from this design if you had half the time?” Candidates who find this easy, who immediately identify what’s core and what’s optimisation, tend to have good judgment in practice. Candidates who find the question uncomfortable or try to preserve everything often over-engineer in production.

The Reference Check We Started Taking Seriously

At first, reference checks were a formality. Candidates provided glowing references; we confirmed employment and got positive reports. Useless.

We started doing two things differently:

Call people not on the reference list. If a candidate listed a manager from a previous job, call a peer or a skip-level manager from the same org who isn’t on the list. The candidate chose their references; we wanted unfiltered perspectives.

Ask second-order questions. Not “was this person a good engineer?” but “what would you tell me about their working style that their resume doesn’t capture?” and “what’s a situation where their approach created friction, and how did they handle it?”

The reference check became the most predictive part of our process once we stopped using it as a box-ticking exercise.

The Things Worth Over-Indexing On

After making hires that didn’t work out and thinking carefully about why, these are the questions I’d weight most heavily:

Does this person believe the work matters? Engineers who are genuinely interested in the problem domain — financial systems, data infrastructure, whatever the company does — do better work than engineers who are indifferent to the domain and are primarily interested in the technology. Curiosity about the problem is harder to fake in a long conversation than technical skill.

Have they shipped things that were used? Portfolio questions matter more than background questions. What have you built that was actually used by people? What happened when it was put under load? What would you change? Engineers who have shipped production systems have a groundedness that’s visible in how they talk about systems.

Can they explain what they don’t know? The best senior engineers I’ve hired are precise about the boundaries of their knowledge. “I know X well, I have some experience with Y, I’ve never worked with Z.” Candidates who claim expertise in everything either have unusually broad experience (possible) or are not reliably self-assessing (common).

What I’d Tell Someone About to Run Hiring

The interview process you have probably measures what’s easy to measure, not what matters most. Be honest about that gap.

The things that matter most for senior engineers (ownership, judgment, collaboration) require inference from behaviour patterns over time, not direct measurement in a 45-minute session. The process can provide signal; it can’t provide certainty.

The other honest admission: the hire that seems risky (unconventional background, unusual technical choices) sometimes becomes the most transformative person on the team. Standardised processes are good at filtering out the worst candidates and mediocre at selecting the best ones. Some of the best hires require someone to make a judgment call against the process’s recommendation.