The testing pyramid still holds
Like many teams, my team is grappling with a new bottleneck now that agents are writing most of our code: CI. We recently discussed how to speed it up while keeping quality high, and we debated how best practices for testing should evolve in the agentic coding era.
For example: do unit tests still matter? Claude writes a function and writes a test. When another session updates the function, it updates the test in lockstep. Is this test actually useful?
The way software gets shipped is changing dramatically. As a result, every engineering team is facing questions about which of our software engineering fundamentals need to change. I think the testing pyramid still holds. How we interpret it, and the intention we put into following it, matter now more than ever.
Interpreting the pyramid
The testing pyramid is simple. Write many fast, cheap tests and fewer slow, expensive ones, so you have guardrails against production breakages without CI taking forever. Crucially, this isn't telling us to write endless brittle unit tests that focus on implementation rather than intent.
This goes back to the classic Kent Beck argument: we get to define what an actually useful unit is. It's definitely not a private method, and it's not verifying getters and setters either. If that's how you define a unit, then yes, any small implementation change requires a corresponding test update, and the lockstep argument holds whether it's a human or an agent writing the code.
A useful unit is something more interesting: a public interface boundary. It's really whatever creates the right contract so that you're protected when code changes in a way that modifies the unit's intent. Pick your units well and the test doesn't churn every time an implementation detail shifts. It fails when behavior actually changes, which is exactly when you want it to.
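Here is a minimal sketch of what that can look like. Everything in it is hypothetical and made up for illustration: the Cart class, its bulk discount rule, and the test names. The point is only that the tests talk to add() and total_cents(), never to the private helper, so the helper can be rewritten freely without touching them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    unit_price_cents: int
    quantity: int


class Cart:
    """Hypothetical unit: the public contract is add() and total_cents()."""

    BULK_THRESHOLD = 10     # quantity at which the discount kicks in
    BULK_DISCOUNT_PCT = 10  # percent off a line item bought in bulk

    def __init__(self) -> None:
        self._items: list[LineItem] = []

    def add(self, unit_price_cents: int, quantity: int = 1) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._items.append(LineItem(unit_price_cents, quantity))

    def total_cents(self) -> int:
        return sum(self._line_total(item) for item in self._items)

    def _line_total(self, item: LineItem) -> int:
        # Private detail: free to be inlined, renamed, or rewritten.
        gross = item.unit_price_cents * item.quantity
        if item.quantity >= self.BULK_THRESHOLD:
            return gross * (100 - self.BULK_DISCOUNT_PCT) // 100
        return gross


# The tests pin down intent through the public interface only.
def test_bulk_orders_get_ten_percent_off() -> None:
    cart = Cart()
    cart.add(unit_price_cents=500, quantity=10)
    assert cart.total_cents() == 4500


def test_small_orders_pay_full_price() -> None:
    cart = Cart()
    cart.add(unit_price_cents=500, quantity=2)
    assert cart.total_cents() == 1000
```

A test that called _line_total directly would break as soon as someone inlined it. These two only fail if the pricing behavior itself changes.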
Getting agents to follow it
Whatever your unit is, and this applies to many integration tests too, you need to be intentional about keeping it fast and cheap. Be comprehensive about edge cases, but mock the right interface boundaries and don't write to a test database when you don't need to. These are the same best practices software testing has always had. They haven't changed because an agent is writing the code.
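As a rough illustration of mocking at the boundary, here is a sketch with names I invented for the example (Invoice, InvoiceRepo, Mailer, send_overdue_reminders). It assumes the code under test takes its database and email dependencies as arguments. The repository is replaced with an in-memory fake and the mailer with a Mock from the standard library, so the test covers an edge case without touching a database or the network.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol
from unittest.mock import Mock


@dataclass(frozen=True)
class Invoice:
    customer_email: str
    due: date
    paid: bool


class InvoiceRepo(Protocol):
    # Hypothetical boundary: production code would back this with a database.
    def unpaid(self) -> list[Invoice]: ...


class Mailer(Protocol):
    # Hypothetical boundary: production code would call an email service.
    def send(self, to: str, subject: str) -> None: ...


def send_overdue_reminders(repo: InvoiceRepo, mailer: Mailer, today: date) -> int:
    """Email every customer whose unpaid invoice is past due; return the count."""
    overdue = [inv for inv in repo.unpaid() if inv.due < today]
    for inv in overdue:
        mailer.send(to=inv.customer_email, subject="Your invoice is overdue")
    return len(overdue)


class InMemoryInvoiceRepo:
    """Test double for the repository boundary; no database involved."""

    def __init__(self, invoices: list[Invoice]) -> None:
        self._invoices = invoices

    def unpaid(self) -> list[Invoice]:
        return [inv for inv in self._invoices if not inv.paid]


def test_reminds_only_overdue_unpaid_invoices() -> None:
    today = date(2025, 1, 15)
    repo = InMemoryInvoiceRepo([
        Invoice("late@example.com", date(2025, 1, 1), paid=False),
        Invoice("paid@example.com", date(2025, 1, 1), paid=True),
        Invoice("due-today@example.com", today, paid=False),  # edge: not yet overdue
    ])
    mailer = Mock()

    sent = send_overdue_reminders(repo, mailer, today)

    assert sent == 1
    mailer.send.assert_called_once_with(
        to="late@example.com", subject="Your invoice is overdue"
    )
```

The real repository still deserves a test against a real database, but that belongs higher up the pyramid, where a handful of slower tests is enough.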
But left without guidance, an agent can get this wrong in both directions, and it's extra costly because the number of tests being produced is growing as fast as the amount of code being produced. An agent might write tons of brittle unit tests without picking useful units, or it might write tests that are more expensive than they need to be because it's not mocking well. If your test suite is full of brittle tests that aren't protecting against production breakages, the problem isn't that the testing pyramid is outdated. It's that you're not making sure the agent interprets it correctly.
The simple fix is making sure your team’s best practices are present in the agent's context. A CLAUDE.md that spells out what a useful unit is, how to mock boundaries, and when a test actually needs a database goes a long way. Agents follow conventions pretty reliably when you make them explicit.
Structure helps too. If your unit test suite is physically separated from your integration suite, and database fixtures and network clients aren't even importable from the unit side, the agent can't accidentally reach for them. This way, the fast path is the only path. This kind of guardrail is cheap to set up and pays for itself every time an agent writes a test.
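One way to approximate this, if restructuring packages isn't practical yet, is a small check that CI runs over the unit suite. The sketch below is an assumption about how that could look, not a standard tool: the forbidden module names (tests.integration, myapp.db) and the directory you point it at are placeholders for your own layout. It is lint-style enforcement, so it flags the import rather than making it impossible.

```python
import ast
from pathlib import Path

# Hypothetical names; substitute the modules that hold your database
# fixtures and network clients.
FORBIDDEN_PREFIXES = ("tests.integration", "myapp.db", "requests")


def _is_forbidden(name: str) -> bool:
    return any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden modules that the given source code imports."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if _is_forbidden(node.module):
                candidates = [node.module]
            else:
                # Catches "from myapp import db" as well as "import myapp.db".
                candidates = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import (not checked here)
        found.update(name for name in candidates if _is_forbidden(name))
    return sorted(found)


def violations_in(root: Path) -> dict[str, list[str]]:
    """Map each Python file under root to the forbidden modules it imports."""
    report: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        bad = forbidden_imports(path.read_text(encoding="utf-8"))
        if bad:
            report[str(path)] = bad
    return report
```

A single test that asserts violations_in() returns an empty dict for your unit test directory turns the convention into something the agent runs into immediately, instead of something a reviewer has to catch.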
None of this is new advice. We've always known that good test suites require intentional design. What's changed is that the entity writing most of the tests now needs that design spelled out rather than absorbed through code review and team discussions. The pyramid still applies. We just have to be more intentional about teaching it to our agents.