Teaching AI To Test
Hey folks! For the past couple of months I’ve been discussing how we as developers can adopt the QA mindset. In this next series, I’m delving into the world of artificial intelligence to talk about how we can help agents become better testers.
Let’s start with an obvious question: why do we even want to do this? After all, with a little prompting an agent is more than happy to write and run unit tests for every change it makes. Do we really need to invest in any more capabilities than that?
The unfortunate reality is that unit tests alone are insufficient for the vast majority of production-grade software. Unit tests are fast and stable, which is why they serve as the foundation of the test automation pyramid. However, they are inherently limited in scope. If you have a microservices architecture with high unit test coverage and low integration test coverage, you likely won’t encounter many bugs within any single service, but you will be plagued with outages as the services change their interfaces and behavior underneath each other. As any QA engineer will tell you, the only way to be certain the entire app works is to test it as a whole.
A lack of effective testing often leads to reduced productivity. I have personally observed a pattern where AI-assisted development results in code being written more quickly, but the pull requests for that code receive more pushback and ultimately take longer to land. To maintain a high standard of quality and still deliver the promised productivity gains of AI-accelerated development, we have a few options:
Hire more QA engineers to test that code.
Shift our developer workforce to be more testing focused.
Have the agent writing the code perform better testing of its work.
Plenty of discussion could take place around all three of those options, but for the purposes of this series I’ll focus on the practical steps to improve how an agent tests its code.
Now that we understand why we would want to impart better testing skills to agents, let’s quickly talk about why we need to be the ones providing those instructions. After all, an agent fundamentally understands how to build an application, so aren’t Anthropic and OpenAI giving these models everything they need? What I’ve found is that the hardest part of enabling AI to test its own changes is not telling it how to test; it’s telling it how to test my app. Claude fundamentally understands what a feature flag is, but it doesn’t know about my admin dashboard for managing feature flags in the current session. We as developers have built tools and learned tricks that make testing much more efficient. Preloading that knowledge into an agent is essential if we want it to move as fast as a human who already knows those tricks.
In the next few posts I’ll walk through practical steps for teaching an agent about the intricacies of testing a given application, tips for organizing those learnings to be AI-ready, and the pitfalls of leveraging a machine as a substitute for human testing.
