Not All Software Systems Are Agent Friendly

Discourse around AI tends to collapse into two camps: true believers and luddites. A recent piece, Agentic Coding is a Trap, highlights what the author calls the “paradox of supervision” - where the very judgment needed to oversee AI delegation is undermined by the habit of delegating to it. This is a signal that something is broken in how we are thinking about AI delegation. If blind delegation is clearly problematic, then we are missing a clear framework on when AI should be relied on. I think there are a few characteristics of systems that are key in determining how much AI should be used.

Sensitivity

What’s missing is a way to think about risk. I’d argue there is a useful continuum between High Sensitivity and Low Sensitivity systems and where a system falls on that spectrum should dictate how much you can rely on agentic workflows.

High sensitivity systems are usually long lived and have a high cost for errors that also compound. Things that belong in this category are: auth systems, billing systems, and distributed systems and so forth; Errors in these types of systems will likely have detrimental effects for dependent services. A drift in understanding (what others have coined Cognitive Debt) will also undermine the effective maintenance of such systems. Broadly, many complex SaaS products and large company products will occupy this category.

A great example of a high sensitivity task is implementing multi-tenancy support into a SaaS product. There are many ways to do this and all of them bind you to architectural paths that are difficult to rollback from. This is not a light decision and implementation is something that should deeply involve a human since it will dictate so much about the product.

On the contrary, low sensitivity systems are very agent friendly. This is usually a consequence of either a low cost of errors, weak or no coupling to other systems, minimal long term consequences and the ability to be discarded or regenerated at will. It is in this category that we see the usual blog post about using AI to fully build some product or system. It feels compelling to use agentic workflows for brand new things as well, this is precisely because new things have no dependent systems.

Even in a high sensitivity environment, many tasks are low sensitivity. A mature SaaS platform may be a high sensitivity system over all, but generating an internal admin tool or scaffolding test fixtures seem like perfectly reasonable tasks to use agentic workflows with. It takes experience to understand what type of problem you are dealing with.

The Hugo blog theme that this blog is running on is an example of a low sensitivity system. I use LLMs to routinely add features (such as the ability to create this aside) to this theme. While other people also depend on this theme, nothing truly critical on the internet is going to depend on it. Furthermore, the features for this blog can usually be verified visually and I can easily rollback anything immediately. Any new features I build into the theme are built within the Hugo framework, which is limited in capability and targeting toward templating and writing blogs.

Here are some of the possible axes to which I judge sensitivity:

Blast Radius: This defines how badly things go wrong if something fails. For example, errors to an auth system might cause users to not be able to log in anymore. Similarly, errors to a billing system might entirely stop revenue from flowing in or other catastrophic issues.
Reversibility: How cheap is it to throw away the change and replace it? A change that requires a series of database migrations is much more complex to roll back than a simple change to page styling.
Coupling: How many systems are affected by the change? Do multiple services need to be updated to include the change? Will a rollback of the change cause a cascade of rollbacks to dependent services? While this dimension seems a little similar to reversibility, it captures a layer of complexity decided by the environment in which the change sits rather than the change itself.
Time Horizon: Will this change matter next week? How about next year? Changes that will be ripped out soon are not going to contribute to the complexity burden for very long and can be safely ignored.
Human Maintainability: Will humans need to interact or make changes to this code? In that case, the changes should be made to be human understandable; the abstractions should be carefully chosen so that humans working in the future can more easily contextualize it.

Consider Django Control Room, a framework for building internal tooling (called panels) inside of the Django admin. The project illustrates how sensitivity can vary within a system. The core plugin system (dj-control-room) exhibits high sensitivity; It defines the conventions that all panels depend on, architectural decisions there are hard to reverse and breaking changes ripple across every downstream panel/repo. Authoring individual panels, however, is a much lower sensitivity task. The blast radius of a bug in a panel is contained - developers would see degraded tooling, not a user facing outage. Each panel is an independent package that is removable with a single line in INSTALLED_APPS, making reversibility high. Coupling between panels is nonexistent by design and the framework’s conventions keep individual panel implementations consistent enough that each panel can be reasoned about and fixed or replaced in isolation. This is precisely the kind of task where heavy use of agentic coding makes sense, while the core framework itself demands more care.

System sensitivity is also not static. A panel that starts as low stakes can quietly cross into high sensitivity territory as more people come to depend on it. Revising how you view the system is part of the discipline.

The real challenge, then, is not deciding whether to use AI. It is correctly identifying what kind of problem is being solved and choosing an interaction model that matches the sensitivity of the system. The difficulty is that this classification itself requires judgment, and judgment is precisely the thing easiest to bypass when rapid output is available. The temptation to apply workflows that are effective in low sensitivity environments to high sensitivity systems will be strong, particularly because the immediate feedback appears positive. The consequences, however, may not become visible until much later, when the system must be understood, debugged, extended, or repaired under pressure.