Discourse around AI tends to collapse into two camps: true believers and Luddites. A recent piece, Agentic Coding is a Trap, highlights what the author calls the “paradox of supervision”: the very judgment needed to oversee AI delegation is undermined by the habit of delegating to it. This is a signal that something is broken in how we think about AI delegation. If blind delegation is clearly problematic, then what we are missing is a clear framework for when AI should be relied on. I think a few characteristics of a system are key in determining how much AI should be used.
Sensitivity
What’s missing is a way to think about risk. I’d argue there is a useful continuum between High Sensitivity and Low Sensitivity systems, and where a system falls on that continuum should dictate how much you can rely on agentic workflows.
High sensitivity systems are usually long lived, and errors in them are both costly and compounding. Auth systems, billing systems, and distributed systems belong in this category; errors in these systems will likely have detrimental effects on dependent services. A drift in understanding (what others have coined Cognitive Debt) will also undermine the effective maintenance of such systems. Broadly, many complex SaaS products and large company products fall into this category.
In contrast, low sensitivity systems are very agent friendly. This is usually a consequence of some combination of a low cost of errors, weak or no coupling to other systems, minimal long term consequences, and the ability to be discarded or regenerated at will. It is in this category that we see the usual blog post about using AI to fully build some product or system. It feels compelling to use agentic workflows for brand new things as well, precisely because new things have no dependent systems.
Even in a high sensitivity environment, many tasks are low sensitivity. A mature SaaS platform may be a high sensitivity system overall, but generating an internal admin tool or scaffolding test fixtures seems like a perfectly reasonable task for agentic workflows. It takes experience to understand what type of problem you are dealing with.
Here are some of the axes along which I judge sensitivity:
- Blast Radius: How badly do things go wrong if something fails? For example, errors in an auth system might leave users unable to log in. Similarly, errors in a billing system might stop revenue from flowing in entirely or cause other catastrophic issues.
- Reversibility: How cheap is it to throw away the change and replace it? A change that requires a series of database migrations is much more complex to roll back than a simple change to page styling.
- Coupling: How many systems are affected by the change? Do multiple services need to be updated to include the change? Will a rollback of the change cause a cascade of rollbacks to dependent services? While this dimension seems similar to reversibility, it captures a layer of complexity determined by the environment in which the change sits rather than by the change itself.
- Time Horizon: Will this change matter next week? How about next year? Changes that will be ripped out soon will not contribute to the complexity burden for very long, so their added complexity can be safely ignored.
- Human Maintainability: Will humans need to interact with or make changes to this code? If so, the changes should be human understandable; the abstractions should be carefully chosen so that humans working on the code in the future can more easily contextualize it.
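One way to make the axes above concrete is to score a task on each of them before deciding how to work on it. The following is only a minimal sketch: the 1 to 3 scale, the thresholds, the workflow labels, and every name in it are illustrative assumptions, not a calibrated or established rubric. Reversibility is expressed as `irreversibility` so that a higher number always means more sensitive.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SensitivityProfile:
    """Hypothetical rubric: each axis is scored 1 (low) to 3 (high sensitivity)."""

    blast_radius: int           # 3 = failure is user facing or stops revenue
    irreversibility: int        # 3 = rollback needs migrations or coordination
    coupling: int               # 3 = many dependent services must change too
    time_horizon: int           # 3 = expected to live for years
    human_maintainability: int  # 3 = humans will read and modify this often

    def __post_init__(self) -> None:
        # Reject scores outside the assumed 1..3 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (1, 2, 3):
                raise ValueError(f"{f.name} must be 1, 2, or 3, got {value!r}")

    def scores(self) -> list[int]:
        return [getattr(self, f.name) for f in fields(self)]


def suggested_workflow(profile: SensitivityProfile) -> str:
    """Map a profile to a coarse workflow label (thresholds are assumptions)."""
    scores = profile.scores()
    # A single severe axis dominates: one "3" is enough to demand human-led work.
    if max(scores) == 3:
        return "human-led"
    # Totals range from 5 to 10 here; 7 or below is treated as agent friendly.
    if sum(scores) <= 7:
        return "agent-led"
    return "shared"


if __name__ == "__main__":
    styling_tweak = SensitivityProfile(1, 1, 1, 1, 1)
    billing_migration = SensitivityProfile(3, 3, 3, 3, 2)
    print(suggested_workflow(styling_tweak))
    print(suggested_workflow(billing_migration))
```

Letting any single maximum score force the human-led label reflects the idea that one severe axis, such as the blast radius of a billing change, is not offset by low scores elsewhere. Assigning the scores still requires the judgment the rest of this piece describes; the code only records it.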
Consider Django Control Room, a framework for building internal tooling (called panels) inside of the Django admin. The project illustrates how sensitivity can vary within a system. The core plugin system (dj-control-room) exhibits high sensitivity: it defines the conventions that all panels depend on, architectural decisions there are hard to reverse, and breaking changes ripple across every downstream panel and repo. Authoring individual panels, however, is a much lower sensitivity task. The blast radius of a bug in a panel is contained: developers would see degraded tooling, not a user facing outage. Each panel is an independent package that can be removed by deleting a single line in INSTALLED_APPS, making reversibility high. Coupling between panels is nonexistent by design, and the framework’s conventions keep individual panel implementations consistent enough that each panel can be reasoned about and fixed or replaced in isolation. This is precisely the kind of task where heavy use of agentic coding makes sense, while the core framework itself demands more care.
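To picture why reversibility is high for panels, here is a rough sketch of what such a settings file could look like. `INSTALLED_APPS` and the `django.contrib` entries are standard Django; the other three app labels are hypothetical placeholders invented for illustration and are not the project's real package names.

```python
# settings.py (sketch). Only the django.contrib entries are real app labels;
# the last three are hypothetical placeholders, not actual package names.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "control_room_core",     # hypothetical: the core plugin system (high sensitivity)
    "example_cache_panel",   # hypothetical panel: delete this line to remove it
    "example_queue_panel",   # hypothetical panel: independent of the one above
]
```

Under these assumptions, dropping `example_cache_panel` touches one line and leaves the other panel and the core entry unchanged, whereas a change to the core affects everything listed after it.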
The real challenge, then, is not deciding whether to use AI. It is correctly identifying what kind of problem is being solved and choosing an interaction model that matches the sensitivity of the system. The difficulty is that this classification itself requires judgment, and judgment is precisely the thing easiest to bypass when rapid output is available. The temptation to apply workflows that are effective in low sensitivity environments to high sensitivity systems will be strong, particularly because the immediate feedback appears positive. The consequences, however, may not become visible until much later, when the system must be understood, debugged, extended, or repaired under pressure.