The AI Engineering Baseline

As global AI adoption has accelerated through the first quarter of 2026, it has become clearer where a modern engineering organisation needs to place its baseline bets on the curve of AI adoption in day-to-day workflows.

It’s no longer a question of “can these tools work” (they can and do) – software professionals using AI tooling is the new normal. The question now becomes “what are the most effective ways we can use this new category of tooling to speed up where we need to go”.

Through a purely technical lens, our goals are clear – we want to build increasingly stable, resilient systems, reduce complexity at every opportunity, and ensure that the platforms we build can operate at a competitive cost in the market.

These are not our only software goals (they exclude product direction; they don’t speak to meeting the future of the technology landscape) but as we consider what our baseline set of AI capabilities should be across our systems they are the bedrock on which we build.

The current state of the art in AI models and tooling is accelerating our ability to reason about complicated distributed systems as one cohesive whole, and this document outlines techniques, and subsequently highlights a direction of travel to take advantage of this change in the technology landscape.

The Four Patterns

There are four patterns of behaviour that make sense in the building of software:

  • AI in Development Workflows
  • AI Introspection
  • Agent Assisted Synchronous Change
  • Observing Super-Agent

And each has a distinct set of trade-offs associated with its use.

AI in Development Workflows

AI in Development Workflows is the most familiar AI adoption path. Almost everyone is familiar with this by Q2 2026 – it involves a technical user, using tools that are either embedded in their IDE, or called via their Terminal to generate source code that is integrated into the software that they build. This practice itself is both rapidly evolving and encompasses a set of ancillary tools – things like MCP (Model Context Protocol) servers, agent skills, agent teams, using agents to do analysis, refinement, and iteration, among other co-design and co-development practices.

Almost all developers are familiar to some extent with this application of AI, and with the rapid pace at which it has been changing in recent months. It is characterised by being centred on the developer experience, and it almost always involves a human in the loop.

AI Introspection

AI Introspection is an emerging category of tooling both in the marketplace, and in our own novel experimentation where models are used to augment traditional static analysis of software. Where traditional static analysis focused solely on using programming tools like type checkers and linters to provide insights into software quality, generative models are allowing us to build on software introspection to also check for adherence to patterns and standards, evaluate quality, produce threat models, and perform the kind of analysis that was previously too difficult with more rigid tools.

Examples of good AI Introspection include answering questions like “do our APIs meet our API standard documentation”, “does this service follow any of our catalogued design patterns”, or “what potential attack vectors exist in the source code for this service”.

Layered on top of this, and combined with traditional static analysis, we’re able to use AI models to map out dependencies between distributed components, understand how changes are made using signals like git commit history, and generally reason about an extensive distributed system as a whole, producing meaningful reports that explain how a large platform changes in near real-time.

This blend of techniques is a generational leap over previous types of tooling in this space like SonarQube.

Agent Assisted Synchronous Change

Over the last two decades, there has been a drift towards increasingly “decomposed” and “decoupled” system designs. This is most visible in how organisations trended first towards “Service Oriented Architecture”, and subsequently (via Guerrilla SOA) towards “Microservices”.

This drift was driven by two factors – the increasing complexity of systems causing them to grow large, and the appetite of organisations to parallelise work across these increasingly large systems. Software has a fixed “surface area” – a maximum number of people that can meaningfully contribute to it before human coordination becomes unwieldy – so the drive towards microservices accelerated the amount of decomposition present in most systems.

Unfortunately, if you take this design philosophy to its natural extreme, the software now has more “seams” in it, more subdivisions where ownership can be established between different groups of people, but the total cognitive overhead required to comprehend the system and its behaviours as a whole has increased exponentially, and the cost of servicing and maintaining individual parts of the system has increased to match. This has taken us to a place where lots of organisations now author vast distributed systems that are hard to comprehend from any one point, and where making changes in one place often has adverse effects on several other components.

This antipattern is commonly referred to as a “distributed monolith”. It is the most common accidental distributed system design, one where changes are coupled across team and ownership boundaries.

Modern AI tooling is a salve to this problem: working over multiple iterations, models can comprehend all the components of a system and help synchronise change. To bring this to life, “Agent Assisted Synchronous Change” refers to building tools that can reason about the whole system and sequence complementary changes that are authored and committed across an estate, removing the traditional human overheads of this kind of split-ownership, joined-up work.

Examples include “upgrade all of our systems to the latest version of Framework X”, or “change this contract in API A, and update all the other systems that call it to understand the contract change”.
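The sequencing step described above can be sketched in a few lines. This is a minimal illustration rather than a description of any particular harness: the CALLS map, the service names, and both function names are hypothetical, and it assumes you already have a map of which services call which. Given a contract change in one service, it finds every direct or transitive caller and orders the work so that each provider is changed before the services that depend on it.

```python
from graphlib import TopologicalSorter

# Hypothetical estate: each service maps to the services it calls.
CALLS = {
    "checkout": ["payments-api", "basket"],
    "basket": ["payments-api"],
    "reporting": ["basket"],
    "payments-api": [],
}


def affected_services(calls, changed):
    """Return the changed service plus everything that calls it, directly or transitively."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for service, deps in calls.items():
            if service not in affected and affected.intersection(deps):
                affected.add(service)
                grew = True
    return affected


def change_order(calls, changed):
    """Order the affected services so each provider is changed before its callers."""
    affected = affected_services(calls, changed)
    # TopologicalSorter expects node -> predecessors, which here means
    # service -> the affected services it calls.
    graph = {s: [d for d in calls.get(s, []) if d in affected] for s in affected}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(change_order(CALLS, "payments-api"))
```

In this sketch a change to "payments-api" is always sequenced first, "basket" follows, and "checkout" and "reporting" come after "basket" in either order. An agent harness could open one pull request per service in that order.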

Observing Super-Agent

The most future-looking of the patterns is the Observing Super-Agent. “Super-Agent” is an emerging term in the space to describe an AI “director” that observes regular SDLC behaviours like story refinement, spec writing, and ticket authoring in tools like Jira, and automatically dispatches AI agents to participate in the work, unsupervised until they return for feedback.

Tools have been trending in this direction for the last 3-6 months, with features like “Assign to Copilot” in GitHub.com and Linear’s work tracking system, whose unique sales pitch is “write tickets for agents to work on here”. The final form of this kind of solution is currently vague, but there are several projects already pitching to own this space, such as OpenAI’s Symphony project (https://openai.com/index/open-source-codex-orchestration-symphony/), ByteDance’s DeerFlow (https://github.com/bytedance/deer-flow), which pitches itself as “an open-source super-agent harness”, and other similar projects.

It’s safe to speculate that whatever emerges in this space is going to be an orchestrating process that observes the tools you already use today, to dispatch jobs and have agents interact with the workflows that already happen in the organisation.

These super-agent harnesses are likely the engine that allows traditionally non-technical members of staff to be more involved in the process of building products.
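To make the “observe and dispatch” idea above concrete, here is a deliberately small sketch of one polling pass. Everything in it is an assumption made for illustration: the Ticket shape, the “agent-ready” label, the status values, and the start_agent hook are hypothetical, and no real work tracker or agent API is being modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    key: str
    title: str
    labels: set = field(default_factory=set)
    status: str = "todo"


def dispatch_ready_tickets(tickets, start_agent, label="agent-ready"):
    """Hand every untouched ticket carrying the label to an agent.

    start_agent is a hypothetical hook into whatever agent harness is in use.
    Returns the keys of the tickets dispatched in this pass.
    """
    dispatched = []
    for ticket in tickets:
        if ticket.status == "todo" and label in ticket.labels:
            start_agent(ticket)
            # Mark the ticket so a later pass does not dispatch it twice.
            ticket.status = "agent-working"
            dispatched.append(ticket.key)
    return dispatched


if __name__ == "__main__":
    backlog = [
        Ticket("PLAT-1", "Refine onboarding story"),
        Ticket("PLAT-2", "Bump logging library", labels={"agent-ready"}),
    ]
    print(dispatch_ready_tickets(backlog, start_agent=lambda t: print("dispatch", t.key)))
```

The point of the sketch is the shape: people keep working in the tracker they already use, and the orchestrator only reacts to an explicit signal on the ticket.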

Making bets in this space

AI in Development Workflows

Teams should probably not look to build one thing in this area. Instead, people should be incentivised and encouraged to use model-assisted development and AI in their day-to-day tasks via GitHub Copilot subscriptions, Claude Code, and other commodity harnesses.

The state of the art in tooling here changes rapidly, so people should just focus on the effective use of tools as they’re available today – this includes things like custom skills, custom agents, and MCP servers.

You should accept that there may be pockets of re-work in this space due to the pace of change. The cost of generating tools on an ad-hoc basis is currently very low, so it makes sense to encourage teams to experiment.

It’s very likely that the vast majority of developers will end up building their own tools, in the field that’s evolving to be called “harness engineering”.

AI Introspection

There are emerging products in the space that support automated AI Introspection.

I’ve had positive experiences with SPAN (https://www.span.app/) – which provides development insights based on the flow of commits and work across our systems. It’s intended to be a single pane of glass for analysing “developer experience” and understanding where we spend our time and money with a product focus.

Similarly, CodeScene (https://codescene.com/) is a well-established product in this space that is augmenting its traditional static analysis tooling with MCP for AI workflows.

That said, I think there’s large, untapped potential in building custom introspection tools that blend model-driven assessment with traditional static analysis techniques to provide actionable quality insights across a software estate. It’s something I’m personally investing in with the teams I work with.

We’ve been investing in tooling that clones all the source code in our estate into a central location then runs analysis jobs and reports over the gestalt (gestalt: “a theory of perception that emphasizes the processing of entire patterns and configurations, and not merely individual components.”) – the sum of all the software together.

We’ve been using this technique to generate service maps of connection edges, PCI compliance reports, and generally to “analyse all the software, then analyse the connections between the software, and then report on the software as a whole”.
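As a rough illustration of the “connection edges” idea, the sketch below covers only the static half of the technique and is not the System Map Builder itself. It assumes a hypothetical convention where services call each other at http(s)://<name>.internal, and it takes source text in memory rather than cloning repositories so that it stays self-contained. A model-driven pass could be layered on top to find the edges a regular expression would miss.

```python
import re
from collections import defaultdict

# Assumed convention (hypothetical): services are reached at http(s)://<name>.internal
SERVICE_URL = re.compile(r"https?://([a-z0-9-]+)\.internal")


def build_service_map(sources):
    """Build caller -> callee edges.

    sources maps a service name to a list of source file contents for that service.
    Returns a sorted list of (caller, callee) tuples, ignoring self-references.
    """
    edges = set()
    for caller, files in sources.items():
        for text in files:
            for callee in SERVICE_URL.findall(text):
                if callee != caller:
                    edges.add((caller, callee))
    return sorted(edges)


def inbound_counts(edges):
    """Count how many distinct services call each callee."""
    counts = defaultdict(int)
    for _, callee in edges:
        counts[callee] += 1
    return dict(counts)


if __name__ == "__main__":
    estate = {
        "checkout": ['fetch("https://payments.internal/charge")',
                     'get("http://basket.internal/items")'],
        "basket": ['post("https://payments.internal/hold")'],
        "payments": ['log("https://payments.internal/health")'],
    }
    edges = build_service_map(estate)
    print(edges)
    print(inbound_counts(edges))
```

Reporting “on the software as a whole” then becomes a matter of running queries over the edge list, for example finding the services with the most inbound callers.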

What’s interesting about this category of techniques is that, due to every estate being different, the best tool might not be one you buy, but one you build that can connect to the kinds of internal data sources you’d never share with a third party.

I published the specification for our “System Map Builder”, which was part of this initiative, here: https://gist.github.com/davidwhitney/b278658398c8f54527815f79944ab4ef

Agent Assisted Synchronous Change

I am convinced you should be betting on this now.

One of the step changes in reducing toil across the organisation I look after has been our slow and methodical exploration of allowing small quorums of people to make changes to disparate areas of our platform autonomously.

We’ve built our own harness – a coordinating agent – that produces changesets, then opens and tracks pull requests once it has completed work. We’re piloting it, with good first signs. I think this is possibly the future of a lot of toil-based engineering work that frequently bogs down teams or dies in coordination and planning work.

I think it’s objectively much more interesting to use agent assisted tooling to reduce the complexity of code, and remove toil, than just accelerating the creation of new software.

Observing Super-Agent

The market is less clear on where things are going for Observing Super-Agents. It’s clearly GitHub’s strategy – the confluence of tools like Codespaces and Copilot as assignee is heading in the direction of auto-dispatched and auto-reviewed tickets.

I suspect this is the place where a pre-defined product will most likely emerge. You can make a bet on something like DeerFlow, or one of the OpenClaw (and clones) family of observing agents, but the space feels like it’s still building out its primitives for the safe execution of agents inside trusted contexts.

If you look on the internet, lots of people will talk about how they’re “doing this now”, but I think it’s probably only very risk-comfortable organisations that are betting real-world, autonomous features on agents, and there are more credible signs that the early adopters are pulling back a little because they moved a little too fast.

It’s my suspicion that the most successful products to emerge here will, at least in the interim, be built on or around existing GitHub / Jira style workflows, or around a tool competing in that space like Linear.

This does appear to be the safest path to allow non-technical users to contribute to software at scale and at pace, without relying on giving “just anyone” access to programming agents and hoping they get it right.

The Cost Apocalypse

It’s safe to say that all of this is set against the backdrop of technical innovation and extreme price rises of frontier LLMs at the point of consumption.

The picture is particularly bad at the moment, with all the key vendors raising prices (5x-10x) as they wrestle with their huge initial investments. Simultaneously, some of the work debuted in Windows at Build, like MXC (Microsoft eXecution Containers, https://github.com/microsoft/mxc), is all about allowing agents to work in a controlled context on user machines.

I think this points to an emerging trend of a lot of AI development swinging back towards local hardware. Nvidia pushing its DGX and RTX Spark reference architectures for Windows machines, and Apple Silicon, both point to a future where a lot of the developer infrastructure for this tooling is done on device, at a non-trivial startup cost.

With Chinese models becoming increasingly capable, I’m relatively convinced that normal usage is going to face a different shape of supply chain problem within a few months. That shift should still go a long way towards reducing the cost of hardware for relatively capable local models to commodity levels, which is why we’re seeing a lot of work being put into safe local execution contexts.

If you’re making bets here, wait and see what is available by Q3-Q4 2026.

What we need to change to enhance this adoption

There are a few complementary behaviours that need to change to help this shift to AI-accelerated development in a lot of organisations.

Permissive Contribution Models

The ownership models of our software components in many software teams are protective.

Teams are asked to “own” their components, and that incentivises being closed to changes from outside of the team. As we anticipate a world where wide-ranging changes are coordinated by agents, and contributions are driven by people who may not have traditionally been engineers, a more collaborative, open source inspired, “custodianship” model will need to be adopted.

Organisations will often talk about this as an “inner sourced” model, inspired by open-source software. This shift in landscape makes it a required change – likely alongside SLAs for responding to pull requests, and better verification processes.

Increase In Defined Architectural Styles

We have long accepted that software implementations tend to differ across systems because it’s been traditionally very difficult to write a linter for “does this design fit” without resorting to self-certification and manual review.

With AI introspection, it’s now much easier to ensure that any pocket of software matches agreed-upon default styles and patterns. Whilst we don’t expect all software to become uniform (and in fact, that should be an explicit non-goal – innovation comes from a lack of uniformity), we should expect that the average piece of software sticks to our average house patterns and styles.

We need to step up documentation of our expected defaults and rely on introspection to provide a rating for each area of the system that shows how far from standard it is. Not being “standard” doesn’t mean “bad”, but it’s a leading indicator for us to understand where complexity exists in our estate.
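One simple way to picture such a rating is the fraction of documented standards that an area fails to meet. The sketch below is only an assumption about how that could be scored: the CheckResult shape, the area names, and the standard identifiers are hypothetical, and the pass or fail verdicts are taken as given from whatever linter or model-driven review produced them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    area: str       # service or component name
    standard: str   # identifier of a documented house standard
    passed: bool    # verdict from a linter or a model-driven review


def deviation_by_area(results):
    """Fraction of failed checks per area: 0.0 is fully standard, 1.0 fails every check."""
    totals, failures = {}, {}
    for r in results:
        totals[r.area] = totals.get(r.area, 0) + 1
        failures[r.area] = failures.get(r.area, 0) + (0 if r.passed else 1)
    return {area: failures[area] / totals[area] for area in totals}


def most_divergent(results, limit=3):
    """Areas furthest from standard first, with ties broken by name."""
    scores = deviation_by_area(results)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    results = [
        CheckResult("billing", "api-naming", True),
        CheckResult("billing", "error-contract", False),
        CheckResult("search", "api-naming", False),
        CheckResult("search", "error-contract", False),
        CheckResult("auth", "api-naming", True),
    ]
    print(most_divergent(results))
```

A high score here is a prompt to go and look at an area, not a judgement on it, which matches the intent that non-standard does not mean bad.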

With more work being contributed by agents, these written standards now have a second purpose – for guiding the implementations that the agents produce. It’s expected that if we embrace contributions from traditionally non-technical members of staff, the agents should follow the rules outlined in our standards to ensure a reasonable, sane implementation.

These patterns and standards will all be owned by experts in the discipline being standardised, and changes will go through those authoritative single points of control.

The Role of Teams in the Future

If the tools are now good enough that smaller teams can reason about systems-as-a-whole, and the teams themselves are more effective without organisational communication burden, a natural shape starts to emerge.

It’s also critically important to make sure we have a through line of knowledge – it’s our responsibility to raise the programmers of the future and make sure they’re equipped with the skills to do the work and do it well.

I suspect we’ll see a movement from the “two pizza teams” we have today (of about 6-10 people), to smaller quorums of people – a staff engineer shaped person, and a couple of understudies in a “master and apprentice” shaped model. To be successful, this new shape of team will probably be working in a pattern that looks more like mob programming, where they use tool assistance to change entire systems and platforms at once. The natural consequence of this is that site reliability engineering will enter a second era of prominence, as operating software effectively will become more important than ever.

This is an answer to the question “how do we train our juniors” in a world where programming knowledge is at risk of being lost, and companies will neglect this due to short term thinking, to their detriment.

Think about where you’re going

It’s a fool’s errand to try to focus on technology in search of a problem, and this paper looks at the work we do purely through a technology lens. To understand which bets you need to make for your business requires an entirely different perspective, with different inputs.

General advice that doesn’t fit where you are, or what you’re doing, is useless to you.

Never forget the people

Regardless of the hubris and marketing around modern AI, it acts as an amplifier for whatever you’re doing at the moment. If you have good practices, it’ll amplify them. If you have bad practices, it’ll make everything a lot worse, a lot faster. Most of the best software I ever saw built was built with careful consideration, quiet reflection, and a lot of care. The best software is built by people who care about the software they’re building, and the people they’re building it for.

Going fast is useless if you’re moving in the wrong direction. Going fast is useless if you cease to be able to navigate.

To build software for everyone, it has to be built by everyone.