AI Agents: When Software Starts Running the Work – Why the real shift in AI is not intelligence – but execution

What is an AI agent, really?
The term is everywhere. Slides, demos, product pages. And depending on who you ask, an agent is anything from a chatbot with tools to something that sounds suspiciously like the Terminator.

POLL

By 2030, how much of today’s work will be executed by AI agents?

10%: McKinsey today (fired ~10%)*

25%: ops & support automated

33%: org charts need a redesign

50%: society needs a redesign

* ‘McKinsey challenges graduates to use AI chatbot in recruitment overhaul’ – FT, Jan 14, 2026.1

Of course, Prof. Andy came in with the science.
Definitions. Models. Maturity curves. Exciting, very.

So we ended up writing this piece – not to add another opinion, but to clarify what an agent actually is, and what it is not. Stripped of hype. Grounded in how real systems behave.

It is not as exciting as the title image might suggest, though.
No killer robots (are we sure?). No secret agents.
Mostly a hard slog.

But if you care about how AI will actually change companies – not in demos, but in production – this distinction matters.

The rest of this post does exactly that. It starts by explaining why agents matter now, then pins down a precise definition, and finally shows why the real shift is not happening at the model layer, but one level up – where software starts to execute work.

Not glamorous.
But real.

1. Why AI Agents Matter Now

For a long time, progress in gen AI followed a familiar pattern.

Models got bigger.
Benchmarks improved.
Demos became smoother.

But if you look closely at how work is actually done inside companies, very little changed.

People still open tickets, copy data between systems, escalate edge cases by email, and coordinate across tools that were never designed to work together. The intelligence was there, but it lived between applications rather than inside them. Execution remained a human responsibility. (also compare our blog ‘Has MIT Opened Pandora’s Box on AI Being a Bubble?’)

That gap is where AI agents enter, and why they are likely to be the dominant topic of 2026.

Not as a new interface.
Not as a smarter chatbot.
But as a different execution model for software.

Agents matter now because large language models have crossed a practical threshold. They are not perfect, but they are reliable enough to reason across multiple steps, evaluate intermediate results, and decide what to do next. What they still lack is structure, control, and integration into real systems. Agents provide that missing layer.

Pic. Building an agent is a process of designing workflows and connecting pieces. Here is an example from OpenAI (source)

OpenAI frames this shift explicitly by defining agents as systems that “independently accomplish tasks on your behalf,” rather than systems that merely generate responses.2 Microsoft’s enterprise documentation makes a similar point, describing agents as long-running, goal-driven processes that operate across tools and workflows instead of inside a single UI.3

This is not an AGI story.
It is an application-layer story.
And it is about what happens once models are “good enough,” but software still doesn’t actually do the work.

2. What Do We Mean by “AI Agent”

The word agent is currently overused to the point of losing precision.

Chatbots are called agents.
Copilots are called agents.
Scripted workflows are called agents.
Even classic automation tools rebrand themselves as “agentic.”

So we need a definition rooted in what practitioners have used for decades rather than in marketing: an agent is defined by its behavior over time, not by its UI or its label.

Pic. *Increasing levels of autonomy and control: from single LLM calls to full agents. (Source)*

A canonical baseline comes from Russell & Norvig, who define an agent as something that perceives its environment and acts upon it to achieve goals.4 That framing predates GenAI, and it is still the cleanest starting point, although many other definitions exist.

A working definition (used throughout this series)

For this series, an “AI agent” is a software system that:

pursues an explicit goal, not just a prompt
plans across multiple steps, not a single response
maintains state between steps
acts through tools (APIs, code, systems)
evaluates progress and decides whether to continue, adapt, stop, or escalate

This maps well to modern surveys of LLM-based agents that formalize agents as systems composed of objectives, memory, perception, action, and feedback loops rather than simple input–output functions.5

Pic. Wang et al. – *LLM-based Autonomous Agents: A Survey* (source)

Diagram (agent loop, minimal but precise):

Goal → Plan → Act (Tools) → Observe → Update State/Memory → (repeat until done)
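The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `AgentState`, `run_agent`, the rule-based `toy_planner`, and the two toy tools are all hypothetical names invented for this sketch, and the planner stands in for what would normally be a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class AgentState:
    goal: str
    history: List[str] = field(default_factory=list)  # observations so far
    done: bool = False


def run_agent(
    goal: str,
    plan_next: Callable[[AgentState], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> AgentState:
    """Goal -> Plan -> Act -> Observe -> Update state, repeated until done."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        action, arg = plan_next(state)        # Plan: choose the next step
        if action == "finish":
            state.done = True
            break
        observation = tools[action](arg)      # Act through a tool
        # Observe and update state so the next planning step can use it
        state.history.append(f"{action}({arg}) -> {observation}")
    return state                              # done=False means: escalate


def toy_planner(state: AgentState) -> Tuple[str, str]:
    # Hypothetical stand-in for an LLM: decides from the state, not a script
    if not state.history:
        return ("lookup_order", "A-17")
    if len(state.history) == 1:
        return ("send_update", "A-17")
    return ("finish", "")


tools = {
    "lookup_order": lambda order_id: "delayed",
    "send_update": lambda order_id: "sent",
}

final = run_agent("inform customer about order A-17", toy_planner, tools)
```

The step budget (`max_steps`) is the simplest possible stop condition: if the loop runs out of steps without finishing, `done` stays `False` and the caller can escalate to a human.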

What an agent is not (this matters)

A single LLM call is not an agent.
A chat UI waiting for user input is not an agent.
A static DAG (Directed Acyclic Graph: a fixed, non-looping workflow) is not an agent.6
Pure RAG (Retrieval-Augmented Generation: retrieve → answer → stop) is not an agent.7

Those systems can be useful. But they don’t meet the “behavior over time” definition above. They don’t run work – they answer.

This is also how leading vendor docs frame it in practice. See above.8 9

Why the term “agent” became confusing

The confusion is that the market uses “agent” for three different things:

Assistant (helpful interface)
Workflow (predefined steps, deterministic)
Agent (goal-driven, stateful, tool-using loop)

If you don’t separate these, you can’t reason about reliability, cost, governance, or failure modes. Everything sounds “agentic” until it breaks.

Pic. *Growth of LLM-based autonomous agent research and agent categories (2021–2028). (Source)*

Diagram (fast comparison):

Assistant: user ↔ chat (help)
Workflow: input → steps → output (deterministic)
Agent: goal → loop → tools → state → completion/escalation (adaptive)

If you want a concrete visual of the “agent as a loop/graph” concept (vs a DAG), LangGraph’s docs and diagrams are among the clearest.10

3. From Apps to Agents: The Real Shift

Traditional software – including modern SaaS – is built on a simple operating assumption:

Humans decide. Software executes.

Users click, select, confirm, and coordinate. Software responds to inputs, but responsibility for outcomes remains with people. Even highly sophisticated systems still rely on humans to interpret context, decide what comes next, and handle exceptions. The intelligence that makes work function lives between applications, embedded in human workflows rather than encoded in software.

Pic. Human-orchestrated app stack. Image: The Complete Guide to Human-in-the-Loop Automation (source)

This design pattern has been remarkably stable for decades, and it is explicitly reflected in how enterprise software has historically been designed: applications optimize for features and interfaces, not for owning execution end-to-end.

Where apps stop

Applications are optimized around:

features
interfaces
predefined flows

Pic. Breaking Down Silos in Product Development – Taipuva (source)

They are good at executing known steps once a human has decided what to do. They are not designed to own the process end-to-end.

As a result, most organizations still depend on people to:

stitch systems together
manage handoffs
resolve edge cases
take responsibility when things break

Software assists. Humans orchestrate.

This division of labor is exactly what large parts of classic automation and RPA (Robotic Process Automation)11 were built around: deterministic steps, predefined paths, and explicit human control when something unexpected occurs.

What agents change

AI agents invert this relationship.

Instead of waiting for instructions, an agent receives a task. It determines which steps are required, executes actions across tools and systems, evaluates intermediate results, and continues until the task is completed or explicitly escalated.

The unit of interaction shifts:

from screens and features
to tasks and outcomes

This is the core structural change. Responsibility moves from people to software — not for judgment yet, but for execution.

Pic. Beginner's guide to multi-agent orchestration with watsonx ... (source)

This framing is consistent across enterprise documentation. Microsoft explicitly describes agentic systems as orchestration layers that coordinate actions across multiple systems and workflows rather than living inside a single application UI. In this model, agents run long-lived processes that manage state, branching, and tool invocation over time.12

IBM uses similar language when describing enterprise agents, emphasizing systems that evaluate signals, act across domains, and adapt through feedback rather than following fixed scripts. The focus is not conversational interaction, but execution across heterogeneous systems.13

Across vendors, the common thread is not autonomy. It is execution across boundaries.

OpenAI’s own documentation reinforces the same distinction by defining agents as systems that “independently accomplish tasks on your behalf,” explicitly separating them from chat-based assistants that respond turn by turn.14

Why this is not “better automation”

It is tempting to view agents as an incremental upgrade to automation or RPA. That interpretation misses the point.

Traditional automation assumes:

stable inputs
predictable paths
low exception rates

It works best when the process can be fully specified in advance.

Agents are designed for the opposite conditions:

ambiguous inputs
changing context
exception-heavy workflows

They do not eliminate complexity. They absorb it.

This distinction aligns with broader research on automation and AI in organizations, which shows that the limiting factor is rarely intelligence itself, but the ability to handle variability and exceptions in real operational environments.15

What actually shifts inside organizations

When execution moves from apps to agents, the bottleneck changes.

Instead of asking:

Which feature should we build?

organizations start asking:

Which tasks can we safely delegate to software? (compare McKinsey above)

This reframing – from feature delivery to task ownership – is exactly what analysts describe as the transition from task automation to autonomous or semi-autonomous workflows.16

This is the inflection point. Not because agents are “smarter,” but because software begins to take responsibility for running work rather than supporting it.

That is the real shift this series is about.

Pic. Automations vs AI Workflows vs AI Agents: Understanding the ... (click the source for better readability)

Absorbing complexity is also why agents only become viable once reasoning models reach a sufficient level of robustness. Without the ability to reason across steps and adapt to new information, software cannot own execution without constant human intervention.

What changes for leadership

As execution moves from applications to agents, the organizational impact goes one level deeper.

Leadership is no longer managing an org chart made up exclusively of humans. Increasingly, teams consist of humans and agents operating side by side, with agents owning defined tasks, workflows, or execution domains.

This is not speculative. In recent discussions ahead of Davos 2026, McKinsey highlighted that one of the emerging leadership challenges is managing organizations where accountability, coordination, and oversight span both human roles and autonomous or semi-autonomous agents. Traditional org charts assume people in every box. That assumption no longer holds.17

Pic. The Future of Marketing Teams: Human-AI Collaboration Interesting ... (source)

In practice, this introduces new questions:

Who owns outcomes when an agent executes a task?
How are responsibilities split between human managers and agent systems?
Where do escalation paths sit when execution is automated but judgment remains human?

These are organizational design questions, not technology questions. They follow directly from the shift described in this chapter: once software begins to run work, leadership must treat agents as first-class actors in the operating model, not just tools.

This implication will matter more than the technology itself. See also our Blog ‘🦾 AI and Robots – Azeotrope or Entities’.

4. Anatomy of an AI Agent

This is the densest section, and intentionally so. Understanding agents requires looking inside them – not at frameworks or vendors, but at the functional components that repeatedly appear across research and production systems.

Pic. General workflow of an AI agent, which consists of three components: perception, brain, and action (source)

Across the literature, agent architectures converge on a small set of roles. The differences are mostly in implementation, not in structure.

The cognitive core

At the center of an agent sits a cognitive core, typically a large language model or a combination of models. Its role is reasoning and decision-making, not execution.

Early agent systems failed for a predictable reason: they asked the model to do everything – plan, execute, validate, and remember. In practice, this leads to high cost, high latency, and brittle behavior. Empirical work on LLM-based agents shows that conflating reasoning and execution amplifies error propagation and makes systems hard to control.18

Modern agent architectures therefore treat the model as a decision component, not as a controller of the entire system.

Planning vs execution

One of the most widely adopted architectural patterns is the planner–executor split.

The planner determines what should be done.
The executor focuses on how to do it.

This separation reflects a basic operational reality: plans change slowly, while actions happen frequently. By isolating planning from execution, systems reduce reasoning overhead, allow cheaper or specialized models to handle execution, and make failures easier to isolate.

This pattern is formalized in research such as ReAct, which interleaves reasoning and acting while still separating deliberation from tool use, and is now common in production agent frameworks.19
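A rough sketch of the planner–executor split, under stated assumptions: `plan`, `execute`, and the three tool names are hypothetical, and the planner is hard-coded where a real system would call a model. The point is only the division of labor. The planner returns a list of steps. The executor runs them without making decisions and stops at the first failure so the planner can replan.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (tool name, argument)


def plan(goal: str) -> List[Step]:
    """Planner: decides WHAT to do. Hard-coded stand-in for a model call."""
    return [("fetch_invoice", goal), ("check_total", goal), ("file_report", goal)]


def execute(steps: List[Step], tools: Dict[str, Callable[[str], str]]) -> List[Tuple[str, str]]:
    """Executor: handles HOW. Runs each step and records the outcome."""
    results = []
    for name, arg in steps:
        try:
            results.append((name, tools[name](arg)))
        except Exception as exc:
            # Isolate the failure and stop; replanning is the planner's job
            results.append((name, f"FAILED: {exc}"))
            break
    return results


def check_total(invoice_id: str) -> str:
    # Toy tool that always fails, to show where execution stops
    raise ValueError("total mismatch")


tools = {
    "fetch_invoice": lambda invoice_id: "ok",
    "check_total": check_total,
    "file_report": lambda invoice_id: "filed",
}

results = execute(plan("INV-42"), tools)
```

Because the failure is recorded against a named step, it is clear which part of the plan broke and nothing after it has run.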

Memory

Memory is another core component – and one of the most common sources of failure.

Agent systems typically rely on:

working memory, which tracks the current task state and recent actions
long-term memory, which stores historical context, prior cases, or learned preferences

Retrieval-augmented generation is often used as a memory mechanism, but recent research shows that naïve accumulation of context leads to stale state, uncontrolled growth, and compounding errors in long-running agents.20

Pic. Long-term Memory for AI Agents. Why Vector Databases are not ... (source)

Effective agent design treats memory as a managed resource, not as an ever-growing transcript.
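One way to picture "managed resource" is a working memory with a hard size limit. The sketch below is an assumption-laden toy: `WorkingMemory` is a hypothetical class, and the "summary" is a simple counter string where a production system would more likely use a model-generated summary or a retrieval store.

```python
from collections import deque
from typing import List


class WorkingMemory:
    """Keeps only the most recent items; older ones are folded into a summary."""

    def __init__(self, max_items: int = 3) -> None:
        self.recent = deque(maxlen=max_items)
        self.summary = ""   # compressed trace of everything evicted so far
        self.evicted = 0

    def add(self, item: str) -> None:
        if len(self.recent) == self.recent.maxlen:
            oldest = self.recent[0]  # about to be dropped by the bounded deque
            self.evicted += 1
            self.summary = f"{self.evicted} earlier step(s), last evicted: {oldest}"
        self.recent.append(item)

    def context(self) -> List[str]:
        """What would be passed to the model: summary first, then recent items."""
        lines = [self.summary] if self.summary else []
        return lines + list(self.recent)


mem = WorkingMemory(max_items=2)
for step in ["a", "b", "c"]:
    mem.add(step)
```

The context handed to the model stays bounded no matter how long the agent runs, which is the property the ever-growing transcript lacks.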

Tools and execution

Tools are where agents become operational.

APIs, databases, code execution environments, and internal systems allow agents to act on the world rather than merely describe it. Across real deployments, tool misuse is the dominant failure mode – not model hallucination.

This is why production-grade agent systems emphasize:

explicit schemas
input and output validation
permissioning and access control

These controls matter more for reliability than prompt sophistication. An agent without tools is a reasoning system. An agent with poorly constrained tools is a risk surface.
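The three controls listed above can be illustrated with a small wrapper around a tool call. Everything here is hypothetical (`Tool`, `call_tool`, the `refund` tool, the role names); real frameworks typically express schemas as JSON Schema or typed models rather than a plain dict of Python types.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet


@dataclass
class Tool:
    name: str
    schema: Dict[str, type]      # explicit input schema: argument name -> type
    required_role: str           # permission needed to call this tool
    func: Callable[..., Any]


def call_tool(tool: Tool, args: Dict[str, Any], roles: FrozenSet[str]) -> Any:
    # Permissioning: the caller must hold the required role
    if tool.required_role not in roles:
        raise PermissionError(f"{tool.name} requires role {tool.required_role!r}")
    # Validation: argument names must match the schema exactly
    if set(args) != set(tool.schema):
        raise ValueError(f"{tool.name} expects arguments {sorted(tool.schema)}")
    # Validation: every argument must have the declared type
    for key, expected in tool.schema.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    return tool.func(**args)


refund = Tool(
    name="refund",
    schema={"order_id": str, "amount_cents": int},
    required_role="billing",
    func=lambda order_id, amount_cents: f"refunded {amount_cents} on {order_id}",
)

ok = call_tool(refund, {"order_id": "A-17", "amount_cents": 500}, frozenset({"billing"}))
```

A malformed or unauthorized call fails before the tool runs, so the error surfaces at the boundary instead of inside a downstream system.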

Feedback and evaluation

Finally, agents require feedback.

Reflection loops, critics, and evaluation steps are increasingly standard in agent architectures because systems without feedback drift over time. Explicit evaluation allows agents to detect failure, trigger retries or escalation, and support auditability.

Research on agent reliability consistently shows that feedback mechanisms are a prerequisite for control in probabilistic systems.21

Without feedback, agents either act too cautiously to be useful or too confidently to be trusted.
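A minimal sketch of an explicit evaluation step, assuming a hypothetical `run_with_feedback` helper: each attempt is checked by an evaluator, failures trigger a bounded number of retries, and the full log is kept for audit. The `attempt` and `evaluate` functions below are toy stand-ins; in practice the evaluator might be a rule set, a test suite, or a second model acting as critic.

```python
from typing import Any, Callable, Dict, Tuple


def run_with_feedback(
    attempt: Callable[[int], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_retries: int = 2,
) -> Dict[str, Any]:
    """Try, evaluate, retry up to max_retries times, then escalate."""
    log = []
    for n in range(1, max_retries + 2):   # first try plus the retries
        result = attempt(n)
        ok, reason = evaluate(result)
        log.append((n, ok, reason))       # audit trail of every evaluation
        if ok:
            return {"status": "done", "result": result, "log": log}
    return {"status": "escalated", "result": None, "log": log}


def attempt(n: int) -> str:
    # Toy generator: the first draft omits the order id, later drafts include it
    return "Your order is delayed." if n == 1 else "Your order A-17 is delayed."


def evaluate(text: str) -> Tuple[bool, str]:
    # Toy critic: the message must mention the order id
    return ("A-17" in text, "ok" if "A-17" in text else "missing order id")


outcome = run_with_feedback(attempt, evaluate)
```

The retry limit matters as much as the check itself: without it the loop can spin indefinitely, and with it a repeated failure becomes an explicit escalation.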

Why this matters

These components – reasoning, planning, memory, tools, and feedback – are not optional. They define whether an agent can operate beyond toy demos.

They also explain why agent systems feel complex: they are distributed control systems, not chatbots.

This anatomy is the technical foundation for everything that follows.

5. Structure ≠ Decision Capability

At this point, the structure of AI agents is clear.

They plan.
They maintain state.
They call tools.
They execute workflows.

That already marks a real architectural shift.

What it does not imply is decision maturity.

A well-established constraint (not an opinion)

Across analytics and software engineering, decision systems evolve in distinct maturity stages:

Descriptive: what happened
Diagnostic: why it happened
Predictive: what is likely to happen
Prescriptive: what should be done

Systematic reviews of data & analytics maturity models show that most organizations operate at the first two levels. Reliable prescriptive decision-making is rare and requires stable data, validated models, governance, and ownership — not just advanced tooling.22

This is a maturity problem, not a technology problem.

The same pattern exists in software engineering

The Capability Maturity Model (CMM/CMMI) shows the same dynamic:
systems only become predictable and reliable once processes are defined, measured, and controlled. Early-stage systems may look impressive, but behave inconsistently and depend on ad-hoc intervention. Reliability emerges from discipline, not sophistication.23

What this means for LLM-based agents

Large language models are generative systems.
They produce plausible reasoning based on language patterns.

This makes them effective at:

summarizing
explaining
pattern-matching
limited, constrained diagnosis

By default, they do not deliver validated, repeatable prescriptive decisions.

This assessment aligns with big data and analytics maturity research, which shows that prescriptive systems require validated models, stable pipelines, and governance before decision authority can be delegated to software.24

The practical boundary (this matters)

Across real deployments, a consistent pattern appears:

Many agents are structurally capable but operationally immature.

Prescriptive agent behavior becomes defensible only when one of two conditions is met:

decisions are backed by validated ML or optimization models, or
the agent is tightly constrained by workflows, guardrails, and human approval points

Without this, agents may execute – but they execute without decision authority.
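The second condition, tight constraints plus human approval points, can be sketched as a simple gate in front of execution. All names and numbers below are illustrative assumptions (`Proposal`, `gate`, the allowed actions, the 100.0 limit), not values from any real deployment.

```python
from dataclasses import dataclass
from typing import Callable

ALLOWED_ACTIONS = {"refund", "credit_note"}   # guardrail: explicit allow-list
AUTO_APPROVE_LIMIT = 100.0                    # above this, a human must approve


@dataclass
class Proposal:
    action: str
    amount: float


def gate(proposal: Proposal, human_approves: Callable[[Proposal], bool]) -> str:
    """The agent proposes; the gate decides whether execution is allowed."""
    if proposal.action not in ALLOWED_ACTIONS:
        return "rejected"                     # outside the agent's mandate
    if proposal.amount <= AUTO_APPROVE_LIMIT:
        return "executed"                     # low-risk: execution delegated
    if human_approves(proposal):
        return "executed"                     # judgment stays with a person
    return "held"


small = gate(Proposal("refund", 40.0), lambda p: False)
large = gate(Proposal("refund", 900.0), lambda p: False)
```

Here the agent executes within a narrow band, while decision authority for anything larger or unfamiliar remains with a human.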

Why this matters for the rest of the series

AI agents are execution mechanisms.

Their reliability is determined by:

data maturity
process discipline
governance

not by how convincing their reasoning sounds.

Closing of Part 1

So far, we have deliberately avoided discussions about enterprise rollout, economics, labor impact, or governance.

Part 1 is about establishing a shared mental model: what an AI agent actually is, how it differs from apps and workflows, and which components truly matter. Without this foundation, debates about ROI or disruption quickly dissolve into hype or confusion.

With it, we can now turn to the harder questions: why agents are difficult to run in production, where they actually work today, and where their limits still are. Planned for next week.

Sources

1. https://www.ft.com/content/de7855f0-f586-4708-a8ed-f0458eb25586
2. OpenAI, A Practical Guide to Building Agents
3. Microsoft Learn, Create Autonomous Agent Workflows
4. Russell & Norvig, Artificial Intelligence: A Modern Approach
5. https://arxiv.org/abs/2308.11432; https://arxiv.org/pdf/2308.11432
6. https://aima.cs.berkeley.edu/
7. https://arxiv.org/abs/2005.11401
8. OpenAI, Agents guide
9. Microsoft Learn, Create autonomous agent workflows
10. LangGraph overview
11. Robotic Process Automation: https://www.techtarget.com/searchcio/definition/RPA
12. https://learn.microsoft.com/en-us/azure/logic-apps/agentic-workflows-overview
13. https://www.ibm.com/think/tutorials/human-in-the-loop-ai-agent-langraph-watsonx-ai
14. https://platform.openai.com/docs/guides/agents
15. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/automation-ai-and-the-future-of-work
16. https://www.gartner.com/en/articles/from-task-automation-to-autonomous-workflows
17. https://www.mckinsey.com/featured-insights/world-economic-forum/overview
18. https://arxiv.org/abs/2308.11432
19. https://arxiv.org/abs/2210.03629; https://blog.langchain.com/planning-agents/
20. https://arxiv.org/abs/2404.04834
21. https://arxiv.org/abs/2303.11366
22. Schmalenbach Journal of Business Research (Springer): https://link.springer.com/article/10.1007/s41471-024-00205-2; DOI: https://doi.org/10.1007/s41471-024-00205-2; open-access PDF: https://link.springer.com/content/pdf/10.1007/s41471-024-00205-2.pdf
23. IEEE Software: https://ieeexplore.ieee.org/document/219617; DOI: https://doi.org/10.1109/52.219617
24. IEEE Xplore (conference paper): https://ieeexplore.ieee.org/document/8970866; DOI: https://doi.org/10.1109/JEEIT.2019.8970866

Source: https://studioalpha.substack.com/p/ai-agents-when-software-starts-running by Fabian Hediger – 15 January 2026

