Agentic AI Systems under the EU AI Act — A Risk-Based Regime Meets Its First Stress Test

Public debate about the economic and societal consequences of artificial intelligence, reignited by the market entry of ChatGPT and other generative chatbots, has not yet subsided. Yet the next step in the evolution of AI systems is already on the threshold: so-called agentic AI systems (or "AI agents"). These are AI systems capable of executing complex tasks autonomously on a single prompt from the user. From a legal perspective, agentic AI raises the question whether the European Union's flagship regulation — the AI Act — is equipped to capture the altered risk profile of systems that no longer merely generate text or images, but act.

Key Takeaways

The AI Act's risk-based approach classifies systems by their intended purpose, yet an agentic system's true risk is set by the tools it can invoke and its degree of autonomy — variables the framework reaches only indirectly, through Article 14 (high-risk systems only) and the systemic-risk regime for GPAI models.
Most agentic systems will run on a general-purpose AI model (Article 3(63)) and, where they can serve a variety of purposes, will themselves be general-purpose AI systems (Article 3(66)); the most capable models are likely to be designated GPAI models with systemic risk under Article 51.
Article 25(1)(c) is a live and frequently overlooked exposure: a deployer that redirects a GPAI or agentic system to a high-risk purpose can be re-characterised as its provider, inheriting the full provider obligation set.
Architecture is compliance — foundation-model choice, fine-tuning scope, tool scoping, and autonomy level determine a system's classification and must be documented contemporaneously; as the analysis puts it, tool scoping is risk scoping.

§ I Introduction

Asked to plan a holiday, an agentic system will draft an itinerary based on the user's preferences and, without further human intervention, book the flights, hotels, and rental cars. In a business context, the developers of such systems envisage them performing entire workflows as "virtual" colleagues or employees — from opening tickets and conducting research to executing purchasing decisions. For many, agentic AI represents the moment at which the long-promised productivity gains of artificial intelligence finally materialise.

This article sets out the technical architecture of agentic AI systems, identifies the specific legal risks they introduce, and examines how they fit into the two regulatory pillars of the AI Act: the rules on general-purpose AI ("GPAI") models and systems on the one hand, and the risk-based approach centred on the intended purpose of AI systems on the other. It closes with an outlook on related questions of civil law, liability, and legal personality.

The central thesis is that agentic AI is the first substantive stress test for the AI Act. The risk-based approach, anchored in the intended purpose of a system, is conceptually ill-suited to capture the risks that flow from an AI system's ability to select and wield tools autonomously.

§ II Technical Background: What Makes an AI System "Agentic"

At a technical level, agentic AI systems are an evolution of Large Language Models (LLMs). Within an agentic system, an LLM translates the user's goal into a sequence of actions and iteratively executes, evaluates, and adjusts those actions until the goal is reached. What sets the architecture apart from a conventional chatbot is the ability of the LLM to access and orchestrate external tools.

The LLM functions as the "brain" of the agentic system: it plans the necessary steps and selects the tools appropriate for each step. The tools available depend on how the system has been built. A controlling LLM may call on other more specialised LLMs for specific questions, query a retrieval-augmented generation ("RAG") system, or invoke application programming interfaces — for instance, to book a flight through an airline's reservation API. It may also draw on a so-called Large Action Model (LAM). Whereas an LLM takes natural-language input and, on the basis of probabilities, produces textual or visual output, a LAM is trained on sequences of user actions within a graphical user interface. Given a user-defined goal, a LAM can operate the interface itself, thereby replicating the steps a human would perform. Combined with a LAM, an agentic system can control a computer or web browser, code a website inside an integrated development environment, or place an order on a food-delivery site entirely without human intervention.

An agentic AI system is therefore best understood as a composite: a controlling unit (the "brain") that translates user input into concrete action steps, together with a suite of tools it can invoke. Those tools may themselves be AI systems or simply deterministic interfaces such as APIs. What distinguishes an agentic system is the integration of these components into a single functional whole whose output is not text, but action.
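The composite described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's architecture: the `Agent`, `Step`, and `scripted_planner` names, the two tools, and the flight code are all hypothetical, and the planner is a fixed script standing in for the LLM "brain".

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A tool is anything the controller can invoke: an API wrapper, a RAG lookup,
# another model. Here the tools are plain deterministic functions.
Tool = Callable[[dict], str]


@dataclass
class Step:
    tool: str   # name of the tool to invoke
    args: dict  # arguments handed to that tool


@dataclass
class Agent:
    # Stand-in for the LLM "brain": given the goal and the history so far,
    # it returns the next step, or None once the goal is reached.
    planner: Callable[[str, list], Optional[Step]]
    tools: dict
    history: list = field(default_factory=list)

    def run(self, goal: str, max_steps: int = 10) -> list:
        for _ in range(max_steps):
            step = self.planner(goal, self.history)
            if step is None:
                break
            result = self.tools[step.tool](step.args)  # the output is an action
            self.history.append(f"{step.tool}: {result}")
        return self.history


def scripted_planner(goal: str, history: list) -> Optional[Step]:
    # Hypothetical fixed plan; a real planner would be a model call.
    plan = [
        Step("search_flights", {"to": "LIS"}),
        Step("book_flight", {"flight": "XX123"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


agent = Agent(
    planner=scripted_planner,
    tools={
        "search_flights": lambda a: f"found flight to {a['to']}",
        "book_flight": lambda a: f"booked {a['flight']}",
    },
)
log = agent.run("plan a holiday")
```

In this sketch the `tools` dictionary is the agent's entire capability: whatever is not registered there cannot be invoked, whatever the planner proposes.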

§ III The Altered Risk Profile

The AI-specific risks commonly associated with LLMs are well-rehearsed: hallucinations, bias and discriminatory output, copyright exposure around training data and generated content, unjustified processing of personal data, users over-estimating the capabilities of the system, and — as documented in recent safety research on frontier models — attempts by models to circumvent the safety boundaries imposed upon them.

Because agentic systems are built on top of LLMs — with an LLM operating as the controlling brain — all of these risks are inherited. But they are also amplified, for two reasons.

First, the output of an LLM is text; the output of an agentic system is conduct in the world. Where a hallucination in a chat interface produces a false statement that the user may verify, a hallucination in an agentic system may produce a booked flight, a transferred sum, a committed order, or a misconfigured device. The risk is no longer confined to the interaction between the user and the screen; it propagates along the chain of actions the system performs on behalf of the user.

Second, the magnitude of that risk is a function of two design variables, each of which is chosen when the system is built:

The set of tools the system can invoke. A narrowly scoped agent — restricted, say, to inputting predefined fields in a single booking interface — has a correspondingly narrow risk surface. An agent with unrestricted control over a computer, or with access to the control systems of an industrial plant, has a materially larger one. Tool selection therefore determines capability, and capability determines risk.
The degree of autonomy with which the system is permitted to act. Three levels are commonly distinguished:

Level 1 (human-in-the-loop): Every action by the system requires prior human authorisation.
Level 2 (human-on-the-loop): The system acts independently, but under supervision; the human can intervene before an action is executed.
Level 3 (human-out-of-the-loop): The system acts without prior human authorisation or real-time oversight.

Agentic systems can in principle be built at any of these levels. Yet the productivity gains that agents are supposed to deliver scale with autonomy: the less human input required, the greater the efficiency benefit. There is therefore a commercial incentive to push towards the human-out-of-the-loop end of the spectrum — and, with it, an increased probability that erroneous or harmful actions go unchecked.
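The three levels can be read as a single gate placed in front of every action. The sketch below is illustrative only: the `Autonomy` enum, the `execute` function, and the `approve` and `veto` callbacks are hypothetical names, and a production system would replace the callbacks with a real approval or supervision interface.

```python
from enum import Enum
from typing import Callable


class Autonomy(Enum):
    HUMAN_IN_THE_LOOP = 1      # prior authorisation for every action
    HUMAN_ON_THE_LOOP = 2      # supervised; human may intervene before execution
    HUMAN_OUT_OF_THE_LOOP = 3  # no prior authorisation, no real-time oversight


def execute(
    description: str,
    action: Callable[[], str],
    level: Autonomy,
    approve: Callable[[str], bool],  # hypothetical: asks a human to authorise
    veto: Callable[[str], bool],     # hypothetical: True if a supervisor intervenes
) -> str:
    if level is Autonomy.HUMAN_IN_THE_LOOP and not approve(description):
        return "blocked: not authorised"
    if level is Autonomy.HUMAN_ON_THE_LOOP and veto(description):
        return "blocked: supervisor intervened"
    # At level 3 neither callback is consulted: the action simply runs.
    return action()
```

The point of the sketch is the last line: at the human-out-of-the-loop level nothing stands between the planner's decision and its execution.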

Christian Förster has rightly identified autonomy as one of the two key distinguishing features of AI systems in the sense of the AI Act (alongside their machine-based character).¹ His observation that "the inherent risk of the system increases in proportion to its autonomy" is nowhere more acute than in the context of agentic AI, where autonomy is the central design parameter. The AI Act itself acknowledges this: Recital 12 treats "some degree of independence of actions from human involvement" as a defining feature of an AI system, and Article 14(3) ties the intensity of human oversight for high-risk systems to the system's "level of autonomy".

§ IV Classification under the AI Act

The EU legislator designed the AI Act as a technology-neutral regulation: Article 3(1) defines "AI system" broadly enough to cover most current and foreseeable forms of AI, and, subject to one significant exception, the intensity of regulation depends not on the underlying technology but on the risk associated with the intended use of the system. The exception is the dedicated regime for general-purpose AI models (and, with a far lighter touch, general-purpose AI systems) introduced late in the legislative process when LLMs entered the public consciousness. Obligations may therefore arise along two parallel tracks: from the risk-based approach to AI systems, and from the classification of a model or system as general-purpose.²

1. Agentic AI as a General-Purpose AI Model or System

The first question is whether the AI model that powers the "brain" of the agentic system qualifies as a general-purpose AI model within the meaning of Article 3(63) AI Act. That provision defines a general-purpose AI model as:

Article 3(63) AI Act — GPAI Model (Definition)

"an AI model, including where such an AI model is trained with a large amount of data using self-supervision at scale, that displays significant generality and is capable of competently performing a wide range of distinct tasks regardless of the way the model is placed on the market and that can be integrated into a variety of downstream systems or applications, except AI models that are used for research, development or prototyping activities before they are placed on the market."

For GPAI models — and for AI systems that integrate such models — the AI Act imposes extensive horizontal obligations, including transparency duties, preparation of technical documentation for downstream providers, and (as discussed below) additional obligations where the model presents a systemic risk.

(a) The Underlying Model

Whether an agentic system is built on a GPAI model depends on its concrete architecture. In principle, both an off-the-shelf LLM and a model that has been fine-tuned for a specific purpose can serve as the controlling brain. Where the system is built on an unmodified LLM with additional agent-specific capabilities bolted on, the model will almost invariably qualify as a general-purpose AI model. Fine-tuning the model for a narrower purpose — say, a specific business domain or a defined agent task — does not automatically strip it of that status. The question under Article 3(63) is whether the remaining capabilities of the model still display "significant generality" and can still competently perform "a wide range of distinct tasks".

Because an agentic system must, by design, understand a variety of user prompts, identify an appropriate goal, plan a sequence of steps, and execute them across heterogeneous interfaces, the underlying model will in most cases retain the capabilities that trigger GPAI classification — even after purpose-specific fine-tuning. The exception is the highly specialised agent that is narrowly trained to perform a single task composed of a fixed sequence of sub-steps; such a system may fall outside the GPAI definition even though it operates autonomously.

(b) The Agentic System as a "General-Purpose AI System"

A separate question is whether the agentic system itself qualifies as a general-purpose AI system within the meaning of Article 3(66) AI Act. That provision defines a GPAI system as an AI system "based on a general-purpose AI model and which has the capability to serve a variety of purposes, both for direct use as well as for integration in other AI systems". Classification as a GPAI system triggers further transparency and interoperability duties under Articles 50 and 53 AI Act.

The decisive question is what purposes the system is capable of serving. The AI Act defines "intended purpose" (Article 3(12)) but not "purpose" in the abstract. It is therefore unclear whether the range of purposes is to be determined purely from the perspective of the provider (and hence limited to the intended uses) or whether reasonably foreseeable misuses and circumventions of safety boundaries must also be taken into account. The risk-based architecture of the regulation, together with the deliberate contrast between "purpose" and "intended purpose" in the drafting, supports the broader reading: reasonably foreseeable misuse must be considered.

In the context of agentic AI, this matters for a practical reason. The range of purposes an agent can serve is determined not only by the provider's intended business case, but by the tools the agent can wield. An agent marketed only for travel bookings, but given general control over a user's computer or browser, can be repurposed (intentionally or inadvertently) for almost anything. Such an agent should be treated as a general-purpose AI system. The narrower characterisation — "this is only a travel agent" — becomes defensible only where the tools available to the agent are themselves tightly scoped, for example to predefined fields in a single booking API.

(c) Systemic Risk

If the underlying model qualifies as a GPAI model, it may in addition be classified as a GPAI model with systemic risk under Article 51 AI Act. Classification occurs either (i) where the model has "high-impact capabilities", which are presumed where the cumulative amount of compute used for training exceeds 10²⁵ floating-point operations (Article 51(2)), or (ii) by decision of the European Commission, ex officio or following a qualified alert from the scientific panel, having regard to the criteria in Annex XIII.³

10²⁵ FLOPs: compute threshold presuming systemic-risk GPAI under Art. 51(2)
Art. 14: human-oversight intensity tracks the system's level of autonomy
Art. 25(1)(c): deployer who modifies intended purpose may be re-characterised as provider

The criteria in Annex XIII are particularly pertinent for agentic systems. Beyond parameter count, dataset size, and training compute, they include:

the input and output modalities of the model;
the benchmarks and evaluations of the model's capabilities;
the model's adaptability to learn new, distinct tasks;
the level of autonomy and scalability of the model; and
the tools to which the model has access.

For frontier agentic systems these criteria will frequently be met in combination: such systems are designed to exhibit a high degree of autonomy, are adaptable in their planning and execution, and — by definition — access a range of external tools. It is therefore reasonable to expect that the Commission will in due course designate the most capable agentic AI models as GPAI models with systemic risk.

Conversely, a provider may seek to avoid the heightened obligations attached to systemic-risk GPAI models (Article 55) by building the brain on a model aggressively narrowed to a single purpose. Whether such a pared-down model remains capable of delivering agent functionality, and whether the resulting model still crosses the Article 51(2) compute threshold, will be the practical test. This is an area where careful architectural choices — and careful documentation of those choices — can materially affect the regulatory perimeter.
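To give a feel for where the Article 51(2) threshold sits, the following back-of-the-envelope calculation may help. It rests on an assumption that is not part of the AI Act: the common engineering rule of thumb that training a dense transformer costs roughly 6 floating-point operations per parameter per training token. The model sizes and token counts are hypothetical, and a real assessment would have to use the provider's actual cumulative training compute.

```python
# Article 51(2): high-impact capabilities are presumed above 10^25 FLOPs.
THRESHOLD_FLOPS = 10**25


def estimate_training_flops(parameters: int, training_tokens: int) -> int:
    # Rule of thumb (an assumption, not part of the AI Act): roughly
    # 6 floating-point operations per parameter per training token.
    return 6 * parameters * training_tokens


def presumed_high_impact(cumulative_flops: int) -> bool:
    return cumulative_flops > THRESHOLD_FLOPS


# Hypothetical models, both trained on 15 trillion tokens.
mid_size = estimate_training_flops(70 * 10**9, 15 * 10**12)   # 6.3e24
frontier = estimate_training_flops(400 * 10**9, 15 * 10**12)  # 3.6e25
```

Under these assumptions the 70-billion-parameter model stays below the presumption threshold while the 400-billion-parameter model exceeds it, which illustrates why the choice of foundation model is itself a regulatory decision.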

(d) Interim Conclusion

Agentic AI systems will in most cases be built on a GPAI model and, where the system exhibits the capacity to serve a variety of purposes, will themselves qualify as GPAI systems. Fine-tuning the underlying model rarely changes this analysis. The most capable agentic systems are likely to be designated as GPAI models with systemic risk, triggering the full weight of the obligations under Articles 51 et seq. AI Act — with the consequence that the provider of the model (not necessarily the provider of the agentic system built on it) becomes the primary addressee of those obligations.

2. Agentic AI in the Risk-Based Approach

Alongside the GPAI regime, the risk-based approach forms the second pillar of the AI Act. It classifies AI systems according to their intended use into prohibited practices (Article 5), high-risk AI systems (Article 6 and Annex III), AI systems subject to transparency obligations (Article 50), and low-risk or minimal-risk systems. Obligations escalate with the assessed risk class.

(a) Prohibited Practices and High-Risk Systems

The classification of an agentic AI system within the risk-based framework turns on its intended use and therefore allows only generalising observations. An agentic system deployed for a practice prohibited under Article 5 remains prohibited; the agentic nature of the system is irrelevant. Conversely, an agentic system deployed for a use case listed in Annex III (for example, in recruitment, credit scoring, or the administration of essential public or private services) qualifies as a high-risk AI system and attracts the full spectrum of provider and deployer obligations under Chapter III AI Act.

Of particular interest is the interface between the GPAI regime and the risk-based approach established by Article 25(1)(c) AI Act. That provision deems distributors, importers, deployers, or other third parties to be the provider of a high-risk AI system where they modify the intended purpose of an AI system — including a general-purpose AI system that has not been classified as high-risk and has already been placed on the market — such that the AI system becomes high-risk. The implication is significant: GPAI systems (and therefore, by extension, most agentic systems) do not automatically fall within the high-risk category. The economic actor who changes the system's intended purpose into a high-risk use may find themselves re-characterised as the provider, together with the full weight of provider obligations.

Where the agentic system is high-risk from the outset, the human oversight obligation in Article 14 AI Act becomes central. Article 14(1) requires providers to design and develop high-risk AI systems in such a way that they can be effectively overseen by natural persons during the period of their use. The oversight measures must be "commensurate with the risks, level of autonomy and context of use" (Article 14(3)). This is particularly pointed in the agentic context: the value of an agentic system is a direct function of its autonomy, yet Article 14(3) insists that the intensity of human oversight tracks precisely that same autonomy. In other words, the more capable (and therefore commercially attractive) the agentic high-risk system, the more demanding the oversight apparatus that must surround it, a feature the provider cannot design away.

Practically, Article 14(4) requires that the humans to whom oversight is assigned be enabled, among other things:

to properly understand the capacities and limitations of the system;
to remain aware of the tendency to over-rely on the system's output (automation bias);
to correctly interpret the system's output;
to decide not to use the system, disregard its output, or override it in any given situation; and
to intervene in the operation of the system or interrupt it through a "stop" button or a comparable procedure.

Christian Förster captures the underlying intuition aptly: a high-risk AI system must remain "responsive" such that a human is, in the final analysis, able "to pull the plug".⁴ For agentic systems, designing such an interruption mechanism is not a mere formality. When the system is mid-workflow, having already called several APIs, written to several data stores, and taken several partially irreversible actions, the question of what "stopping" means operationally becomes a product-design problem with legal consequences.
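One way to make "stopping" concrete is to check a stop signal between steps and, when it fires, to reverse whatever can be reversed and hand the remainder to a human. The sketch below illustrates that idea only; it is not a statement of what Article 14(4) requires. The `WorkflowStep` and `run_workflow` names and the three example steps are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WorkflowStep:
    name: str
    do: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None marks an irreversible step


def run_workflow(steps: list, stop: threading.Event) -> dict:
    completed = []
    for step in steps:
        if stop.is_set():
            # Stop requested: reverse what can be reversed, newest first,
            # and report what a human must now deal with.
            rolled_back, irreversible = [], []
            for done in reversed(completed):
                if done.undo is None:
                    irreversible.append(done.name)
                else:
                    done.undo()
                    rolled_back.append(done.name)
            return {
                "status": "stopped",
                "rolled_back": rolled_back,
                "needs_human_review": irreversible,
            }
        step.do()
        completed.append(step)
    return {"status": "completed", "rolled_back": [], "needs_human_review": []}


stop = threading.Event()
seats = []

steps = [
    WorkflowStep("hold_seat", lambda: seats.append("12A"), undo=seats.clear),
    # Simulates the human pressing "stop" while the second step is running.
    WorkflowStep("send_confirmation_email", stop.set),
    WorkflowStep("charge_card", lambda: seats.append("charged")),
]
report = run_workflow(steps, stop)
```

In this run the seat hold is released, the card is never charged, and the already-sent email is reported for human review, because no `undo` exists for it. Deciding which steps are allowed to lack an `undo` is exactly the product-design question with legal consequences.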

(b) Does the Risk-Based Approach Adequately Capture the Risk Profile of Agentic AI?

The answer is largely negative. By anchoring classification in the intended purpose of the system, the risk-based approach struggles to capture what is genuinely new about agentic AI: that risk is driven not only by the intended use case, but by the set of tools the system may invoke and the degree of autonomy it enjoys. Those two variables — central to the risk profile — are only indirectly addressed in the risk-based framework, chiefly through the human-oversight obligation of Article 14 (which applies only to high-risk systems) and through the systemic-risk regime of Articles 51 et seq. (which applies only to the underlying GPAI model).

A concrete illustration makes the point. Imagine an agentic system marketed as a "travel booking assistant". Its intended purpose is plainly not among the uses listed in Annex III; it is therefore, under the risk-based approach, a low-risk system subject only to the transparency duties under Article 50. If that system is nonetheless given general computer-use capability — the ability to drive a browser, open files, enter credentials — its actual risk surface extends far beyond travel bookings. A malfunction, prompt-injection attack, or simple error could result in damage well outside the booking context: unauthorised transactions, data exfiltration, or inadvertent disclosure of confidential files. The AI Act's risk-based classification does not grade that risk up.

For a classical AI system whose output is a recommendation or a decision, the risk-based approach is coherent. For an agentic system whose output is action across a variety of environments, the approach systematically under-counts risk.

§ V Operational Consequences for Providers and Deployers

Several practical consequences follow from the analysis above. They are relevant to providers and deployers of agentic AI systems, and to their legal advisers.

Architectural decisions are regulatory decisions

Whether a given agentic system sits inside or outside the high-risk category, whether the underlying model qualifies as a GPAI model (or a GPAI model with systemic risk), and whether the system as a whole qualifies as a GPAI system, are determined by design choices made long before the product reaches the market. Those choices — the selection of the foundation model, the scope of fine-tuning, the set of tools made available, the degree of autonomy permitted — should be documented contemporaneously and in a manner that will withstand regulatory scrutiny. It is rarely possible to reverse-engineer a compliance narrative after the fact.

Tool scoping is risk scoping

Restricting the agent's tools to what is strictly necessary for its intended purpose is the single most effective means of keeping its risk profile in line with its business purpose. Broad tool access — in particular, general computer-use capability — is a double-edged sword: it unlocks functionality, but it also pulls the system closer to GPAI classification and, potentially, to systemic-risk designation. The principle of minimum necessary tooling should guide system design.
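Minimum necessary tooling can be enforced mechanically with a deny-by-default allowlist. The following is a minimal sketch, assuming a hypothetical travel agent whose only tool is a `book_flight` call with three predefined fields; `ToolScope` and `ScopedToolbox` are illustrative names, not part of any real framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolScope:
    name: str
    allowed_fields: frozenset  # the only argument names the agent may set


class ScopedToolbox:
    def __init__(self, scopes):
        self._scopes = {scope.name: scope for scope in scopes}

    def permits(self, tool: str, args: dict) -> bool:
        scope = self._scopes.get(tool)
        if scope is None:
            return False  # deny by default: unlisted tools do not exist
        return set(args) <= scope.allowed_fields


# Hypothetical travel agent limited to predefined fields of one booking API.
toolbox = ScopedToolbox([
    ToolScope("book_flight", frozenset({"origin", "destination", "date"})),
])
```

A call such as `toolbox.permits("run_shell", {"cmd": "ls"})` returns `False` simply because the tool was never registered, which is the property that makes the "this is only a travel agent" characterisation defensible.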

Autonomy budgets and human-in-the-loop design are defensible choices

The commercial temptation is to reduce human intervention. The regulatory reality is that autonomy amplifies the obligations around oversight, documentation, and (for high-risk systems) conformity assessment. A deliberate decision to operate at a human-in-the-loop or human-on-the-loop level, at least for defined categories of action (financial transactions above a threshold, communications sent externally, changes to persistent data), is in many cases the prudent design choice and should be memorialised in the system's technical documentation.
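An autonomy budget of this kind can be expressed as a small policy object that decides, per action, whether a human must approve first. The sketch below is one possible shape, with hypothetical names (`Action`, `AutonomyBudget`), hypothetical category labels, and an arbitrary payment limit chosen purely for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Action:
    category: str                  # e.g. "payment", "external_message", "read"
    amount: Decimal = Decimal("0")


@dataclass(frozen=True)
class AutonomyBudget:
    payment_limit: Decimal
    # Categories that always need prior human approval in this sketch.
    always_gated: frozenset = frozenset({"external_message", "data_write"})

    def needs_approval(self, action: Action) -> bool:
        if action.category in self.always_gated:
            return True
        if action.category == "payment":
            return action.amount > self.payment_limit
        return False  # everything else runs autonomously


budget = AutonomyBudget(payment_limit=Decimal("100"))
```

Because the policy is plain data, the same object can be quoted in the technical documentation, which helps when the design choice later has to be evidenced.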

Article 25 is an exposure that deployers frequently overlook

An enterprise that deploys a third-party agentic system and modifies its intended purpose — for instance by pointing it at a hiring workflow — can, under Article 25(1)(c), be re-characterised as the provider of a high-risk AI system. The original provider's documentation, conformity assessment, and registration do not transfer to the new purpose. In-house governance processes should therefore flag and assess any proposed redeployment of an agentic system before it goes live.
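An in-house governance process could start from a simple triage check like the one below. It is a sketch under loud assumptions: the area labels are internal shorthand for the eight areas of Annex III, the `triage_redeployment` function is hypothetical, and the result is a routing flag for legal review, not a legal classification of the system.

```python
# Internal shorthand for the eight Annex III areas (labels are assumptions).
ANNEX_III_AREAS = frozenset({
    "biometrics",
    "critical_infrastructure",
    "education",
    "employment",
    "essential_services",
    "law_enforcement",
    "migration_border",
    "justice_democracy",
})


def triage_redeployment(current_area: str, proposed_area: str) -> str:
    if proposed_area == current_area:
        return "no_change"
    if proposed_area in ANNEX_III_AREAS:
        # The new purpose may make the deployer a provider of a high-risk system.
        return "escalate: possible Article 25(1)(c) provider status"
    return "assess"  # any other change of purpose still gets reviewed


result = triage_redeployment("travel_booking", "employment")
```

Pointing a travel-booking agent at a hiring workflow lands in the "escalate" branch, mirroring the example in the text; every other change of purpose is still routed to assessment rather than waved through.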

Contracts along the value chain must account for the GPAI regime

Providers of GPAI models owe transparency and documentation duties to downstream providers under Articles 53 and 55. Those duties translate into contractual undertakings that should be carefully negotiated: breadth of model-card disclosure, access rights to training-data summaries, cooperation in conformity assessments, and indemnification for regulatory non-compliance. Deployers building agentic systems on third-party foundation models should insist on contractual commitments commensurate with the information they need to meet their own obligations.

§ VI Beyond the AI Act: Civil Law and Legal Personality

Agentic AI will unsettle a number of legal questions that sit outside the AI Act. Three deserve brief mention.

Attribution of actions

Where an agentic system executes a contract on behalf of a user — books a flight, places an order, signs up for a service — the question arises under general private law whether the resulting declaration of intent is attributable to the user. Continental European doctrines of declaration and agency were built around natural persons (and, by analogy, their human agents). Their application to an autonomously acting software agent is unsettled, and the courts have only begun to grapple with it.

Liability for damage

The European legislator has already signalled that the existing framework is insufficient. The AI Liability Directive proposed in 2022 and the revised Product Liability Directive (Directive (EU) 2024/2853) were both designed to ease the claimant's evidentiary burden where harm is caused by an AI system, including by creating presumptions around the causal link between fault and output; the Commission's 2025 work programme has, however, listed the AI Liability Directive proposal for withdrawal. Agentic systems, whose output is not a representation but an action, will test these regimes in new ways: proving that a specific action was "caused" by a fault in a specific component of a multi-tool agent will be evidentially demanding, and the allocation of liability between the model provider, the system provider, and the deployer will frequently be contested.

Legal personality

Every time an AI system becomes more autonomous, the debate about an "electronic person" is reopened. In 2024 Air Canada attempted to avoid liability for an incorrect statement made by its website chatbot, arguing that the chatbot was a separate legal entity distinct from Air Canada. The British Columbia Civil Resolution Tribunal rejected that argument, as it was bound to.⁵ But if providers are prepared to run that argument for a simple Q&A chatbot, the pressure to run it for an agentic system that autonomously plans and executes multi-step actions will only increase. The correct answer, in our view, remains that acts of an agentic system are acts of the legal person who deploys it; but the argument will continue to be made, and legislatures may eventually have to address it head-on.

§ VII Conclusion

Agentic AI is the first substantive technological development since the adoption of the AI Act, and it is already exposing the regulation's structural seams. The GPAI regime will capture most agentic systems — and the most capable ones will, in time, be designated as GPAI models with systemic risk — but its terminology leaves room for architectural choices that narrow the regulatory perimeter. The risk-based approach, built around the intended purpose of the AI system, is conceptually ill-suited to a class of systems whose risk profile is determined as much by the tools they can use and the autonomy they enjoy as by the business case they serve.

For providers and deployers, this means that regulatory design must begin in the architecture. Tool scoping, autonomy budgets, oversight interfaces, and documentation are not afterthoughts to be bolted on before launch; they are compliance levers that determine the classification of the system and, with it, the obligations that attach. For regulators, the coming years will test whether the current framework suffices or whether targeted adjustments — most obviously, a greater role for autonomy and tool access as stand-alone risk factors within the risk-based approach — are required.

Either way, the questions raised by agentic AI are no longer theoretical. The systems are shipping. The obligations attach on the day they are placed on the market or put into service.

· · ·

References and Sources

1. Förster, Die KI-VO in der Praxis, § 1 mn. 12 et seq. ("Autonome Arbeitsweise").

2. On the two-track architecture of the regulation, see Förster, op. cit., § 1 mn. 61 et seq.

3. For a concise treatment of the systemic-risk classification, see Förster, op. cit., § 1 mn. 65 et seq.

4. Förster, op. cit., § 2 mn. 94.

5. Moffatt v. Air Canada, 2024 BCCRT 149.

This article is provided for general informational and educational purposes only and does not constitute legal advice. It reflects the legal framework as of the date of its preparation. Companies developing or deploying agentic AI systems should seek tailored legal advice in relation to their specific circumstances. © 2026 Kanzlei Theo Funk, Bamberg. All rights reserved.
