FIGURE 5: Foundations for AI agent evaluation and governance (components: classification dimensions, evaluation criteria, risk assessment life cycle, progressive governance practices)

2.1 Classification

Classification defines an agent’s characteristics and operating context to guide evaluation, risk assessment and governance.

To support evaluation and risk assessment, agents can be described across a set of dimensions that capture both their internal characteristics and the external contexts in which they operate. These dimensions provide a structured approach to analyse and compare agents across applications, ensuring clarity about their design choices and real-world effects. In combination, the proposed dimensions define how an agent operates, what actions it is permitted to take and the complexity of the context it is deployed in. The agent’s overall impact can be seen as a profile that emerges from the interaction of these dimensions, reflecting the benefits or risks of its application in practice.¹⁴

Function refers to the specific role, purpose or set of tasks the agent is designed to perform. It describes what the agent does in practice, independent of the environment it is deployed in. For example, a coding co-pilot that generates software snippets and a triage assistant that prioritizes patients in an emergency department have distinct functions, even though both operate in digital workflows.

Role reflects the breadth of tasks an agent can perform. Specialized agents are narrowly focused and optimized for specific domains, while generalized agents can adapt across domains to address a broader range of tasks or challenges. For instance, a tax-filing agent designed only to prepare returns is specialized, whereas a personal digital assistant that manages scheduling, email drafting and online search operates as a generalist agent.

Predictability describes the stability and repeatability of agent behaviour. Deterministic agents produce consistent, identical outputs when given the same inputs, which makes their performance highly predictable and easier to validate. Non-deterministic agents, by contrast, may evolve, learn or generate variable outputs over time.¹⁵ This variability can support creativity, adaptation and exploration, but it reduces the reliability of producing identical results under identical conditions. For adopters, predictability determines how much confidence they can place in an agent’s outputs, how reproducible those outputs are, and what level of oversight is required to manage variability in practice.

Autonomy captures the degree to which an agent can define and pursue objectives. The spectrum ranges from simple command-response systems to

AI Agents in Action Foundations for Evaluation and Governance 2025