AI Agents in Action: Foundations for Evaluation and Governance 2025
FIGURE 5: Foundations for AI agent evaluation and governance (panels: Classification dimensions; Evaluation criteria; Progressive governance practices; Risk assessment life cycle)
Classification defines an agent’s characteristics
and operating context to guide evaluation, risk
assessment and governance.
To support evaluation and risk assessment, agents
can be described across a set of dimensions that
capture both their internal characteristics and the
external contexts in which they operate. These
dimensions provide a structured approach to
analyse and compare agents across applications,
ensuring clarity about their design choices and real-
world effects.
In combination, the proposed dimensions define
how an agent operates, what actions it is permitted
to take and the complexity of the context it is
deployed in. The agent’s overall impact can be seen
as a profile that emerges from the interaction of
these dimensions, reflecting the benefits or risks of
its application in practice.14
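One way to make such a profile concrete is to record each agent's dimensions in a single structure, so that two agents can be compared dimension by dimension. The sketch below is illustrative only. `AgentProfile`, `Role`, `Predictability` and `differing_dimensions` are hypothetical names. The fields cover the dimensions described in this section plus the deployment context, and the role and predictability values assigned to the two example agents are assumptions, not classifications made in the report.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Role(Enum):
    SPECIALIZED = "specialized"
    GENERALIZED = "generalized"


class Predictability(Enum):
    DETERMINISTIC = "deterministic"
    NON_DETERMINISTIC = "non-deterministic"


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical record combining an agent's classification dimensions."""
    name: str
    function: str                   # what the agent is designed to do
    role: Role                      # breadth of tasks it can perform
    predictability: Predictability  # stability of its behaviour
    deployment_context: str         # environment it operates in


def differing_dimensions(a: AgentProfile, b: AgentProfile) -> list[str]:
    """Return the classification dimensions (not the name) on which two profiles differ."""
    return [
        f.name
        for f in fields(AgentProfile)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Illustrative values only: role and predictability are assumed here.
copilot = AgentProfile(
    name="coding co-pilot",
    function="generate software snippets",
    role=Role.SPECIALIZED,
    predictability=Predictability.NON_DETERMINISTIC,
    deployment_context="software development workflow",
)
triage = AgentProfile(
    name="triage assistant",
    function="prioritize emergency department patients",
    role=Role.SPECIALIZED,
    predictability=Predictability.NON_DETERMINISTIC,
    deployment_context="hospital emergency department",
)

print(differing_dimensions(copilot, triage))  # ['function', 'deployment_context']
```

Keeping the profile immutable (`frozen=True`) means a classification, once recorded for an assessment, cannot be altered silently afterwards.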
Function refers to the specific role, purpose or
set of tasks the agent is designed to perform.
It describes what the agent does in practice,
independent of the environment it is deployed
in. For example, a coding co-pilot that generates
software snippets and a triage assistant that
prioritizes patients in an emergency department
have distinct functions, even though both operate in
digital workflows.
Role reflects the breadth of tasks an agent
can perform. Specialized agents are narrowly
focused and optimized for specific domains, while
generalized agents can adapt across domains to
address a broader range of tasks or challenges.
For instance, a tax-filing agent designed only
to prepare returns is specialized, whereas a
personal digital assistant that manages scheduling,
email drafting and online search operates as a
generalist agent.
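The specialized/generalized distinction can be approximated by counting how many distinct domains an agent's tasks span. This is a minimal sketch under loudly stated assumptions. The function `classify_role`, its single-domain threshold (`max_specialized_domains=1`) and the domain labels attached to each task are all hypothetical. The report defines role qualitatively and gives no numeric cut-off.

```python
from collections.abc import Mapping


def classify_role(task_to_domain: Mapping[str, str], max_specialized_domains: int = 1) -> str:
    """Label an agent 'specialized' or 'generalized' by how many domains its tasks span.

    The default single-domain threshold is a hypothetical cut-off for illustration.
    """
    if not task_to_domain:
        raise ValueError("agent must declare at least one task")
    domains = set(task_to_domain.values())
    return "specialized" if len(domains) <= max_specialized_domains else "generalized"


# Task and domain labels below are illustrative assumptions.
tax_agent = {"prepare tax returns": "tax"}
assistant = {
    "manage scheduling": "calendar",
    "draft emails": "communication",
    "search online": "information retrieval",
}

print(classify_role(tax_agent))  # specialized
print(classify_role(assistant))  # generalized
```

In practice, the hard part is deciding what counts as a separate domain. A real classification would need that mapping agreed in advance rather than inferred from task names.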
Predictability describes the stability and
repeatability of agent behaviour. Deterministic
agents produce consistent, identical outputs
when given the same inputs, which makes their
performance highly predictable and easier to
validate. Non-deterministic agents, by contrast,
may evolve, learn or generate variable outputs
over time.15 This variability can support creativity,
adaptation and exploration, but it reduces the
reliability of producing identical results under
identical conditions. For adopters, predictability
determines how much confidence they can place
in an agent’s outputs, how reproducible those
outputs are, and what level of oversight is required
to manage variability in practice.
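Reproducibility can be checked empirically by calling an agent repeatedly with the same input and measuring how often it returns its most common output. The sketch below assumes that metric (the share of runs matching the modal output). It is one possible measure, not one prescribed by the report. `reproducibility_rate`, `deterministic_agent` and `sampling_agent` are hypothetical stand-ins. The sampling agent models only output variability from random sampling, not agents that evolve or learn over time.

```python
import random
from collections import Counter
from typing import Callable


def reproducibility_rate(agent: Callable[[str], str], prompt: str, runs: int = 20) -> float:
    """Share of runs that return the single most common output for one fixed input.

    1.0 means every run produced the same output.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    counts = Counter(agent(prompt) for _ in range(runs))
    return counts.most_common(1)[0][1] / runs


def deterministic_agent(prompt: str) -> str:
    # Same input always maps to the same output.
    return prompt.upper()


_rng = random.Random(0)  # seeded only so the demo itself is repeatable


def sampling_agent(prompt: str) -> str:
    # Stand-in for a non-deterministic agent: appends a randomly sampled variant.
    return f"{prompt} [variant {_rng.choice('abc')}]"


print(reproducibility_rate(deterministic_agent, "summarize ticket 42"))  # 1.0
print(reproducibility_rate(sampling_agent, "summarize ticket 42"))
```

A rate well below 1.0 signals that the outputs from any single run should not be treated as reproducible, and that the oversight described above is needed to manage the variability.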
Autonomy captures the degree to which an agent
can define and pursue objectives. The spectrum
ranges from simple command-response systems to
2.1 Classification