AI Agents in Action: Foundations for Evaluation and Governance, 2025
2.4 Governance considerations for AI agents: a progressive approach
Progressive governance approaches scale
oversight and safeguards in proportion to the
autonomy, authority and complexity of the agent.
Evaluation and risk assessment provide critical
insights into an agent’s capabilities, performance,
reliability, security, safety and alignment. Governance,
however, determines whether those insights translate
into effective oversight and responsible adoption.
“Governance” refers to the structured application of
technical safeguards and operational, ethical and
organizational processes intended to ensure agents
remain within acceptable risk boundaries over time.
As agents become more capable and integrated into
core workflows, governance must evolve from basic
precautionary measures to dynamic, multi-layered
systems of control and accountability. Governance
levels are informed by risk assessment outcomes,
ensuring that controls scale with demonstrated
autonomy, authority and contextual complexity.
A progressive set of governance levels can be
distinguished, ranging from baseline safeguards to
enhanced controls and systemic risk management.
These levels correspond to the agent’s classification
profile, which is linked to its function, predictability,
autonomy, authority and operational context.
Oversight, therefore, intensifies as agents move
from narrow, low-risk applications to complex, high-
impact environments.

Across these levels, governance mechanisms
advance in both scope and sophistication. The
focus shifts from operational safeguards to
comprehensive risk management, with early
levels emphasizing reactive measures, while more
advanced levels incorporate proactive monitoring,
accountability frameworks and systemic
risk assessments.
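One way to make the mapping from an agent's classification profile to a governance level concrete is a simple scoring rule. The sketch below is purely illustrative: the factor names, scores and thresholds are assumptions for this example, not levels defined in the report.

```python
# Illustrative sketch: mapping an agent's classification profile to a
# governance level. The 1-3 scoring and the thresholds are assumptions
# chosen for illustration; they are not prescribed by the report.

def governance_level(autonomy: int, authority: int, complexity: int) -> str:
    """Each factor is scored 1 (low) to 3 (high)."""
    score = autonomy + authority + complexity
    if score <= 4:
        return "baseline safeguards"
    if score <= 7:
        return "enhanced controls"
    return "systemic risk management"

# A narrow, low-risk agent stays at the baseline level, while a highly
# autonomous, high-authority agent is escalated to systemic controls.
print(governance_level(1, 1, 2))  # baseline safeguards
print(governance_level(3, 3, 2))  # systemic risk management
```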
This progression is evident across key areas such
as monitoring, accountability, risk management,
transparency, adaptability and scope. Monitoring
evolves from basic logging to real-time, AI-
assisted oversight, incorporating the automated
analysis of logs to detect anomalies and
deviations in system behaviour. In parallel, risk
management advances from static checklists to
dynamic, predictive modelling, while the scope
of governance expands from narrow, task-
specific oversight to consideration of broader
ecosystem impacts.
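The shift from basic logging to automated log analysis can be sketched minimally as a deviation check against a historical baseline. Everything here (the metric, the z-score rule, the threshold) is an illustrative assumption, not a method prescribed in the report.

```python
# Minimal sketch of automated log analysis: flag a new observation that
# deviates strongly from the historical baseline (simple z-score rule).
# The metric, baseline window and threshold are illustrative assumptions.
from statistics import mean, stdev

def is_anomalous(history: list[float], value: float,
                 threshold: float = 3.0) -> bool:
    """Flag `value` if it lies more than `threshold` standard deviations
    from the mean of the historical baseline."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# e.g. per-minute counts of tool calls an agent has made so far
baseline = [4, 5, 4, 6, 5, 4, 5, 5, 4]
print(is_anomalous(baseline, 5))   # False
print(is_anomalous(baseline, 40))  # True
```

In practice such a check would run continuously over agent logs, feeding alerts into the accountability and escalation processes described above.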
Operational environments are dynamic, and
effective governance often requires recalibrating
autonomy and authority in real time. The
following example illustrates this through
a personal assistant agent, whose level of
autonomy and authority is dynamically adjusted
to ensure ongoing compliance.

Risk assessment should be treated as a continuous,
iterative process rather than a single checkpoint.
Ongoing monitoring, regression testing, periodic
reassessment and incident reviews are essential to
maintaining alignment as agentic systems evolve. The
outputs of this process should include a risk register; a control plan
with clear ownership and verification and validation steps; operating
limits and monitoring requirements; and a deployment status. These
outputs feed directly into progressive governance,
ensuring oversight scales in line with an agent’s
demonstrated risk profile and operating context.
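The risk-assessment outputs listed above can be represented as a simple structured record. The field names below are assumptions chosen to mirror the report's list of outputs; the schema itself is illustrative, not a standard.

```python
# Illustrative sketch: risk-assessment outputs as a structured record.
# Field names mirror the report's list (risk register, control plan with
# ownership and verification, operating limits, monitoring requirements,
# deployment status); the schema itself is an assumption.
from dataclasses import dataclass, field

@dataclass
class RiskRegisterEntry:
    risk: str               # description of the identified risk
    severity: str           # e.g. "low" / "medium" / "high"
    control: str            # mitigating control from the control plan
    owner: str              # clear ownership of the control
    verified: bool = False  # verification and validation status

@dataclass
class AssessmentOutputs:
    register: list[RiskRegisterEntry] = field(default_factory=list)
    operating_limits: dict[str, float] = field(default_factory=dict)
    monitoring_requirements: list[str] = field(default_factory=list)
    deployment_status: str = "pending review"

outputs = AssessmentOutputs(
    register=[RiskRegisterEntry(
        risk="agent exceeds spending authority",
        severity="high",
        control="hard cap enforced at the payment interface",
        owner="platform team",
    )],
    operating_limits={"max_spend_per_day_usd": 100.0},
    monitoring_requirements=["log all tool calls", "alert on limit breaches"],
)
print(outputs.deployment_status)  # pending review
```

Feeding a record like this into the governance levels above is what lets oversight scale with the agent's demonstrated risk profile rather than with a one-off judgment.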