AI Agents in Action: Foundations for Evaluation and Governance 2025


2.4 Governance considerations for AI agents: a progressive approach

Progressive governance approaches scale oversight and safeguards in proportion to the autonomy, authority and complexity of the agent. Evaluation and risk assessment provide critical insights into an agent’s capabilities, performance, reliability, security, safety and alignment. Governance, however, determines whether those insights translate into effective oversight and responsible adoption. “Governance” refers to the structured application of technical safeguards and operational, ethical and organizational processes intended to ensure agents remain within acceptable risk boundaries over time. As agents become more capable and integrated into core workflows, governance must evolve from basic precautionary measures to dynamic, multi-layered systems of control and accountability.

Governance levels are informed by risk assessment outcomes, ensuring that controls scale with demonstrated autonomy, authority and contextual complexity. A progressive set of governance levels can be distinguished, ranging from baseline safeguards to enhanced controls and systemic risk management. These levels correspond to the agent’s classification profile, which is linked to its function, predictability, autonomy, authority and operational context. Oversight therefore intensifies as agents move from narrow, low-risk applications to complex, high-impact environments.

Across these levels, governance mechanisms advance in both scope and sophistication. The focus shifts from operational safeguards to comprehensive risk management, with early levels emphasizing reactive measures and more advanced levels incorporating proactive monitoring, accountability frameworks and systemic risk assessments. This progression is evident across key areas such as monitoring, accountability, risk management, transparency, adaptability and scope.
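The mapping from a classification profile to a governance level can be sketched in code. The report does not prescribe numeric scores or thresholds, so the scoring scheme, level names and cut-offs below are illustrative assumptions only:

```python
from dataclasses import dataclass

# Hypothetical scoring scheme: the report does not define numeric
# thresholds, so the scales and cut-offs here are illustrative only.
@dataclass
class AgentProfile:
    autonomy: int    # 0 (fully supervised) .. 3 (fully autonomous)
    authority: int   # 0 (read-only access) .. 3 (irreversible real-world actions)
    complexity: int  # 0 (narrow task) .. 3 (open-ended operational context)

def governance_level(profile: AgentProfile) -> str:
    """Map a classification profile to a progressive governance level."""
    score = profile.autonomy + profile.authority + profile.complexity
    if score <= 2:
        return "baseline safeguards"       # basic logging, static checklists
    if score <= 5:
        return "enhanced controls"         # real-time monitoring, accountability
    return "systemic risk management"      # predictive modelling, ecosystem scope

# A narrow, low-risk assistant maps to the lowest governance level.
print(governance_level(AgentProfile(autonomy=1, authority=0, complexity=1)))
# → baseline safeguards
```

The point of the sketch is the monotonic relationship: oversight intensifies as demonstrated autonomy, authority and contextual complexity grow, rather than being assigned once at deployment.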
Monitoring evolves from basic logging to real-time, AI-assisted oversight, incorporating the automated analysis of logs to detect anomalies and deviations in system behaviour. In parallel, risk management advances from static checklists to dynamic, predictive modelling, while the scope of governance expands from narrow, task-specific oversight to consideration of broader ecosystem impacts.

Operational environments are dynamic, and effective governance often requires recalibrating autonomy and authority in real time. The following example illustrates this through a personal assistant agent, whose level of autonomy and authority is dynamically adjusted to ensure ongoing compliance.

Risk assessment should be treated as a continuous, iterative process rather than a single checkpoint. Ongoing monitoring, regression testing, periodic reassessment and incident reviews are essential to maintaining alignment as agentic systems evolve. The outputs of this process should include a risk register; a control plan with clear ownership and verification and validation steps; operating limits and monitoring requirements; and a deployment status. These outputs feed directly into progressive governance, ensuring oversight scales in line with an agent’s demonstrated risk profile and operating context.
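Real-time recalibration of this kind can be sketched as a simple feedback rule: when monitored anomaly rates exceed operating limits, the agent’s permitted authority is stepped down or revoked. The function name, authority scale and thresholds below are assumptions for illustration, not taken from the report:

```python
# Illustrative sketch of dynamic recalibration. Authority levels are
# assumed: 0 = human approval required for every action, higher values
# grant progressively broader permissions. Thresholds are hypothetical.

def recalibrate_authority(current_authority: int, anomaly_rate: float) -> int:
    """Tighten an agent's authority when monitoring detects deviations."""
    if anomaly_rate > 0.10:      # sustained deviations: revoke autonomy entirely
        return 0
    if anomaly_rate > 0.02:      # elevated deviations: step down one level
        return max(0, current_authority - 1)
    return current_authority     # behaviour within operating limits: no change

# A personal assistant agent at authority level 2 is stepped down to
# level 1 after its observed anomaly rate rises to 5%.
print(recalibrate_authority(2, 0.05))  # → 1
```

A production system would additionally record each recalibration in the risk register and trigger the incident-review process described above, so that authority changes remain auditable.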