AI Agents in Action: Foundations for Evaluation and Governance (World Economic Forum, 2025)


Endnotes

1. Capgemini Research Institute. (2024). Harnessing the value of generative AI. https://www.capgemini.com/wp-content/uploads/2024/05/Final-Web-Version-Report-Gen-AI-in-Organization-Refresh.pdf.
2. Organisation for Economic Co-operation and Development (OECD). (2024). Recommendation of the Council on Artificial Intelligence. https://legalinstruments.oecd.org/en/instruments/oecd-legal-0449.
3. National Institute of Standards and Technology (NIST). (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936225.
4. International Organization for Standardization (ISO). (2023). ISO/IEC 23894:2023: Information technology — Artificial intelligence — Guidance on risk management. https://www.iso.org/standard/77304.html.
5. Claude Docs. (n.d.). Features overview. https://docs.claude.com/en/docs/build-with-claude/overview.
6. Anthropic. (2024). Introducing the Model Context Protocol. https://www.anthropic.com/news/model-context-protocol.
7. Surapaneni, R., M. Jha, M. Vakoc and T. Segal. (2025). Announcing the Agent2Agent Protocol (A2A). Google for Developers. https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/.
8. Mitchell, M., S. Wu, A. Zaldivar, P. Barnes, et al. (2019). Model Cards for Model Reporting. FAT* '19: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 220-229. https://dl.acm.org/doi/10.1145/3287560.3287596.
9. Parikh, S. and R. Surapaneni. (2025). Powering AI commerce with the new Agent Payments Protocol (AP2). Google Cloud. https://cloud.google.com/blog/products/ai-machine-learning/announcing-agents-to-payments-ap2-protocol.
10. Cloudflare. (n.d.). Zero Trust security | What is a Zero Trust network? https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-zero-trust/.
11. Hasan, M. M., L. Hao, E. Fallahzadeh, B. Adams, et al. (2025). Model Context Protocol (MCP) at First Glance: Studying the Security and Maintainability of MCP Servers. https://arxiv.org/abs/2506.13538v1.
12. Lynch, B. and R. Harang. (2025). From Prompts to Pwns: Exploiting and Securing AI Agents. https://i.blackhat.com/BH-USA-25/Presentations/US-25-Lynch-From-Prompts-to-Pwns.pdf.
13. Adapted from: International Organization for Standardization (ISO). (2023). ISO/IEC 42001:2023: Information technology — Artificial intelligence — Management system. https://www.iso.org/standard/81230.html; National Institute of Standards and Technology (NIST). (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936225.
14. Ibid.
15. Capgemini. (n.d.). Business, meet agentic AI. https://www.capgemini.com/wp-content/uploads/2025/05/Confidence-in-autonomous-and-agentic-systems_19May.pdf.
16. SAE International. (2021). J3016_202104 - Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-Road Motor Vehicles. https://www.sae.org/standards/j3016_202104-taxonomy-definitions-terms-related-driving-automation-systems-road-motor-vehicles.
17. Russell, S. J. and P. Norvig. (2021). Artificial Intelligence: A Modern Approach. Pearson.
18. Hendrycks, D., C. Burns, S. Basart, A. Zou, et al. (2021). Measuring Massive Multitask Language Understanding. https://arxiv.org/abs/2009.03300.
19. Srivastava, A., A. Rastogi, A. Rao, A. A. Shoeb, et al. (2022). Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Transactions on Machine Learning Research (TMLR). https://arxiv.org/abs/2206.04615.
20. Liang, P., R. Bommasani, T. Lee, D. Tsipras, et al. (2022). Holistic Evaluation of Language Models. Transactions on Machine Learning Research (TMLR). https://arxiv.org/abs/2211.09110.
21. Liu, X., H. Yu, H. Zhang, Y. Xu, et al. (2024). AgentBench: Evaluating LLMs as Agents. International Conference on Learning Representations (ICLR). https://arxiv.org/abs/2308.03688.
22. Jimenez, C. E., J. Yang, A. Wettig, S. Yao, et al. (2023). SWE-bench: Can Language Models Resolve Real-World GitHub Issues? https://arxiv.org/abs/2310.06770.
23. Rein, D., J. Becker, A. Deng, S. Nix, et al. (2025). HCAST: Human-Calibrated Autonomy Software Tasks. https://arxiv.org/abs/2503.17354.
24. “Risk” refers to the composite measure of an event’s probability (or likelihood) of occurring and the magnitude or degree of the consequences of the corresponding event; National Institute of Standards and Technology (NIST). (2024). Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile. https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf.
25. Adapted from: International Organization for Standardization (ISO). (2023). ISO/IEC 42001:2023: Information technology — Artificial intelligence — Management system. https://www.iso.org/standard/81230.html; National Institute of Standards and Technology (NIST). (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=936225.