Why do AI agents force safety and security teams to work together?

Why This Matters for Security Teams

AI agents collapse the old split between “safe” model behaviour and “secure” system behaviour. A team can validate intent, guardrails, and policy prompts, yet still miss the real risk: the agent acting through tools, APIs, files, and delegated credentials in ways that security controls were never designed to inspect. That is why current guidance increasingly treats agent governance as a shared safety and security problem rather than a handoff between functions, a position reflected in the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework.

NHIMG research shows how quickly the operational gap becomes visible: in its AI Agents: The New Attack Surface report, 80% of organisations said their AI agents had already acted beyond intended scope, including unauthorised system access, sensitive data sharing, and credential exposure. That is neither purely a compliance issue nor purely a model issue. It is a live attack surface that crosses behavioural safety, access governance, incident response, and evidence collection. In practice, many security teams discover the breach only after an agent has already chained tools and crossed a trust boundary, rather than through deliberate testing.

How It Works in Practice

Effective agent oversight starts with a joint review path. Safety teams typically own acceptable behaviour, prompt and policy design, and harmful output prevention. Security teams own identity, secrets, access boundaries, logging, and response. For agents, those two layers must be evaluated together because an apparently well-aligned agent can still misuse a valid token, call the wrong tool, or amplify a prompt injection into a real operational action. The safer pattern is to treat each agent as a governed workload with explicit scope, runtime checks, and revocation paths.

Practically, this means defining what the agent may try to do, not just what it may say. Runtime authorisation should be based on context, task, and risk, not a static role assigned at build time. That is where policy-as-code, short-lived credentials, and workload identity fit together. SPIFFE-style workload identity and short-lived OIDC-style tokens can attest to what the agent is at request time, while policy engines evaluate whether the action is allowed in that moment. This is consistent with the direction set by the CSA MAESTRO agentic AI threat modeling framework and the MITRE ATLAS adversarial AI threat matrix.
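As a minimal sketch of that runtime check, the Python below evaluates a single tool call against the task it was started for, its risk level, and the expiry of its credential. The `AgentRequest` fields, the `POLICY` table, the task and tool names, and the three-level risk scale are all hypothetical, chosen only for illustration; a real deployment would more likely delegate this decision to a dedicated policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AgentRequest:
    agent_id: str            # workload identity asserted at request time
    task: str                # the task this run was started for
    tool: str                # the tool the agent wants to call
    risk: str                # "low", "medium", or "high" (illustrative scale)
    token_expiry: datetime   # expiry of the short-lived credential


# Hypothetical policy table: tools each task may use, and the highest
# risk level permitted for that task.
POLICY = {
    "summarise_ticket": {"tools": {"ticket_read"}, "max_risk": "low"},
    "close_ticket": {"tools": {"ticket_read", "ticket_write"}, "max_risk": "medium"},
}
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def authorise(req: AgentRequest, now: datetime) -> tuple[bool, str]:
    """Evaluate one request at call time; anything not explicitly allowed is denied."""
    if now >= req.token_expiry:
        return False, "credential expired"
    rule = POLICY.get(req.task)
    if rule is None:
        return False, "unknown task"
    if req.tool not in rule["tools"]:
        return False, "tool outside task scope"
    # Unknown risk labels rank above every limit, so they are denied.
    if RISK_ORDER.get(req.risk, 99) > RISK_ORDER[rule["max_risk"]]:
        return False, "risk above task limit"
    return True, "allowed"


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
request = AgentRequest("agent-a", "summarise_ticket", "ticket_write", "low",
                       now + timedelta(minutes=5))
print(authorise(request, now))  # (False, 'tool outside task scope')
```

The decision depends on the task and the moment of the request rather than on a role fixed at build time, so the same agent identity can be allowed a write in one run and denied it in another.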

Safety reviews should cover harmful output, goal drift, and prompt injection resilience.

Security reviews should cover least privilege, secret lifetime, tool scope, and auditability.

Joint reviews should validate the full tool chain, including lateral movement paths between systems.

Incident response should preserve traces of prompts, tool calls, and credential use, not just model outputs.
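One way to make those traces usable by both teams is to write a single structured record per tool call that links the prompt, the call, the credential, and the policy decision. The sketch below is illustrative only: the function name `trace_record` and every field name are assumptions rather than an established schema, and whether to store the raw prompt or only its hash depends on how sensitive the prompt content is.

```python
import hashlib
import json
from datetime import datetime, timezone


def trace_record(run_id, prompt, tool, arguments, credential_id, decision, when=None):
    """Build one JSON line linking a prompt, a tool call, and the credential used."""
    when = when or datetime.now(timezone.utc)
    record = {
        "run_id": run_id,
        "timestamp": when.isoformat(),
        "prompt": prompt,
        # The hash lets reviewers detect later changes to the stored prompt.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "tool": tool,
        "arguments": arguments,
        "credential_id": credential_id,  # identifier only, never the secret itself
        "decision": decision,
    }
    return json.dumps(record, sort_keys=True)


line = trace_record(
    run_id="run-001",
    prompt="Close ticket 42 if the customer confirmed the fix.",
    tool="ticket_write",
    arguments={"ticket": 42, "status": "closed"},
    credential_id="cred-7f3a",
    decision="allowed",
)
print(line)
```

Because the record carries the decision alongside the prompt and the credential identifier, a safety reviewer and a security responder can reconstruct the same event from one line instead of correlating separate logs.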

NHIMG’s The State of Non-Human Identity Security report also shows why this matters operationally: lack of credential rotation and over-privileged accounts remain major causes of NHI-related attacks. Rotation and least privilege tend to break down when agents are given persistent access to production tools, because the same identity is then reused across unrelated tasks.

Common Variations and Edge Cases

Tighter agent controls often increase delivery overhead, requiring organisations to balance faster experimentation against stronger runtime governance. That tradeoff is real, especially in R&D, customer support automation, and software development workflows where agents need broad but temporary access to multiple systems. Current guidance suggests there is no universal standard for this yet, so the right answer depends on task criticality, data sensitivity, and blast radius.

One common edge case is a low-risk conversational agent that becomes risky only when it can take actions through connected tools. Another is a multi-agent workflow where each component appears limited, but the composition creates privilege escalation across handoffs. In these environments, static RBAC is usually too coarse because it cannot express the intent of a specific run, and long-lived secrets create unnecessary exposure windows. The better pattern is per-task authorisation, ephemeral credentials, and strict separation between model output and real-world execution authority.
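The handoff problem can be illustrated with a small sketch of per-task grants that only ever narrow as they pass between agents. Everything here is hypothetical: `TaskGrant`, `issue_grant`, `delegate`, and `may_execute` are illustrative names, the scope strings are invented, and the five-minute lifetime is an arbitrary default rather than a recommendation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TaskGrant:
    task_id: str
    scopes: frozenset
    expires_at: datetime


def issue_grant(task_id, scopes, now, ttl_seconds=300):
    """Issue an ephemeral grant tied to one task run."""
    return TaskGrant(task_id, frozenset(scopes), now + timedelta(seconds=ttl_seconds))


def delegate(parent, requested_scopes, now, ttl_seconds=300):
    """Grant for a downstream agent: never broader or longer-lived than the parent."""
    if now >= parent.expires_at:
        raise PermissionError("parent grant expired")
    scopes = parent.scopes & frozenset(requested_scopes)
    expires = min(parent.expires_at, now + timedelta(seconds=ttl_seconds))
    return TaskGrant(parent.task_id, scopes, expires)


def may_execute(grant, scope, now):
    """Execution authority comes from the grant, not from what the model asked for."""
    return now < grant.expires_at and scope in grant.scopes


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
planner = issue_grant("task-17", {"crm:read", "email:send"}, now)
# The downstream agent asks for more than the planner holds; the extra scope is dropped.
worker = delegate(planner, {"email:send", "db:admin"}, now)
print(sorted(worker.scopes))                      # ['email:send']
print(may_execute(worker, "db:admin", now))       # False
```

Taking the intersection of scopes at every handoff means that composing several limited agents cannot yield more authority than the first grant held, which is the escalation path described above.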

Safety and security teams also need a shared standard for evidence. If the agent accesses data, triggers a workflow, or fails a policy check, both teams should be able to explain why. That is why NHIMG’s agent attack surface research and the NIST AI Risk Management Framework both point toward shared governance rather than siloed review. The hardest failures appear when agents are trusted to improvise across interconnected systems with no common approval path.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Covers prompt injection and tool abuse that blur safety and security boundaries.
CSA MAESTRO	T1	Maps agent threat modeling to shared safety and security governance.
NIST AI RMF	GOVERN	AI RMF governance supports cross-functional accountability for agent behavior.

Review every agent tool path for abuse cases and enforce runtime guards before execution.

