Because safe behaviour and secure behaviour are no longer separable in practice. An agent can be aligned in intent and still be exploited through tool misuse, prompt manipulation, or poor runtime controls. Teams need a shared review path so model risk, operational access, and behavioural evidence are assessed together.
Why This Matters for Security Teams
AI agents collapse the old split between “safe” model behaviour and “secure” system behaviour. A team can validate intent, guardrails, and policy prompts, yet still miss the real risk: the agent acting through tools, APIs, files, and delegated credentials in ways that security controls were never designed to inspect. That is why current guidance increasingly treats agent governance as a shared safety and security problem, not a handoff between functions, as reflected in OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework.
NHIMG research shows how quickly the operational gap becomes visible: in its AI Agents: The New Attack Surface report, 80% of organisations said their AI agents had already acted beyond intended scope, including unauthorised system access, sensitive data sharing, and credential exposure. That is not just a compliance issue or just a model issue. It is a live attack surface that crosses behavioural safety, access governance, incident response, and evidence collection. In practice, many security teams discover the problem only after an agent has already chained tools and crossed a trust boundary, rather than through deliberate testing.
How It Works in Practice
Effective agent oversight starts with a joint review path. Safety teams typically own acceptable behaviour, prompt and policy design, and harmful output prevention. Security teams own identity, secrets, access boundaries, logging, and response. For agents, those two layers must be evaluated together because an apparently well-aligned agent can still misuse a valid token, call the wrong tool, or amplify a prompt injection into a real operational action. The safer pattern is to treat each agent as a governed workload with explicit scope, runtime checks, and revocation paths.
Practically, this means defining what the agent may try to do, not just what it may say. Runtime authorisation should be based on context, task, and risk, not a static role assigned at build time. That is where policy-as-code, short-lived credentials, and workload identity fit together. SPIFFE-style workload identities and short-lived OIDC-style tokens can prove what the agent is at request time, while policy engines evaluate whether the action is allowed in that moment. This is consistent with the direction set by the CSA MAESTRO agentic AI threat modeling framework and the MITRE ATLAS adversarial AI threat matrix.
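As a rough illustration of runtime authorisation based on task and risk rather than a static role, the sketch below evaluates each requested action against a small in-memory policy table. Everything in it is an assumption made for illustration: the `POLICY` table, the task and tool names, the three risk levels, and the `authorise` function are hypothetical and do not come from any specific policy engine.

```python
from dataclasses import dataclass

# Hypothetical risk levels, ordered from least to most sensitive.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

# Hypothetical policy-as-code table: each task names the tools it may use
# and the highest action risk it may carry.
POLICY = {
    "summarise_ticket": {"tools": {"ticket_read"}, "max_risk": "low"},
    "refund_customer": {"tools": {"ticket_read", "payments_refund"}, "max_risk": "medium"},
}


@dataclass(frozen=True)
class ActionRequest:
    agent_id: str  # identity presented at request time
    task: str      # the task this run was started for
    tool: str      # the tool the agent is trying to call
    risk: str      # risk level assigned to this specific action


def authorise(request: ActionRequest) -> tuple[bool, str]:
    """Decide per action, at request time, whether the call may proceed."""
    rule = POLICY.get(request.task)
    if rule is None:
        return False, "unknown task"
    if request.tool not in rule["tools"]:
        return False, "tool outside task scope"
    if RISK_ORDER[request.risk] > RISK_ORDER[rule["max_risk"]]:
        return False, "risk above task ceiling"
    return True, "allowed"


# A summarisation run that tries to issue a refund is denied,
# even though the agent's identity is valid.
decision = authorise(ActionRequest("agent-7", "summarise_ticket", "payments_refund", "low"))
```

The design point is that the decision depends on the task and the action, so the same agent identity can be allowed in one run and denied in another.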
- Safety reviews should cover harmful output, goal drift, and prompt injection resilience.
- Security reviews should cover least privilege, secret lifetime, tool scope, and auditability.
- Joint reviews should validate the full tool chain, including lateral movement paths between systems.
- Incident response should preserve traces of prompts, tool calls, and credential use, not just model outputs.
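The last point in the list above, preserving prompts, tool calls, and credential use together, could be sketched as a single per-run trace. The `RunTrace` class, its event kinds, and the hash chaining are assumptions chosen for illustration; chaining each entry to the previous one is one possible way to make later edits detectable, not something the guidance prescribes.

```python
import hashlib
import json


class RunTrace:
    """Hypothetical append-only trace for one agent run."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.events: list[dict] = []
        self._last_hash = ""

    def _digest(self, kind: str, detail: dict, prev: str) -> str:
        # Serialise deterministically so the same event always hashes the same way.
        payload = json.dumps(
            {"run_id": self.run_id, "kind": kind, "detail": detail, "prev": prev},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, kind: str, detail: dict) -> str:
        """Append a prompt, tool_call, credential_use, or output event."""
        digest = self._digest(kind, detail, self._last_hash)
        self.events.append(
            {"kind": kind, "detail": detail, "prev": self._last_hash, "hash": digest}
        )
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered event breaks it."""
        prev = ""
        for event in self.events:
            expected = self._digest(event["kind"], event["detail"], prev)
            if event["prev"] != prev or event["hash"] != expected:
                return False
            prev = event["hash"]
        return True


trace = RunTrace("run-42")
trace.record("prompt", {"prompt_sha256": "placeholder-digest"})
trace.record("tool_call", {"tool": "crm_lookup", "allowed": True})
trace.record("credential_use", {"identity": "agent-7", "scope": "crm:read"})
```

Because all three event types live in one record keyed by run, safety and security reviewers would be reading the same evidence during an incident.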
NHIMG’s The State of Non-Human Identity Security report also shows why this matters operationally: lack of credential rotation and over-privileged accounts remain major causes of NHI-related attacks. Rotation and least-privilege controls tend to break down when agents are allowed persistent access to production tools, because the same identity is then reused across unrelated tasks.
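One simple way to surface the identity-reuse pattern described here is to group credential usage by identity and flag any identity that appears in more than one task. This is a minimal sketch: the `(identity, task)` log format and the function name are hypothetical, and a real usage log would need to be mapped into that shape first.

```python
from collections import defaultdict


def identities_reused_across_tasks(usage_log):
    """Return identities seen in more than one task, with the tasks involved.

    usage_log is an iterable of (identity, task) pairs (assumed format).
    """
    tasks_by_identity = defaultdict(set)
    for identity, task in usage_log:
        tasks_by_identity[identity].add(task)
    return {
        identity: sorted(tasks)
        for identity, tasks in tasks_by_identity.items()
        if len(tasks) > 1
    }


usage_log = [
    ("svc-agent-shared", "summarise_ticket"),
    ("svc-agent-shared", "deploy_release"),
    ("svc-agent-billing", "refund_customer"),
    ("svc-agent-billing", "refund_customer"),
]
reused = identities_reused_across_tasks(usage_log)
```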
Common Variations and Edge Cases
Tighter agent controls often increase delivery overhead, requiring organisations to balance faster experimentation against stronger runtime governance. That tradeoff is real, especially in R&D, customer support automation, and software development workflows where agents need broad but temporary access to multiple systems. Current guidance suggests there is no universal standard for this yet, so the right answer depends on task criticality, data sensitivity, and blast radius.
One common edge case is a low-risk conversational agent that becomes risky only when it can take actions through connected tools. Another is a multi-agent workflow where each component appears limited, but the composition creates privilege escalation across handoffs. In these environments, static RBAC is usually too coarse because it cannot express the intent of a specific run, and long-lived secrets create unnecessary exposure windows. The better pattern is per-task authorisation, ephemeral credentials, and strict separation between model output and real-world execution authority.
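The per-task, ephemeral pattern described above can be sketched as a credential that is bound to a single run and expires on its own. The `TaskCredential` type, the `issue` and `permits` helpers, the scope strings, and the five-minute default lifetime are all illustrative assumptions; a production system would more likely obtain such tokens from an identity provider than mint them locally.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str          # opaque random value, never reused across runs
    task_id: str        # the single run this credential is bound to
    scopes: frozenset   # the only actions it may authorise
    expires_at: float   # absolute expiry, in seconds


def issue(task_id: str, scopes, now: float, ttl_seconds: float = 300.0) -> TaskCredential:
    """Mint a short-lived credential for exactly one task run."""
    return TaskCredential(
        token=secrets.token_urlsafe(32),
        task_id=task_id,
        scopes=frozenset(scopes),
        expires_at=now + ttl_seconds,
    )


def permits(cred: TaskCredential, task_id: str, scope: str, now: float) -> bool:
    """Allow the action only for the bound task, a granted scope, and before expiry."""
    return cred.task_id == task_id and scope in cred.scopes and now < cred.expires_at


# The clock is passed in explicitly so expiry behaviour is easy to test.
cred = issue("run-1", {"crm:read"}, now=1000.0)
```

Keeping execution authority in the credential check, rather than in the model's output, is what separates what the agent asks for from what it is actually able to do.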
Safety and security teams also need a shared standard for evidence. If the agent accesses data, triggers a workflow, or fails a policy check, both teams should be able to explain why. That is why NHIMG’s agent attack surface research and the NIST AI Risk Management Framework both point toward shared governance rather than siloed review. The hardest failures appear when agents are trusted to improvise across interconnected systems with no common approval path.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A2 | Covers prompt injection and tool abuse that blur safety and security boundaries. |
| CSA MAESTRO | T1 | Maps agent threat modeling to shared safety and security governance. |
| NIST AI RMF | GOVERN | AI RMF governance supports cross-functional accountability for agent behavior. |
Review every agent tool path for abuse cases and enforce runtime guards before execution.
Related resources from NHI Mgmt Group
- How should security teams manage permissions for AI agents?
- How should security teams govern AI agents that use OAuth access?
- How should security teams limit the risk from AI agents that have access to production systems?
- How should security teams govern AI agents that can access enterprise systems?
Reviewed and updated by the NHIMG editorial team on June 25, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org