AI agent security needs safety inputs to become secure by design

By NHI Mgmt Group Editorial Team | Published 2025-09-19 | Domain: Agentic AI & NHIs | Source: Zenity

TL;DR: AI safety and AI security need to converge, because agent autonomy changes both the control questions and the failure modes that traditional shift-left security was built to answer, according to Zenity. The separation between model alignment and operational security is no longer workable when agents make runtime decisions.

At a glance

What this is: A meetup reflection arguing that AI safety and AI security now need to converge around AI agent governance and secure-by-design thinking.

Why it matters: Practitioners must govern autonomous behaviour, not just model outputs, and that requires joining security, safety, and lifecycle controls across AI, NHI, and IAM programmes.

👉 Read Zenity's reflections on AI safety and AI security convergence for agent governance

Context

AI agent security is no longer just a question of model quality or prompt filtering. Once an agent can choose actions at runtime, the governance problem shifts to how identity, access, and oversight work when behaviour is not fully predetermined.

This article is about the gap between AI safety and AI security. The event reflection argues that the two disciplines have to meet in the middle if organisations want secure-by-design AI instead of repeating the slow, fragmented evolution seen in traditional software.

Key questions

Q: How should security teams govern AI agents that can choose actions at runtime?

A: Treat the agent as an identity-bearing actor with bounded authority, not as a passive application component. Define approved tools, data sources, escalation rules, and monitoring signals before deployment. Governance should cover what the agent can do, when it can do it, and who can override or revoke that authority.

Q: Why do AI agents force safety and security teams to work together?

A: Because safe behaviour and secure behaviour are no longer separable in practice. An agent can be aligned in intent and still be exploited through tool misuse, prompt manipulation, or poor runtime controls. Teams need a shared review path so model risk, operational access, and behavioural evidence are assessed together.

Q: What breaks when shift-left security is applied to autonomous AI systems?

A: Shift-left alone cannot govern behaviour that emerges after deployment. AI agents can change decisions based on context, tool feedback, and runtime inputs, so pre-production checks do not capture every failure mode. Organisations need live monitoring and decision logging, not just earlier review gates.

Q: How do organisations know if AI agent governance is actually working?

A: Look for evidence that the agent stays inside approved actions, that escalation events are visible, and that reviewers can reconstruct why a decision was made. If the programme cannot explain tool use, access changes, or unusual behaviour after the fact, governance is not functioning well enough.

Technical breakdown

Why AI agent security is different from traditional application security

Traditional application security assumes the system follows a fixed execution path that can be reviewed against known controls. AI agents break that assumption because they can decide what to do next, which data to inspect, and which tools to invoke at runtime. That makes the control problem less about static code paths and more about runtime authority, behavioural boundaries, and the evidence generated by the agent’s decisions.

Practical implication: define agent boundaries in terms of allowed actions, data, and escalation conditions, not just application roles.
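
One way to picture a boundary defined by actions, data, and escalation conditions is as a small policy object that every tool call is checked against. The sketch below is illustrative only: the class `AgentBoundary`, the `Decision` values, and the tool and data source names are hypothetical and do not come from Zenity's article or from any specific product.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class AgentBoundary:
    """Hypothetical boundary definition for one agent identity."""
    agent_id: str
    allowed_tools: frozenset
    allowed_data_sources: frozenset
    # Tools that are permitted only after a human approves the call.
    escalation_tools: frozenset = field(default_factory=frozenset)

    def authorize(self, tool: str, data_source: str) -> Decision:
        # Anything outside the approved tool or data set is denied outright.
        if tool not in self.allowed_tools and tool not in self.escalation_tools:
            return Decision.DENY
        if data_source not in self.allowed_data_sources:
            return Decision.DENY
        # Approved but sensitive tools are routed to a human reviewer.
        if tool in self.escalation_tools:
            return Decision.ESCALATE
        return Decision.ALLOW


# Example boundary with assumed tool and data source names.
boundary = AgentBoundary(
    agent_id="support-agent-01",
    allowed_tools=frozenset({"search_tickets", "draft_reply"}),
    allowed_data_sources=frozenset({"ticket_db"}),
    escalation_tools=frozenset({"issue_refund"}),
)
```

The check is deny-by-default: a tool or data source that nobody approved never reaches the escalation path, which keeps the approved action set a hard boundary rather than a preference.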

How AI safety and AI security overlap in agent governance

AI safety focuses on whether a system behaves in aligned, trustworthy, and non-harmful ways. AI security focuses on whether that same system can be attacked, abused, or driven outside intended use. For agents, those concerns overlap because malicious prompting, tool abuse, and scope drift can all produce behaviour that is both unsafe and insecure. A practical governance model has to evaluate intent, output, and operational side effects together.

Practical implication: combine safety evaluation, security testing, and approval boundaries into one governance path for agents.
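
A single governance path can be sketched as one record that holds the safety, security, and approval outcomes together, so no single sign-off clears an agent on its own. The names `AgentReview` and `blocking_findings` are assumptions made for this example, and the three boolean checks are a deliberate simplification of what real evaluations would produce.

```python
from dataclasses import dataclass


@dataclass
class AgentReview:
    """Hypothetical record combining the three review outcomes for one agent."""
    agent_id: str
    safety_eval_passed: bool
    security_tests_passed: bool
    boundary_approved: bool


def blocking_findings(review: AgentReview) -> list:
    """Return the reasons this agent may not be deployed; empty means cleared."""
    findings = []
    if not review.safety_eval_passed:
        findings.append("safety evaluation not passed")
    if not review.security_tests_passed:
        findings.append("security testing not passed")
    if not review.boundary_approved:
        findings.append("approval boundary not signed off")
    return findings
```

Returning the list of findings rather than a bare yes or no gives both the safety and the security reviewers the same view of what is still outstanding.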

What shift-left means when the system can act autonomously

Shift-left works poorly when security is bolted onto development after decisions are already fixed. For AI agents, the earlier question is not only whether a model is safe to deploy, but whether the operational controls around identity, tools, and monitoring can still hold once the agent is live. The stronger lesson is that the agent’s runtime behaviour creates new review points that do not map neatly onto classic SDLC gates.

Practical implication: move review effort toward runtime monitoring, tool governance, and pre-production testing of agent behaviour.
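
Runtime review depends on evidence that can be replayed later. As a rough sketch, a decision log might record each tool call with enough context for a reviewer to reconstruct the sequence afterwards. The `DecisionLog` class and its field names are hypothetical, and a production system would write to durable, tamper-evident storage rather than an in-memory list.

```python
import json
from datetime import datetime, timezone


class DecisionLog:
    """In-memory, append-only log of agent decisions (illustrative only)."""

    def __init__(self):
        self._entries = []

    def record(self, agent_id, tool, inputs_summary, outcome):
        """Append one decision and return its sequence number."""
        entry = {
            "seq": len(self._entries),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "tool": tool,
            "inputs_summary": inputs_summary,
            "outcome": outcome,
        }
        self._entries.append(entry)
        return entry["seq"]

    def trail_for(self, agent_id):
        """Return this agent's entries in the order they were recorded."""
        return [e for e in self._entries if e["agent_id"] == agent_id]

    def export_jsonl(self):
        """Serialise the log as JSON Lines for a downstream review tool."""
        return "\n".join(json.dumps(e) for e in self._entries)
```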

NHI Mgmt Group analysis

AI agent security cannot be treated as a pure application-security problem. The article shows why agent behaviour forces security teams to evaluate runtime decision-making, not just model deployment. That matters because tool use, action sequencing, and post-decision evidence all become part of the trust boundary. Practitioners should treat agents as identity-bearing actors whose authority must be governed, not simply monitored.

The safety and security divide is the real programme risk. Zenity’s reflection is strongest where it argues that safety without security leaves systems exposed, while security without safety ignores unintended behaviour. That split maps directly to organisational silos, where model teams, security teams, and governance teams each see only part of the risk. The practical conclusion is that agent governance fails when ownership is fragmented across disciplines.

Secure by design for AI will not emerge from shift-left habits alone. Traditional shift-left strategies struggled because security was separated from development; agent governance risks repeating that failure if runtime controls are not designed from the start. AI systems introduce behaviour that can change after deployment, which means pre-production review is necessary but insufficient. Practitioners should expect the control model to extend into live operations.

Agent autonomy pushes IAM thinking beyond static permissions. When an AI agent can decide what to do next, access review logic built for predictable human or workload behaviour becomes less reliable. The governance question is no longer only who has access, but how that access is selected, combined, and exercised during execution. Identity programmes should treat autonomous access as a distinct governance problem, not a variant of service-account administration.

From our research:
85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, which shows how quickly identity governance weakens when visibility and accountability fragment.
For a broader control baseline, see Ultimate Guide to NHIs, Why NHI Security Matters Now for the governance pressure behind machine identity growth.

What this signals

AI agent governance will increasingly be judged by whether teams can explain runtime decisions, not just prevent obvious misuse. The organisations that already struggle with third-party OAuth visibility have a useful warning sign: identity oversight degrades quickly when execution paths are distributed across actors and services. That is why agent programmes need shared evidence trails, not just policy statements.

Shift-left for AI will only work if runtime controls are designed as part of the operating model. Security teams should expect agent evaluations, monitoring, and access revocation to become continuous processes rather than one-time launch gates. The practical challenge is to make those controls visible to both AI safety and security stakeholders.

Agent identity governance will converge with existing NHI and IAM discipline. As agents become more operationally active, programmes will need the same lifecycle thinking used for machine identities, but adapted for behaviour that can change mid-session. That makes governance design, not model novelty, the decisive programme issue.

For practitioners

Define runtime guardrails for agent behaviour: Map which actions, tools, and data sources an agent can use, then make escalation conditions explicit before deployment. Treat the approved action set as a governance boundary, not a loose operating preference.
Unify safety and security review paths: Bring model evaluation, security testing, and approval workflows into the same governance process so one team is not certifying a system that another team cannot operationally secure.
Test agent behaviour before production rollout: Evaluate how an agent responds to ambiguous prompts, conflicting goals, and unexpected tool outputs, then document where the behaviour crosses from acceptable assistance into unsafe execution.
Extend monitoring into live agent operations: Track what the agent did, which resources it touched, and how its decisions evolved during execution so post-deployment controls can detect scope drift and unauthorised action paths.
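
The monitoring step in the list above can be approximated by comparing recorded agent activity against the approved action and resource sets. The following is a minimal sketch under stated assumptions: the event shape (dicts with "action" and "resource" keys), the function names, and the sample data are all invented for illustration and do not reflect a standard log format.

```python
from collections import Counter


def find_scope_drift(events, approved_actions, approved_resources):
    """Return events whose action or resource falls outside the approved sets.

    `events` is a list of dicts with "action" and "resource" keys; this shape
    is an assumption for the sketch, not a standard log format.
    """
    return [
        e for e in events
        if e["action"] not in approved_actions
        or e["resource"] not in approved_resources
    ]


def summarise(events):
    """Count how often each (action, resource) pair occurred."""
    return Counter((e["action"], e["resource"]) for e in events)


# Sample activity for one agent session (invented data).
events = [
    {"action": "read", "resource": "ticket_db"},
    {"action": "read", "resource": "ticket_db"},
    {"action": "export", "resource": "ticket_db"},
    {"action": "read", "resource": "payroll_db"},
]
drift = find_scope_drift(events, {"read"}, {"ticket_db"})
```

Here `drift` holds the export action and the payroll read, the two events a reviewer would need to explain. The summary counts give a baseline that makes unusual action and resource pairs easier to spot over time.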

Key takeaways

AI agents change the security problem because runtime decision-making creates a governance boundary that static application controls cannot fully cover.
The most important organisational gap is the split between AI safety and AI security, which leaves runtime behaviour, tool use, and accountability only partially governed.
Practitioners should build shared review, monitoring, and override processes now, because secure-by-design AI depends on controls that work after deployment.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Agent runtime behaviour and tool use are central to the article.
NIST AI RMF		The article centres on governance and risk management for AI systems.
NIST CSF 2.0	PR.AA-01	Identity and access accountability matters for autonomous agent operations.

Define agent tool boundaries, approvals, and runtime logging before production use.

Key terms

AI Agent: A software entity that can choose actions at runtime and use tools or data sources to pursue a goal. In governance terms, it needs explicit boundaries for authority, monitoring, and override because its behaviour can change based on context and feedback.
Secure by Design AI: An approach to AI development that builds safety, security, and governance into the system before deployment. It assumes runtime behaviour must be controlled, not just evaluated after the fact, and that both model and operational risk need shared oversight.
Runtime Guardrails: Controls that limit what an AI system can do while it is operating, such as tool restrictions, approval rules, and logging. They matter because autonomous systems can make decisions after launch, when pre-production testing no longer captures every risk.


This post draws on content published by Zenity: Bridging AI Safety and AI Security: Reflections from the NYC AI Safety Meetup. Read the original.
