What breaks when organisations rely only on native AI safety controls?

Native controls often address configuration and content screening, but they do not fully govern what happens after an agent starts interacting with live tools and data. That leaves a gap between policy and behaviour, especially when prompt injection, Shadow AI, or unclassified data are involved.

Why Native Controls Leave a Governance Gap

Native AI safety features are useful, but they are usually designed to constrain the model at the point of configuration, content filtering, or policy declaration. That is not the same as controlling what an agent does once it is connected to tools, databases, ticketing systems, or code execution. The real risk is behavioural drift: an agent can still be manipulated through prompt injection, tricked into exposing secrets, or allowed to act on unclassified data that should never have been reachable in the first place.

This is why the issue is bigger than one vendor feature set. Current guidance in NIST Cybersecurity Framework 2.0 still maps well here: identify assets, protect them, monitor activity, and respond when the control boundary fails. NHIMG research on the DeepSeek breach shows how exposed data and embedded secrets can compound model risk, while the Microsoft Azure OpenAI service breach illustrates how access paths and data handling can become the real failure point. In practice, many security teams discover the gap only after an agent has already accessed something it should never have been able to touch.

Where the Control Model Breaks in Practice

Native controls often assume a relatively stable request path: a user asks a question, the model answers, and the platform applies moderation or content rules. Agentic systems do not behave that neatly. They chain actions, call APIs, retrieve context, and sometimes make follow-on decisions without a human in the loop. That means static RBAC, coarse tenant settings, and pre-defined allowlists can fail to reflect the actual intent of the task at runtime.
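The difference between a per-call allowlist and a check that sees the chain of actions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the tool names, the `ToolCall` type, and the rule "no outbound call after a sensitive read in the same task" are all hypothetical and chosen only to make the point concrete.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tool names and groupings, used only for illustration.
STATIC_ALLOWLIST = {"read_customer_db", "send_http_request", "create_ticket"}
SENSITIVE_READS = {"read_customer_db"}
EGRESS_TOOLS = {"send_http_request"}


@dataclass(frozen=True)
class ToolCall:
    tool: str


def static_check(call: ToolCall) -> bool:
    """Per-call check: only asks whether the tool is on the allowlist."""
    return call.tool in STATIC_ALLOWLIST


def chain_check(history: List[ToolCall], call: ToolCall) -> bool:
    """Chain-aware check: also refuses egress after a sensitive read in the same task."""
    if not static_check(call):
        return False
    touched_sensitive = any(c.tool in SENSITIVE_READS for c in history)
    return not (touched_sensitive and call.tool in EGRESS_TOOLS)


history = [ToolCall("read_customer_db")]
next_call = ToolCall("send_http_request")
print(static_check(next_call))          # True: the allowlist alone permits the call
print(chain_check(history, next_call))  # False: the chain-aware check refuses it
```

Each call passes the allowlist on its own; only the check that carries task history notices that the combination is risky.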

Practitioners increasingly separate three layers:

Model safety for harmful content and prompt handling.
Identity and access for what the agent can reach, using workload identity and short-lived credentials rather than long-lived static secrets.
Runtime authorisation for deciding whether a specific action is allowed in context, which is where policy-as-code, OPA, or Cedar-style decisions become useful.

That distinction matters because an autonomous agent is a software entity with execution authority, not a human user with predictable keystrokes. Best practice is evolving toward intent-based authorisation, JIT credential provisioning, and ephemeral secrets that are issued per task and revoked automatically after completion. NIST Cybersecurity Framework 2.0 and emerging agent guidance from OWASP and CSA MAESTRO all point in this direction, even though there is no universal standard for agent runtime enforcement yet. The operational lesson is simple: if an agent can pivot from one tool to another, static safety controls do not see the full chain of behaviour. These controls tend to break down when agents are granted broad tool access in environments with unclassified data, shared service accounts, or weak secret lifecycle management, because the model boundary is not the same as the action boundary.

Common Variations and Edge Cases

Tighter runtime controls often increase operational overhead, so organisations have to balance safety against latency, implementation effort, and developer friction. That tradeoff becomes sharper in multi-agent workflows, where one agent may delegate to another, or where a retrieval agent and a transaction agent share the same backend permissions. In those environments, a single static policy rarely captures the real risk.
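One way to keep delegation from widening access is to attenuate scope at every hand-off, so a delegated agent receives only the permissions that its delegator holds and that the sub-task actually asks for. The sketch below assumes permissions can be modelled as simple strings; the scope names and the `delegate_scope` helper are hypothetical.

```python
from typing import FrozenSet


def delegate_scope(parent: FrozenSet[str], requested: FrozenSet[str]) -> FrozenSet[str]:
    """Grant only what the delegator holds AND the sub-task requested."""
    return parent & requested


# Hypothetical scopes for a retrieval agent delegating part of a task.
retrieval_agent = frozenset({"docs:read", "tickets:read"})
requested_by_subtask = frozenset({"tickets:read", "payments:write"})

granted = delegate_scope(retrieval_agent, requested_by_subtask)
print(sorted(granted))  # ['tickets:read']
```

Because the result is an intersection, a chain of delegations can only narrow access, which avoids the case where two agents quietly share one broad backend permission set.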

The most defensible pattern is to treat the agent’s identity as a workload identity, then grant access per task rather than by standing entitlement. That means using JIT credentials, limiting TTL, and evaluating policy at the moment of action with full context: who requested the task, what tool is being called, what data classification is involved, and whether the request matches the declared intent. NHIMG’s Ultimate Guide to NHIs — Standards is useful for translating that into NHI governance, while the DeepSeek breach and Microsoft Azure OpenAI service breach show why exposed secrets and overbroad access become liabilities quickly. The NIST Cybersecurity Framework 2.0 remains relevant, but for agentic systems current guidance suggests layering it with the OWASP Agentic AI Top 10, CSA MAESTRO, and the NIST AI RMF. This guidance breaks down in legacy applications that cannot separate model permissions from service credentials, because those systems force coarse access decisions and prevent true task-scoped authorisation.
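The four context questions above (requester, tool, data classification, declared intent) can be expressed as a single decision function evaluated at the moment of action. The following is a plain-Python sketch, not OPA or Cedar syntax, and the intents, scopes, classification ranks, and requester names are hypothetical policy data invented for the example. Unclassified data is denied by default because it has no rank to compare against.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

# Hypothetical policy data, for illustration only.
INTENT_TOOLS: Dict[str, FrozenSet[str]] = {
    "summarise_ticket": frozenset({"tickets:read"}),
    "issue_refund": frozenset({"tickets:read", "payments:write"}),
}
CLASSIFICATION_RANK: Dict[str, int] = {"public": 0, "internal": 1, "confidential": 2}
MAX_CLASSIFICATION: Dict[str, int] = {"summarise_ticket": 1, "issue_refund": 2}
AUTHORISED_REQUESTERS: Dict[str, FrozenSet[str]] = {"issue_refund": frozenset({"finance-team"})}


@dataclass(frozen=True)
class ActionRequest:
    requester: str
    declared_intent: str
    tool: str
    data_classification: str


def authorise(req: ActionRequest) -> Tuple[bool, str]:
    """Evaluate one action against the task context; deny on the first failed check."""
    allowed_tools = INTENT_TOOLS.get(req.declared_intent)
    if allowed_tools is None:
        return False, "unknown intent"
    if req.tool not in allowed_tools:
        return False, "tool does not match declared intent"
    rank = CLASSIFICATION_RANK.get(req.data_classification)
    if rank is None:
        return False, "data is unclassified"
    if rank > MAX_CLASSIFICATION[req.declared_intent]:
        return False, "classification exceeds task ceiling"
    requesters = AUTHORISED_REQUESTERS.get(req.declared_intent)
    if requesters is not None and req.requester not in requesters:
        return False, "requester not authorised for this intent"
    return True, "allowed"


print(authorise(ActionRequest("support-bot", "summarise_ticket", "tickets:read", "internal")))
print(authorise(ActionRequest("support-bot", "summarise_ticket", "payments:write", "internal")))
```

Returning a reason alongside the decision makes denials auditable, which matters when the same function gates every tool call an agent makes.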

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Agentic systems need runtime controls beyond model safety and static RBAC.
CSA MAESTRO		Covers agent identity, orchestration, and runtime guardrails for autonomous workflows.
NIST AI RMF		AI RMF helps align model risk, operational context, and accountability.

Map every agent action to policy checks and restrict tool use to task-scoped intent.
