What breaks when AI runtime attacks are treated as prompt-safety issues only?

Why This Matters for Security Teams

Treating AI runtime attacks purely as prompt-safety problems creates a dangerous blind spot: the model may look “safe” in testing while still issuing tool calls, reading sensitive data, or triggering automation in production. Prompt filters can reduce obvious abuse, but they do not govern execution authority. That is the core issue in agentic systems, where the real risk begins after the prompt is accepted. The OWASP NHI Top 10 and emerging work such as the Anthropic report on AI-orchestrated cyber espionage point to the same operational reality: adversaries target the runtime, not just the prompt. NHIMG’s research article “LLMjacking: How Attackers Hijack AI Using Compromised NHIs” shows how quickly exposed credentials can be abused once discovered. In practice, many security teams discover this gap only after an agent has already accessed data or executed an action that prompt-safety testing never covered.

How It Works in Practice

Runtime AI attacks succeed when the model is allowed to act on behalf of the organisation without a second layer of authorisation. A prompt may be benign, but once the model is connected to tools, APIs, databases, or ticketing systems, the decision surface expands from text safety to operational control. This is why static, role-based IAM often fails for autonomous workloads: the agent’s behaviour is goal-driven, not pre-scripted.

Current guidance suggests shifting from prompt-centric controls to context-aware runtime governance. That means validating every high-impact action at the point of execution, not just screening user input. Practical controls include:

Just-in-time credential issuance for each task, with short TTLs and automatic revocation on completion.

Workload identity for the agent itself, so tools can verify what is acting, not only what was said.

Policy-as-code checks at runtime, using context such as requested action, data sensitivity, destination system, and human approval state.

Segmentation of tool permissions so the agent cannot chain low-risk actions into privileged workflows without explicit approval.

This aligns with the direction described in Ultimate Guide to NHIs — Key Challenges and Risks and with implementation patterns discussed by the CISA cyber threat advisories. The practical control objective is simple: if the agent can act, then the action must be authorized independently of the prompt. These controls tend to break down when legacy automation platforms expose broad API credentials because the agent inherits implicit trust across too many downstream systems.
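As an illustration of authorising an action independently of the prompt, the following is a minimal sketch of a runtime policy check in Python. It is not taken from any specific product or framework: the ActionRequest fields, the action names, the destination allow-list, and the rule ordering are all hypothetical and would need to reflect an organisation's own policy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ActionRequest:
    """Context evaluated at the point of execution (all fields are illustrative)."""
    agent_id: str
    action: str
    data_sensitivity: str  # assumed levels: "public", "internal", "restricted"
    destination: str
    human_approved: bool = False


# Hypothetical policy data; a real deployment would load this from a policy store.
HIGH_IMPACT_ACTIONS = {"delete_record", "send_external_email", "transfer_funds"}
ALLOWED_DESTINATIONS = {"ticketing", "crm"}


def authorize(request: ActionRequest) -> Tuple[bool, str]:
    """Decide on the action itself, regardless of what the prompt said."""
    if request.destination not in ALLOWED_DESTINATIONS:
        return False, "destination not allowed"
    if request.data_sensitivity == "restricted" and not request.human_approved:
        return False, "restricted data requires human approval"
    if request.action in HIGH_IMPACT_ACTIONS and not request.human_approved:
        return False, "high-impact action requires human approval"
    return True, "allowed"


if __name__ == "__main__":
    request = ActionRequest("agent-1", "transfer_funds", "internal", "crm")
    print(authorize(request))
```

The check runs on the requested action and its context, so a benign-looking prompt that leads to a high-impact tool call is still stopped unless the approval state says otherwise.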

Common Variations and Edge Cases

Tighter runtime control often increases latency and operational overhead, so organisations must balance safety against workflow friction. That tradeoff is especially visible in multi-agent systems, where one agent’s output becomes another agent’s input and the blast radius grows quickly. There is no universal standard for this yet, but current guidance suggests treating each agent hop as a new authorization event rather than a continuation of the original request.
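One way to picture "each agent hop as a new authorization event" is the sketch below. It is only an illustration under assumed names: the agent identifiers, the permission table, and the run_chain helper are hypothetical, and a real system would call a policy engine rather than a static dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-agent permissions; no agent inherits another agent's rights.
AGENT_PERMISSIONS: Dict[str, Set[str]] = {
    "triage-agent": {"read_ticket"},
    "remediation-agent": {"read_ticket", "restart_service"},
}


def authorize_hop(agent_id: str, action: str) -> bool:
    """Evaluate the acting agent's own permissions, not the original requester's."""
    return action in AGENT_PERMISSIONS.get(agent_id, set())


def run_chain(hops: List[Tuple[str, str]]) -> List[str]:
    """Re-authorise every hop and stop the chain at the first denial."""
    results: List[str] = []
    for agent_id, action in hops:
        if not authorize_hop(agent_id, action):
            results.append(f"denied:{agent_id}:{action}")
            break
        results.append(f"allowed:{agent_id}:{action}")
    return results


if __name__ == "__main__":
    chain = [
        ("triage-agent", "read_ticket"),
        ("triage-agent", "restart_service"),
        ("remediation-agent", "restart_service"),
    ]
    for line in run_chain(chain):
        print(line)
```

Stopping at the first denied hop is a design choice in this sketch; it limits how far a compromised or misdirected agent's output can propagate, at the cost of the extra check on every hop.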

Edge cases matter. Read-only agents still need strong identity controls because read access can become reconnaissance for later escalation. Human-in-the-loop approval is helpful, but it is not a substitute for policy enforcement if the approval path is bypassable. Long-lived secrets are particularly dangerous because they outlast the specific task and can be reused after the original context has changed. That is why NHIMG’s The State of Secrets in AppSec is relevant here: exposed or fragmented secrets management undermines runtime governance even when prompt safety is strong.
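The difference between a long-lived secret and a task-scoped credential can be sketched as follows. This is a minimal illustration, not a real secrets manager: TaskCredential, issue_credential, the 300-second default TTL, and the injectable now parameter are assumptions made for the example.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskCredential:
    """A credential bound to one task and a short lifetime (illustrative only)."""
    token: str
    task_id: str
    expires_at: float
    revoked: bool = False

    def is_valid(self, task_id: str, now: Optional[float] = None) -> bool:
        """Valid only for the issuing task, before expiry, and while not revoked."""
        current = time.time() if now is None else now
        return (not self.revoked) and task_id == self.task_id and current < self.expires_at

    def revoke(self) -> None:
        """Called on task completion so the token cannot be reused afterwards."""
        self.revoked = True


def issue_credential(task_id: str, ttl_seconds: float = 300.0,
                     now: Optional[float] = None) -> TaskCredential:
    """Issue a random token that expires ttl_seconds after issuance."""
    issued_at = time.time() if now is None else now
    return TaskCredential(
        token=secrets.token_urlsafe(32),
        task_id=task_id,
        expires_at=issued_at + ttl_seconds,
    )


if __name__ == "__main__":
    cred = issue_credential("task-42", ttl_seconds=60)
    print(cred.is_valid("task-42"))   # checked immediately after issuance
    cred.revoke()
    print(cred.is_valid("task-42"))   # checked after revocation
```

Because validity depends on the task identifier, the expiry time, and the revocation flag together, a token that leaks after the task completes is of little use to whoever finds it.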

For organisations mapping this to broader AI risk programs, the most useful framing is that prompt safety is a content filter, while runtime security is an authorization model. When those layers are conflated, teams miss lateral movement, chained tool abuse, and credential misuse until after the impact has occurred.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A3	Runtime tool abuse is an agentic security failure, not just prompt injection.
CSA MAESTRO	MAESTRO-05	Covers autonomous agent governance and runtime policy enforcement.
NIST AI RMF	GOVERN	AI risk governance must cover downstream operational impact, not just model output.

Apply runtime controls that govern agent decisions, tools, and escalation paths.
