Prompt injection remains a governance problem, not a testable bug

By NHI Mgmt Group Editorial Team | Published 2026-01-28 | Domain: Agentic AI & NHIs | Source: Noma Security

TL;DR: Prompt injection is an emergent property of how large language models process context, not a conventional software flaw, and red teaming can surface failure modes without eliminating the risk, according to Noma Security. The governance task is to reduce blast radius, harden permissions, and accept that certainty is unavailable.

At a glance

What this is: An analysis of why prompt injection persists in GenAI systems and why AI red teaming helps manage, but cannot eliminate, the risk.

Why it matters: For IAM and NHI practitioners, it shows why autonomous tools and delegated permissions need containment, not just testing and policy text.

By the numbers:

96% of technology professionals identify AI agents as a growing security threat, and 66% believe this risk is immediate.
92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so.
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%).
When AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes, and in as little as 9 minutes in some cases.

👉 Read Noma Security's analysis of prompt injection and AI red teaming

Context

Prompt injection matters because the model does not receive a clean separation between trusted instructions and untrusted content. In practice, the application assembles documents, chat history, tool descriptions, and retrieved text into one context window, which means a non-human identity such as an AI agent can be steered by language that was never meant to carry authority. That creates a governance problem for IAM and NHI teams, not just a model-safety problem.

The central mistake is treating prompt injection like a bug that can be patched out of existence. It is closer to a control-boundary problem: the system decides what to place in context, the model evaluates all of it probabilistically, and downstream tools may execute the output. That makes privilege scoping, tool permissions, and blast-radius control the practical security questions, not model accuracy alone.

Key questions

Q: How should security teams reduce prompt injection risk in AI agents?

A: Security teams should reduce prompt injection risk by constraining what enters the context window, limiting tool permissions, and separating untrusted retrieval content from privileged instructions. The practical goal is not perfect detection. It is to ensure that a successful injection cannot trigger wide data access, uncontrolled writes, or irreversible actions through a delegated identity.

Q: Why is prompt injection a governance problem as well as a technical one?

A: Prompt injection is a governance problem because the harm depends on who gave the agent authority, what tools it can use, and how much data it can reach. A model can be influenced by language, but only governance determines whether that influence becomes a real security incident through overbroad permissions or weak oversight.

Q: What is the difference between red teaming an AI system and proving it is safe?

A: Red teaming tests how an AI system fails under adversarial conditions, while proving safety would require certainty that failure cannot occur. For probabilistic systems, certainty is unavailable. Red teaming is still valuable because it reveals avoidable control failures, unsafe defaults, and the conditions that expand blast radius.

Q: When do AI agents turn prompt injection into an NHI risk?

A: AI agents become an NHI risk when they can act with delegated credentials, access business systems, or modify data without tight scope controls. At that point, a language-level injection can become an operational incident. The governing question is not whether the model was tricked, but what the agent could do after being tricked.

Technical breakdown

Why the context window creates a control boundary problem

Large language models receive text with no built-in signal marking which parts are trusted and which are not. The surrounding application packages system instructions, user prompts, documents, and retrieval results into a single context window, and the model predicts a response from that mixture. Because there is no hard security boundary inside the text stream, an instruction embedded in a document can compete with an intended system rule. That is why prompt injection is not a classic software exploit. The model is doing what it was trained to do, but the application design allows untrusted language to influence privileged behaviour. In NHI terms, the identity is not the only issue. The instruction path becomes part of the attack surface.

Practical implication: Treat context assembly as a privileged control point and limit which data can enter prompts.
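A minimal sketch of context assembly treated as a control point, assuming a host-based allowlist for retrieval sources. The names ALLOWED_HOSTS, Segment, and assemble_context are hypothetical and not taken from Noma Security's analysis. Labelling a segment as retrieved does not make the model treat it as untrusted; it only gives the application a place to filter and log what reaches the context window.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption: a hypothetical allowlist of retrieval hosts.
# A real system would load this from policy rather than hard-code it.
ALLOWED_HOSTS = frozenset({"docs.example.com", "wiki.example.com"})


@dataclass(frozen=True)
class Segment:
    role: str    # "system", "user", or "retrieved"
    source: str  # where the text came from
    text: str


def assemble_context(system_prompt, user_input, documents):
    """Build context segments from (url, text) pairs and record each decision.

    Returns (segments, audit_log). Documents from hosts outside the
    allowlist are never added to the context.
    """
    segments = [
        Segment("system", "application", system_prompt),
        Segment("user", "chat", user_input),
    ]
    audit_log = []
    for url, text in documents:
        host = urlparse(url).hostname
        if host in ALLOWED_HOSTS:
            segments.append(Segment("retrieved", url, text))
            audit_log.append(("admitted", url))
        else:
            audit_log.append(("rejected", url))
    return segments, audit_log
```

The audit log is the part reviewers are most likely to need later: it records which sources were admitted to a given prompt and which were refused.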

Why AI red teaming finds risk but cannot prove safety

AI red teaming is a structured way to probe how a model and its surrounding application fail under adversarial or messy inputs. It can reveal fragile prompts, overbroad tools, permissive defaults, and data leakage conditions. It cannot prove the absence of prompt injection because the attack surface is open-ended and the system is non-deterministic. Small wording changes, language changes, model updates, and retrieval differences can alter the outcome. That means one successful test is a signal, not a total verdict. The right interpretation is that red teaming maps the boundaries of acceptable failure, then feeds governance decisions about what to restrict, monitor, or isolate.

Practical implication: Use red teaming to measure failure modes and set containment thresholds, not to certify immunity.
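Because outcomes vary from run to run, a red team result is better expressed as a rate over repeated trials than as a single pass or fail. The sketch below assumes the agent under test can be wrapped as a callable that returns True when it followed an injected instruction. The stub agent, the function names, and the threshold are all illustrative assumptions.

```python
import random


def injection_success_rate(agent, payloads, trials_per_payload=20):
    """Fraction of runs in which the agent followed an injected instruction.

    `agent` is any callable that takes a payload string and returns True
    when the injected instruction was followed on that run.
    """
    total = 0
    followed = 0
    for payload in payloads:
        for _ in range(trials_per_payload):
            total += 1
            if agent(payload):
                followed += 1
    return followed / total if total else 0.0


def exceeds_threshold(rate, containment_threshold):
    """True when the measured rate is above the level the team agreed to tolerate."""
    return rate > containment_threshold


def make_stub_agent(follow_probability, seed=0):
    """Hypothetical stand-in for a real agent: follows injections at random."""
    rng = random.Random(seed)
    return lambda payload: rng.random() < follow_probability
```

A rate of zero across a finite set of payloads shows only that those payloads did not succeed in those runs, which is consistent with the point that testing cannot certify immunity.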

How tool access turns prompt injection into an NHI issue

Prompt injection becomes materially worse when the model can act through tools such as search, messaging, document creation, or code execution. At that point the model is not only generating text; it is operating through a delegated identity with real permissions. The security question shifts from “Can the model be tricked?” to “What can the agent do if it is tricked?” That is why least privilege, just-in-time access, approval gates, and scoped credentials matter. Without them, a successful injection can turn a language-level influence problem into data exposure, unauthorised actions, or persistence through shared systems.

Practical implication: Bind every agent action to least privilege and constrain tool access to the minimum required scope.
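One way to picture least privilege at the tool layer is a deny-by-default check between the model's requested action and the tool that would perform it. This is a sketch under stated assumptions: AgentIdentity, call_tool, ToolDenied, and the two example tools are hypothetical names, not part of any specific agent framework.

```python
from dataclasses import dataclass


class ToolDenied(Exception):
    """Raised when an agent requests a tool outside its granted scope."""


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    allowed_tools: frozenset


def call_tool(agent, tool_name, registry, *args):
    """Run a tool only if it is in the agent's scope; deny by default."""
    if tool_name not in agent.allowed_tools:
        raise ToolDenied(f"{agent.name} is not permitted to use {tool_name}")
    return registry[tool_name](*args)


# Hypothetical tools for illustration only.
REGISTRY = {
    "search": lambda query: f"results for {query}",
    "send_message": lambda recipient, body: f"sent to {recipient}",
}
```

With this shape, an agent scoped to search alone cannot send a message even if injected text persuades the model to request it. The check runs outside the model and does not depend on the model resisting the instruction.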

Threat narrative

Attacker objective: The attacker wants to steer an AI agent into performing unauthorised work while staying inside the system's normal language-based workflow.

Entry occurs when an attacker places malicious instructions inside retrieved content, chat history, or another source that the system adds to the context window.
Escalation happens when the model follows the injected instruction and uses overbroad tool permissions or shared credentials to perform an unauthorised action.
Impact is achieved when the downstream tool or agent executes the output, exposing data, modifying documents, or moving sensitive information outside intended controls.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Prompt injection is a governance boundary failure, not a model-quality issue. The article correctly separates behaviour from exploitability, but the more important lesson for the field is that language-based influence becomes a security problem when applications place untrusted content beside privileged instructions. That means identity, data handling, and execution policy must be designed together. Practitioners should stop asking whether a model is safe in the abstract and start asking where control boundaries actually exist.

Blast-radius control is the decisive design principle for agentic systems. If a prompt can steer an agent, the damage depends on what the agent can touch. Privilege scoping, workflow separation, and approval points matter more than claims of prompt resistance. In NHI governance terms, the right question is how much authority an agent receives before and after it processes untrusted content. Practitioners should engineer for limited impact, not mythical immunity.

AI red teaming remains useful because it exposes avoidable control failures. The article is right that testing cannot prove certainty, but that does not make it optional. Red teaming is how teams find mis-scoped permissions, unsafe defaults, and brittle operating assumptions before attackers do. The field should treat it as an input to policy, access design, and exception handling. Practitioners should use it to harden the environment around the model, not to certify the model itself.

Prompt injection belongs in the same category as shadow AI risk when agents can act without clear ownership. An unmanaged or loosely governed agent can become a durable path to data exposure even when no traditional compromise exists. That expands the NHI problem from secrets and service accounts into autonomous software behaviour. The implication is straightforward: inventory agent identities, assign ownership, and enforce access review before scale makes the exposure harder to unwind.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
That blind spot becomes more severe as agent deployments scale, which is why controls in the style of the OWASP NHI Top 10 should be paired with access review and logging.

What this signals

Prompt injection will keep exposing control design weaknesses until teams treat agents as governed identities. The next step for practitioners is to stop evaluating prompts in isolation and map every agent to an owner, a purpose, and a permission envelope. When the system can search, write, or trigger workflows, the real security boundary is the delegated access path, not the model output.
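Mapping each agent to an owner, a purpose, and a permission envelope can start as a simple inventory check. The sketch below assumes a hypothetical AgentRecord structure and scope names; it flags records that lack an owner or purpose, or that hold scopes outside their approved envelope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    owner: Optional[str]
    purpose: Optional[str]
    permitted_scopes: frozenset  # the approved permission envelope
    granted_scopes: frozenset    # what the agent can actually reach today


def governance_gaps(record):
    """Return a list of human-readable gaps for one agent record."""
    gaps = []
    if not record.owner:
        gaps.append("no owner")
    if not record.purpose:
        gaps.append("no purpose")
    excess = record.granted_scopes - record.permitted_scopes
    if excess:
        gaps.append("scopes outside envelope: " + ", ".join(sorted(excess)))
    return gaps
```

An empty list for every record would indicate that the inventory is internally consistent. It says nothing about whether the envelope itself is appropriately narrow, which remains a review decision.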

With 92% of organisations agreeing that governing AI agents is critical but only 44% having implemented policies, the programme gap is already visible in most enterprises, according to the AI Agents: The New Attack Surface report. The practical response is to move from experimentation to inventory, approval, and review before agents become operational dependencies.

Ephemeral credential trust debt: as organisations add more agents and more tools, temporary authority tends to accumulate faster than review discipline. That creates a hidden control liability, especially where shared documents, retrieval systems, and downstream automation all intersect. Practitioners should expect this to surface first as access exceptions, then as audit gaps, and finally as incident response complexity.

For practitioners

Separate instruction sources from untrusted content: Design prompt assembly so system instructions, retrieved documents, and user inputs are not treated as equivalent authority. Use allowlists for retrieval sources, strip instruction-like content where possible, and log what enters the context window for review.
Reduce agent privilege before testing for injection: Limit tool access, file access, and write permissions before running red team scenarios. The main goal is to ensure that a successful injection cannot trigger broad data access or destructive actions through a delegated identity.
Bind approvals to high-risk actions: Require human approval for data export, external messaging, privilege escalation, and changes to shared documents. Approval gates work best when paired with short-lived credentials and clear action logging.
Treat red teaming as a governance input: Use findings to revise policy, risk acceptance, and exception handling rather than treating the exercise as proof of safety. Track recurring failure modes so engineering and IAM teams can address the underlying control gap.
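The approval step for high-risk actions can be sketched as a gate that refuses to run a handler unless an approver is recorded, and logs every decision either way. The action labels and the execute_action function are hypothetical; they mirror the four categories named in the list above rather than any particular product's API.

```python
# Assumption: hypothetical labels for the high-risk categories listed above.
HIGH_RISK_ACTIONS = frozenset({
    "data_export",
    "external_message",
    "privilege_escalation",
    "shared_document_change",
})


def execute_action(action, handler, audit_log, approved_by=None):
    """Run `handler` unless the action is high-risk and no approver is recorded.

    Returns ("executed", result) or ("blocked", None).
    Every decision is appended to `audit_log` as (action, outcome, approver).
    """
    if action in HIGH_RISK_ACTIONS and approved_by is None:
        audit_log.append((action, "blocked", None))
        return ("blocked", None)
    audit_log.append((action, "executed", approved_by))
    return ("executed", handler())
```

In this sketch the approver is passed in by the calling application, so the value must come from a channel the model cannot write to. If the agent could supply approved_by itself, the gate would add no protection.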

Key takeaways

Prompt injection is best understood as a control-boundary problem because untrusted language can influence privileged agent behaviour.
Red teaming is valuable for exposing failure modes, but it cannot deliver certainty in probabilistic systems.
The practical defence is to limit agent privilege, constrain context sources, and tie high-risk actions to human approval.

Key terms

Prompt Injection: Prompt injection is the use of crafted language to influence how an AI model responds or what actions an agent takes. It works because the model treats text in context as potentially relevant, so malicious instructions can compete with intended system prompts when governance boundaries are weak.
Context Window: The context window is the text a model receives at one time, including prompts, retrieved documents, and conversation history. Security teams care about it because it becomes the practical boundary between trusted instructions and untrusted content, especially when the application assembles that text automatically.
Red Teaming: Red teaming is structured adversarial testing used to find how an AI system fails under realistic misuse or attack conditions. In AI security, it is a discovery method, not a proof of safety, because probabilistic behaviour and changing models prevent any lasting guarantee.
Agent Blast Radius: Agent blast radius is the amount of damage an AI agent can cause if it is manipulated or misused. It is shaped by permissions, data access, tool reach, and approval design, so limiting blast radius is often more effective than trying to eliminate every injection attempt.

Deepen your knowledge

Prompt injection, agent privilege scoping, and blast-radius control are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for AI agents and delegated access, it is worth exploring.

This post draws on content published by Noma Security: Prompt injection, AI red teaming, and what security leaders should know. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-01-28.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org