Prompt injection exposes the trust gap in GenAI instructions

By NHI Mgmt Group Editorial TeamPublished 2026-06-09Domain: Agentic AI & NHIsSource: Lasso Security

TL;DR: Prompt injection lets adversarial input override a generative AI model’s intended instructions, exposing data, triggering unauthorized actions, and undermining trust in GenAI systems, according to Lasso Security. The control problem is not just malicious text, but the assumption that untrusted input can be safely mixed with instructions.

At a glance

What this is: Prompt injection is an attack that makes GenAI treat untrusted input as instructions, with direct and indirect variants that can expose data or trigger unauthorized actions.

Why it matters: It matters because security teams governing AI agents, machine identities, and human-facing GenAI interfaces need controls that separate data from instructions before the model can act.

👉 Read Lasso Security's guide to prompt injection and GenAI prevention

Context

Prompt injection is a control failure in GenAI systems: untrusted content is processed as if it were instruction text, and the model follows the wrong source of authority. In identity terms, that breaks the basic boundary between user input, system prompt, and downstream action in AI-assisted workflows.

For IAM, NHI, and AI governance teams, the risk is not just bad output. It is the possibility that a model with access to tools, data, or workflows will expose secrets, alter records, or trigger actions because the instruction boundary was never enforced as a security control.

Key questions

Q: How should security teams reduce prompt injection risk in GenAI systems?

A: They should separate instructions from data, restrict tool permissions, and log model-triggered actions. The goal is to prevent untrusted text from altering system behaviour and to limit the damage if an injection succeeds. Prompt injection is less about perfect prevention than about shrinking the model’s authority over sensitive workflows.

Q: Why does prompt injection become more dangerous when a model can use tools?

A: Because the output stops being just text. A compromised instruction path can become a real action, such as deleting data, revealing secrets, or changing records. The larger the attached permissions, the larger the blast radius, so tool access must be treated like delegated privilege, not a convenience feature.

Q: What do teams get wrong about indirect prompt injection?

A: They assume external content is passive. In reality, email, documents, search results, and web pages can carry attacker instructions into the model’s context window. If retrieved content is not treated as hostile input, the model may follow the attacker’s direction instead of the organisation’s policy.

Q: What should organisations do first if GenAI is connected to sensitive systems?

A: Start by limiting what the model can reach, then add monitoring and response controls around every action it can trigger. A model with broad permissions becomes a control-plane risk, so security teams should reduce authority before expanding deployment scope.

Technical breakdown

Why prompt injection matters when models can act

Prompt injection becomes more dangerous when the model is connected to tools, databases, or workflows, because a wrong instruction can turn into a real-world action. In that setup, the model is no longer only generating text. It is selecting a response path that can alter records, disclose secrets, or trigger side effects. That is why prompt injection is often discussed alongside agentic risk, even when the original system is not fully autonomous. The exposure grows with every permission the model inherits from the surrounding application.

Practical implication: limit tool permissions to the smallest viable scope and log every model-initiated action for review.

Threat narrative

Attacker objective: The attacker wants the model to obey attacker-authored instructions instead of the system’s intended controls, creating disclosure or action outside authorised bounds.

entry: The attacker introduces malicious instructions through a direct prompt, an email, a document, or other content the model will ingest as context.
escalation: The model treats the injected text as higher-priority instruction material and may reveal hidden prompts, sensitive data, or operational details.
impact: The poisoned response can trigger unauthorized actions, data exposure, or harmful content generation inside connected systems.

Cisco DevHub NHI breach — IntelBroker exploited exposed Cisco credentials, API tokens and keys in DevHub.
ASP.NET machine keys RCE attack — 3,000+ exposed ASP.NET machine keys enabled remote code execution.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Prompt injection is an instruction-boundary failure, not just a content-filter problem. The model is being asked to distinguish authority inside the same input stream, which is a governance problem as much as a technical one. Once untrusted text can override system intent, the application has lost control of who or what is directing the action. Practitioners should treat prompt separation as a first-class identity control, not a prompt-tuning exercise.

Instruction boundary collapse is the right concept for GenAI systems that mix user text and operational logic. This article describes the classic failure mode: untrusted content is processed as if it were authorised command text. That breaks the premise that application logic can safely coexist with user-provided language in the same execution path. The practical conclusion is that any model connected to data, tools, or workflows must be designed so the instruction layer remains non-negotiable.

GenAI risk increases sharply when model output is coupled to privileged actions. A text-only mistake becomes a security event when the model can delete messages, alter data, or reveal secrets through connected permissions. That makes prompt injection an IAM-adjacent control issue, because the real failure is overbroad delegation into the model environment. Security teams should judge the risk by the permissions attached to the workflow, not by the chatbot interface alone.

OWASP NHI Top 10 remains the most useful lens when prompt injection crosses into tool use. Once a model can call tools or trigger actions, the question is no longer only what it says but what it can cause to happen. That pushes the problem from content safety into non-human identity governance, where delegated authority, scope, and action logging become the decisive controls. Practitioners should align GenAI controls with the permissions the system can actually exercise.

Prompt injection exposes a trust model that still assumes AI will behave like a passive application. That assumption holds poorly once the model interprets untrusted text, retrieves external content, and acts in the same session. The right response is to redesign identity and execution boundaries around the possibility that the model will be influenced at runtime. Organisations that treat GenAI as just another interface will keep underestimating the blast radius.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to AI Agents: The New Attack Surface report.
Another finding from our research shows that only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
52 NHI Breaches Analysis shows how delegated access failures compound when credentials, systems, and decision paths are not governed as part of one identity lifecycle.

What this signals

Instruction-boundary failures will become a standard governance test for GenAI deployments. Security teams should expect prompt injection to show up wherever models ingest external content or can act on behalf of a user. With 80% of organisations already reporting agent actions beyond intended scope, according to AI Agents: The New Attack Surface report, the issue is no longer theoretical.

Prompt injection should be read as a precursor to broader non-human identity control failures. Once a model can retrieve, decide, and act, the real programme question is whether the surrounding permissions were designed for influence-resistant execution. That is why the boundary between GenAI safety and IAM governance is narrowing quickly.

Teams that are already mapping AI agent permissions should use this topic to sharpen their control model around retrieval trust, tool scope, and action logging. The next wave of incidents is likely to involve more than bad text, and the organisations that can prove who or what initiated an action will be the ones that can investigate it.

For practitioners

Separate instructions from data Use prompt partitioning so system instructions, policy text, and user content remain structurally distinct before the model processes them. This reduces the chance that attacker text can override control logic.
Sandbox every model-triggered action Run tool calls, code execution, and database operations in isolated environments with explicit allowlists. Limit what a successful injection can do even if the model is manipulated.
Log and review model decisions Capture prompts, retrieval inputs, tool invocations, and outputs so suspicious sequences can be reconstructed during incident analysis. Continuous monitoring only works if the action trail is complete.
Constrain external retrieval inputs Treat email, web content, documents, and other retrieved sources as untrusted until validated. Apply allowlists, sanitisation, and source reputation checks before the model can use the material.

Key takeaways

Prompt injection works because GenAI systems often blur the line between data and instructions.
The risk scales sharply when the model can use tools, access data, or trigger operational actions.
Security teams should prioritise instruction separation, tool scoping, and complete action logging before expanding GenAI use.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Prompt injection is a core agentic AI input-manipulation risk.
NIST AI RMF		GenAI governance must address misuse, accountability, and operational risk.
NIST CSF 2.0	PR.AC-4	Model permissions and delegated actions map to access control governance.

Separate instructions from data and constrain model tool use to reduce prompt injection impact.

Key terms

Prompt Injection: Prompt injection is a malicious input technique that causes a generative AI model to follow attacker instructions instead of the system’s intended guidance. In practice, it works because the model cannot reliably tell trusted control text from untrusted user or retrieved content without application-level separation.
Instruction Boundary: An instruction boundary is the control line between policy, system prompts, user content, and external data. For GenAI security, that boundary must be enforced by the application, because the model itself may treat all text as equally relevant unless the design prevents it.
Indirect Prompt Injection: Indirect prompt injection is an attack where malicious instructions arrive through a data source the model reads later, such as email, documents, search results, or web pages. The attacker does not need direct chat access, only a path into the model’s context pipeline.
Model-Triggered Action: A model-triggered action is any operational step a GenAI system can initiate after interpreting a prompt, such as retrieving data, calling a tool, changing a record, or sending a message. Once actions are possible, prompt injection becomes an access-control problem, not just a content problem.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or NHI governance in your organisation, it is worth exploring.

This post draws on content published by Lasso Security: Prompt Injection: What It Is and How to Prevent It. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-06-09.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org