Prompt injection examples expose where AI security controls fail

By NHI Mgmt Group Editorial TeamPublished 2026-03-30Domain: Breaches & IncidentsSource: Lasso Security

TL;DR: Prompt injection exploits how language models mix instructions, data, memory, and tool execution, creating real-world bypasses in chatbots, retrieval pipelines, and autonomous workflows, according to Lasso Security. The core risk is architectural: when systems collapse trust boundaries, traditional perimeter controls cannot reliably tell content from control.

At a glance

What this is: This is an analysis of prompt injection across modern AI architectures, with examples showing how hidden instructions can redirect model behaviour in chat, RAG, and tool-using systems.

Why it matters: It matters because IAM and security teams now have to govern how AI systems interpret authority, not just who logs in or which token is valid, across human, NHI, and autonomous contexts.

By the numbers:

Real-world testing by Lasso Security revealed a significant 42% bypass rate.

👉 Read Lasso Security's analysis of prompt injection examples in AI systems

Context

Prompt injection is a security problem where untrusted text gets treated like instruction instead of data. In modern AI systems, that matters because the model is often asked to infer intent across user prompts, retrieved content, memory, and tool calls, which weakens conventional access boundaries and makes prompt injection a governance issue as much as a model-safety issue.

For IAM practitioners, the key change is that authority now flows through the AI execution path, not just through human approval points or static policy checks. That creates an identity and control problem for AI agents and other non-human identities, especially when retrieval, memory, and automation are connected to sensitive systems.

Key questions

Q: How should security teams stop prompt injection from affecting AI workflows?

A: Start by isolating instructions from untrusted data so retrieved content cannot rewrite system intent. Then require verification before any tool execution, state change, or sensitive data access. The goal is to make the model advisory by default and to preserve a complete audit trail across prompts, context, and outputs.

Q: Why do prompt injection attacks bypass many AI guardrails?

A: Because many guardrails inspect input or output in isolation, while the attack succeeds in the middle of the execution path. If malicious instructions are embedded in trusted documents, emails, or retrieved records, the model may process them as context. That makes provenance, isolation, and runtime policy enforcement more important than prompt hardening alone.

Q: What breaks when AI agents can call tools after reading untrusted content?

A: The system stops being a text processor and becomes an execution surface. If an agent can ingest poisoned context and then invoke tools with delegated privileges, the attacker can redirect control flow without direct user approval. That is why tool execution needs a separate authorization check from the model’s reasoning step.

Q: How do teams know whether prompt injection controls are actually working?

A: Look for end-to-end visibility across prompts, retrieved content, memory, tool calls, and outputs, plus evidence that blocked actions stay blocked under realistic test cases. If the system can only be evaluated with static prompts, the controls are probably too narrow. Behaviour drift under multi-turn workflows is the signal to watch.

Technical breakdown

How prompt injection exploits mixed instruction and data channels

Prompt injection works because many AI systems place system instructions, user input, retrieved content, and memory into one reasoning path. The model is then forced to infer which text is authoritative instead of enforcing a hard boundary between data and instruction. That is why hidden instructions in documents, emails, or web pages can shape outputs without looking malicious at ingestion time. The failure is not only in the prompt. It is in the architecture that lets untrusted content inherit execution influence.

Practical implication: separate instruction channels from data channels so untrusted content cannot change model behaviour by proximity alone.

RAG prompt injection and trust boundary collapse

Retrieval-augmented generation introduces a persistent trust problem because external content is pulled into the model context as if it were relevant reference material. If provenance is weak, the model cannot distinguish a document meant to be summarised from one that is meant to issue instructions. This is why indirect prompt injection can survive multiple steps and sessions. The attack becomes more durable when knowledge bases, tickets, emails, or vector stores accumulate unreviewed content over time.

Practical implication: preserve provenance and apply context isolation across ingestion, retrieval, and generation layers.

Tool-calling agents turn prompt injection into action control

The risk increases sharply when the model can call tools or trigger workflows. In that setting, prompt injection is no longer just content manipulation. It becomes a control problem, because malicious instructions can redirect execution, change workflow logic, or cause the agent to act with delegated privileges. If the system treats model output as authoritative, a poisoned context can become a privileged action. That is the point where AI security overlaps with NHI governance and authorization design.

Practical implication: require explicit verification before tool execution, state changes, or sensitive data access.

Threat narrative

Attacker objective: The attacker wants to steer model behaviour and use trusted AI workflows to expose information, alter outputs, or trigger unauthorized actions.

Entry happens when an attacker places malicious instructions into content the AI system already trusts, such as emails, documents, web pages, or retrieved records.
Escalation occurs when the model blends that content into its reasoning path and treats the hidden instructions as operational context rather than untrusted input.
Impact follows when the model changes behaviour, exposes hidden instructions, or triggers tools and workflows that were never intended by the user.

DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.
LiteLLM PyPI package breach — LiteLLM PyPI supply chain attack, credentials stolen from users.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Prompt injection is an identity and authority problem, not just a model-safety problem. The article shows that the attack succeeds when systems let data, instructions, and decisions share one execution path. That means the core failure is not content moderation alone but the absence of a hard boundary around who or what can influence action. Practitioners should treat AI execution paths as governed identity surfaces, not passive text pipelines.

Runtime authority collapse is the real failure mode in tool-using AI. Once a model can call tools, memory, or downstream workflows, the security question becomes whether its output is advisory or executable. Lasso's examples show that attackers do not need to defeat the model globally if they can contaminate a single trusted context. The implication is that controls built for static prompts do not survive autonomous action paths.

Context isolation is the named control gap this topic exposes. Modern AI stacks often assume the system can infer intent safely from mixed content, but prompt injection breaks that assumption by hiding commands inside trusted data. This is a structural governance gap because the same context can carry both reference material and malicious instruction. The practitioner conclusion is that AI security must preserve trust boundaries across retrieval, memory, and action layers.

AI agent governance now overlaps with NHI governance by design. When an AI system can retrieve data, retain memory, and invoke tools, it behaves like a non-human identity with delegated authority. That makes prompt injection relevant to IAM, PAM, and lifecycle controls, not only application security. Security teams should stop treating model prompts as isolated text and start governing the identity behaviour embedded in the workflow.

ServiceNow AI's 42% bypass result is a warning about guardrail fragility. Even safety taxonomies and model-level filters can fail at a meaningful rate when attack content is woven into natural-language interactions. That does not prove all guardrails are ineffective, but it does show that static safety layers cannot be the only line of defence. Practitioners should assume that runtime observation and policy enforcement are required.

From our research:
1 in 4 organisations are already investing in dedicated NHI security capabilities, with an additional 60% planning to do so within the next twelve months, according to The State of Non-Human Identity Security.
Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities.
That confidence gap matters here because Top 10 NHI Issues shows how unmanaged machine identities, over-privilege, and weak monitoring create the conditions prompt injection can exploit.

What this signals

Context isolation will become a design requirement rather than a security enhancement as more enterprise AI systems connect retrieval, memory, and tools. Teams that keep treating prompt text as a single trust domain will struggle to contain instruction drift once the model begins acting on external content.

The operational signal to watch is whether AI workflows still have a clear approval boundary before sensitive actions are executed. Once model outputs can trigger access, changes, or exfiltration paths, the programme needs controls that look more like identity governance and privileged access control than traditional application filtering.

For practitioners

Separate instruction and data channels Redesign prompts so system instructions, user input, retrieved content, and memory are isolated and cannot silently modify one another. This is the cleanest way to reduce hidden-instruction abuse in chat and RAG workflows.
Validate provenance before retrieval is trusted Tag retrieved content by source, age, and trust level, then block untrusted material from influencing operational decisions, tool calls, or policy-sensitive responses.
Require verification before model-driven actions Treat model output as advisory until a second control confirms the action, especially for data access, workflow changes, and privileged tool execution.
Monitor full request-to-response chains Capture prompts, retrieved context, system instructions, tool calls, and outputs in one audit trail so intent shifts can be reconstructed after an incident.

Key takeaways

Prompt injection works because AI systems often collapse data, instructions, and intent into one execution path.
The scale of the risk is already visible in bypass testing, zero-click exfiltration, and tool-abuse cases that bypass traditional controls.
Security teams need context isolation, provenance checks, and runtime action verification before they can claim meaningful control.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Prompt injection maps directly to agent instruction and tool abuse risk.
OWASP Non-Human Identity Top 10	NHI-05	Trusted AI workflows behave like delegated non-human identities with privileged actions.
NIST CSF 2.0	PR.AC-4	This article centers on preventing excessive authority in AI execution paths.

Limit AI system authority so outputs cannot become actions without explicit verification.

Key terms

Prompt Injection: Prompt injection is an attack that places malicious instructions into text a model is expected to trust, such as a prompt, document, email, or retrieved record. The risk is that the model follows those instructions as part of normal reasoning, which can change outputs, expose hidden context, or trigger actions.
Context Isolation: Context isolation is the practice of keeping user input, system instructions, retrieved content, and memory separate so one cannot silently alter the other. In AI security, it is a core control because mixed context lets untrusted text inherit authority it was never meant to have.
Tool-Calling Agent: A tool-calling agent is an AI system that can invoke external functions, services, or workflows as part of its operation. Once tools are available, prompt injection can become an execution problem, because the model’s output may directly influence actions with real operational impact.
Retrieval-Augmented Generation: Retrieval-augmented generation is an architecture where a model pulls external content into its working context before generating an answer. It improves relevance, but it also creates trust-boundary risk if the retrieved material is treated as instruction rather than reference material.

Deepen your knowledge

Prompt injection, context isolation, and AI runtime governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building controls for AI systems that can retrieve data and invoke tools, this course is a strong fit.

This post draws on content published by Lasso Security: Prompt Injection Examples That Expose Real AI Security Risks. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-03-30.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org