System prompt leakage is the next LLM security fault line

By NHI Mgmt Group Editorial Team
Published 2025-09-29
Domain: Agentic AI & NHIs
Source: Lasso Security

TL;DR: System prompt leakage, prompt reverse-engineering, and RAG manipulation are emerging as major LLM security risks in 2025, according to Lasso Security, which also cites a Gartner projection that by 2027 half of enterprise GenAI models will be industry- or function-specific. Guardrails alone are not enough: security has to move into externalised controls, context-aware access, and compliance-ready architecture.

At a glance

What this is: Lasso Security argues that system prompt leakage, RAG abuse, and context-aware access are becoming central LLM security issues as enterprise GenAI adoption expands.

Why it matters: LLM governance now touches NHI, agentic AI, and human access patterns at the same time, so IAM teams need controls that survive prompt leakage, tool misuse, and data exposure.

By the numbers:

Gartner predicts that by 2027, half of GenAI models that enterprises use will be designed for specific industries or business functions.
One study shows that most RAG attacks settle around a 40% success rate, which can rise to 60% if ambiguous answers are counted as successful attacks.
Lasso Security says its RapidClassifier can run custom security policies in under 50 milliseconds.

👉 Read Lasso Security's analysis of system prompt leakage and RAG risk

Context

System prompt leakage is not just a content disclosure issue. In LLM environments, the system prompt often carries behaviour instructions, hidden routing logic, and sometimes sensitive operational detail, which means exposure can reveal both the model's guardrails and the structure of the application around it. For IAM and security teams, that turns a prompt into a governance boundary, not just a configuration field.

The article frames LLM security as a control problem that extends beyond the model itself. As domain-specific LLM agents move into enterprise workflows, the real question becomes which controls remain enforceable when prompts can be reverse-engineered, retrieved content can be manipulated, and sensitive data may be embedded in places that were never meant to hold it.

Key questions

Q: How should security teams handle system prompts that may contain sensitive data?

A: They should remove credentials, internal rules, and hidden routing logic from prompts and place them in governed external systems. A prompt should steer behaviour, not act as a secrets store. Security teams should also assume prompts can be inferred through model responses, so prompt content needs the same review discipline as other sensitive control logic.

Q: Why do LLM guardrails fail when attackers can reverse-engineer prompts?

A: Guardrails fail because they rely on the model to preserve policy secrecy. If an attacker can infer the prompt through output patterns, the hidden logic is no longer hidden, and the model's apparent enforcement can be bypassed or manipulated. That is why enforcement must live outside the model whenever the prompt contains sensitive instructions.
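
To make that concrete, here is a minimal Python sketch of enforcement that lives outside the model. The model may propose a tool call, but a separate function decides whether it runs. The role names, tool names, and the ALLOWED table are hypothetical and invented for illustration; they are not taken from Lasso Security's article or from any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the model (hypothetical shape)."""
    tool: str
    resource: str


# Hypothetical policy table, stored and reviewed outside the system prompt.
ALLOWED = {
    "support_agent": {("read_ticket", "tickets"), ("read_invoice", "billing")},
    "finance_agent": {("read_invoice", "billing"), ("issue_refund", "billing")},
}


def authorize(caller_role: str, call: ToolCall) -> bool:
    """Return True only if policy explicitly allows this call for this role."""
    return (call.tool, call.resource) in ALLOWED.get(caller_role, set())


def execute_if_allowed(caller_role: str, call: ToolCall) -> str:
    """Gate every model-proposed call; unknown roles and tools are denied."""
    if not authorize(caller_role, call):
        return "denied"
    return f"executed {call.tool} on {call.resource}"
```

Because the deny decision never depends on prompt text, an attacker who infers the prompt learns nothing that loosens the gate. A fuller version would also weigh request context, not only the caller's role.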

Q: What do security teams get wrong about RAG risk?

A: They often focus on the model and ignore the retrieval layer. RAG attacks succeed when malicious or over-sensitive content is allowed into context, so the real control point is document selection, ranking, and sensitivity filtering before generation starts. Static access rules alone do not capture that interaction risk.

Q: How can teams decide whether to use context-based access control for GenAI?

A: Use it when the risk depends on what the model can retrieve or return, not just who the user is. If the same identity can produce different exposure outcomes based on query content, document sensitivity, or model response, then context-based access control is the better fit. It aligns control decisions with the actual interaction boundary.

Technical breakdown

Why system prompts become a security boundary

System prompts define how an LLM behaves, what it should refuse, and how it should route requests. When organisations place credentials, internal rules, or hidden instructions into that layer, the prompt becomes both policy and exposure surface. Attackers do not need direct prompt access to recover value, because model behaviour can reveal enough to infer the underlying structure. That creates a security problem that sits between application design and identity governance, especially when the LLM is trusted to mediate access to data or actions.

Practical implication: keep secrets and policy enforcement outside the prompt and treat prompt content as inspectable security material.
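
One way to treat prompt content as inspectable security material is to scan it for secret-like strings before deployment. The Python sketch below is a minimal example under stated assumptions: the three regular expressions are illustrative only and far from exhaustive, and the function name is hypothetical. A real pipeline would rely on a maintained secret-scanning ruleset.

```python
import re

# Illustrative patterns only (assumption); not a complete secret ruleset.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "inline_password": re.compile(
        r"(?i)\b(password|passwd|api[_-]?key|secret)\s*[:=]\s*\S+"
    ),
    "url_with_credentials": re.compile(
        r"\b[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@]+@"
    ),
}


def find_secret_like_strings(prompt: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in a system prompt."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(prompt):
            findings.append((name, match.group(0)))
    return findings


if __name__ == "__main__":
    risky = (
        "You are a billing bot. Connect with "
        "postgres://svc_user:hunter2@db.internal/orders "
        "and key AKIAIOSFODNN7EXAMPLE."
    )
    for name, text in find_secret_like_strings(risky):
        print(f"{name}: {text}")
```

A check like this fits naturally into the same review gate used for other configuration changes, so a prompt containing a finding never reaches production.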

RAG manipulation and context-based access control

Retrieval-augmented generation adds a separate attack surface because the model depends on retrieved documents as context. If malicious content is ranked highly or injected into retrieval, the model can be steered toward unsafe or inaccurate answers without changing the base model. Context-based access control limits this by evaluating the request, the response, and the sensitivity of the retrieved material together, rather than relying only on static user roles. That is a better fit for systems where access risk depends on the interaction context, not just the user identity.

Practical implication: apply context-aware enforcement to retrieval pipelines so sensitive documents are filtered before the model consumes them.
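
As a rough illustration of filtering before the model consumes anything, the Python sketch below drops retrieved documents whose sensitivity exceeds what both the user and the stated purpose of the request allow. The sensitivity scale, the purpose names, and every class and function name are assumptions made for this example. It covers only the pre-generation side; checking the model's response is out of scope here.

```python
from dataclasses import dataclass

# Hypothetical sensitivity scale: higher means more restricted.
PUBLIC, INTERNAL, CONFIDENTIAL = 0, 1, 2

# Hypothetical ceiling per request purpose, independent of who is asking.
PURPOSE_CEILING = {
    "customer_support": INTERNAL,
    "incident_response": CONFIDENTIAL,
}


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    sensitivity: int


@dataclass(frozen=True)
class RequestContext:
    user_clearance: int
    purpose: str


def filter_retrieved(docs: list[Document], ctx: RequestContext) -> list[Document]:
    """Keep only documents that the user AND the purpose both permit."""
    # Unknown purposes fall back to PUBLIC, so the filter fails closed.
    ceiling = min(ctx.user_clearance, PURPOSE_CEILING.get(ctx.purpose, PUBLIC))
    return [d for d in docs if d.sensitivity <= ceiling]
```

The same identity gets different results depending on purpose: a user cleared for CONFIDENTIAL material sees only PUBLIC and INTERNAL documents on a customer_support request. That is the difference from a static role check.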

Why latency matters in LLM security architecture

Security controls that sit in the path of an AI application fail if they are too slow for real-time use. The article's emphasis on sub-50-millisecond policy execution reflects a core architecture lesson: prevention must keep pace with user interaction or teams will bypass it. In practice, this shifts the design target from heavyweight inspection to inline, low-latency enforcement that can block unsafe behaviour without degrading the application experience. That is especially relevant when the LLM is embedded in business workflows where delay creates pressure to weaken controls.

Practical implication: design inline enforcement for speed first, or the business will route around the control.
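
The latency point can be sketched as a policy check that runs under a fixed time budget and fails closed when the budget is exceeded. This Python example is an assumption-heavy illustration: the default of 50 ms simply mirrors the figure quoted above, the function names are hypothetical, and nothing here represents RapidClassifier or any vendor API.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# Shared worker pool so a slow check does not block the caller on shutdown.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def enforce_with_budget(
    check: Callable[[str], bool],
    text: str,
    budget_ms: float = 50.0,
) -> bool:
    """Return the check's verdict, or False if it is slow or raises."""
    future = _EXECUTOR.submit(check, text)
    try:
        return bool(future.result(timeout=budget_ms / 1000.0))
    except FutureTimeout:
        # Fail closed: a verdict that arrives too late counts as a deny.
        return False
    except Exception:
        # Fail closed on policy errors as well.
        return False
```

Note the limitation: a timed-out check keeps running in its worker thread, because Python cannot cancel it once started. The budget therefore protects the user-facing path, not the resources consumed by a slow policy.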

Threat narrative

Attacker objective: The attacker wants to convert model trust into unauthorised access to data, instructions, or downstream actions.

Entry begins when an attacker uses prompt injection, reverse-engineering, or retrieval manipulation to influence the model's behaviour and expose hidden instructions.
Escalation occurs when the attacker extracts credentials, bypasses guardrails, or turns the model into a path to sensitive context and downstream systems.
Impact follows when manipulated responses or exposed secrets create unauthorised access, data leakage, or security control bypass across the GenAI application.

Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

System prompt leakage is a governance failure, not just a prompt hygiene issue. The prompt has become a policy container, a routing layer, and in some cases a repository for sensitive material. Once attackers can infer or extract that content, they are not merely reading text; they are learning how the application makes trust decisions. The implication is that prompt content must be treated as governed control logic, not as informal model configuration.

Context-based access control is the right response to retrieval risk because static roles cannot express contextual exposure. RAG systems are dynamic by design, so the governing question is not who the user is in the abstract, but what the model is being allowed to see and return in this specific interaction. That is where ordinary RBAC stops being sufficient, and where security architecture has to account for retrieval sensitivity, query intent, and output risk. Practitioners should reframe RAG governance around contextual trust boundaries.

LLM security is converging with NHI governance because prompts, APIs, and service identities are now part of the same control surface. The article's most important signal is not that models are vulnerable, but that model behaviour, secrets exposure, and workload access can no longer be separated cleanly in enterprise design. That convergence makes identity lifecycle, secret handling, and runtime enforcement part of the same programme. The implication for practitioners is that AI security cannot sit apart from IAM and NHI governance.

Prompt isolation is becoming a named control gap, and organisations should treat it as such. System prompts that carry hidden rules or embedded credentials create a fragile trust boundary that attackers can reverse-engineer through behaviour, not just read directly. That failure mode is distinct from general data leakage because it exposes the application's security logic itself. Practitioners should recognise prompt isolation as a discrete governance requirement, not an optional hardening step.

From our research:
Only 20% have formal processes for offboarding and revoking API keys, and even fewer have procedures for rotating them, according to the Ultimate Guide to NHIs.
91.6% of secrets remain valid five days after the targeted organisation is notified, showing a critical gap in remediation procedures.
For a deeper governance lens: The 52 NHI breaches Report shows how secrets and standing access turn into repeatable exposure patterns across real incidents.

What this signals

Prompt isolation: the next maturity step is separating model behaviour guidance from governed access logic, because once prompts can be inferred, hidden policy becomes an attack surface. Teams that already manage NHI secrets and service account lifecycle should extend the same discipline to LLM control layers, using the governance patterns described in the Ultimate Guide to NHIs.

The practical signal for programmes is that GenAI risk will not be solved by model-side guardrails alone. When retrieval, identity, and policy live in separate layers, teams can instrument each layer independently and avoid turning the model into a single point of control failure.

As domain-specific AI adoption grows, security leaders should expect prompt leakage and RAG manipulation to surface as recurring operational issues rather than edge cases. The governance response should be to align AI application controls with the same lifecycle discipline used for secrets, workload identities, and third-party access.

For practitioners

Separate secrets from prompts: Move credentials, connection strings, and internal rules out of system prompts and into secure vaults or external control planes. If the model needs policy input, pass only the minimum necessary context at runtime.
Externalise security enforcement: Use external policy systems for allow, deny, and data-filtering decisions so the LLM is not the sole gatekeeper. This reduces the chance that prompt leakage also becomes a control bypass.
Test for prompt reverse-engineering: Red team the application with prompt injection, behavioural probing, and retrieval manipulation scenarios. Include cases where the attacker never sees the prompt directly but can still infer it from outputs.
Review RAG sensitivity boundaries: Classify retrieved content by sensitivity and business impact, then enforce context-based access control before the model consumes it. Keep high-risk documents out of the retrieval path unless the request context genuinely warrants them.
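
The red-team step in the list above can be partly automated. The Python sketch below replays probe prompts and flags responses that reproduce a large share of the system prompt. Here `model` is a hypothetical callable standing in for whatever client the application uses, and the 4-word n-gram size and 0.3 threshold are arbitrary starting points. Overlap scoring catches only verbatim or near-verbatim leakage; inference of the prompt through paraphrase or behavioural probing still needs human review.

```python
from typing import Callable


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams of a text, as a set."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leak_score(system_prompt: str, response: str, n: int = 4) -> float:
    """Fraction of the prompt's n-grams that reappear in the response."""
    prompt_grams = word_ngrams(system_prompt, n)
    if not prompt_grams:
        return 0.0
    shared = prompt_grams & word_ngrams(response, n)
    return len(shared) / len(prompt_grams)


def run_probes(
    model: Callable[[str], str],
    system_prompt: str,
    probes: list[str],
    threshold: float = 0.3,
) -> list[str]:
    """Return the probes whose responses leak at or above the threshold."""
    return [
        probe
        for probe in probes
        if leak_score(system_prompt, model(probe)) >= threshold
    ]
```

Running a probe set like this on every prompt or model change turns leakage testing into a regression check, not a one-off exercise.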

Key takeaways

System prompts can become a control boundary and a data exposure surface at the same time, which makes prompt leakage a governance issue as much as a model issue.
RAG attacks work because retrieval context can be manipulated, so static identity checks alone do not protect the generation path.
Teams need external enforcement, secrets isolation, and context-aware access control if they want GenAI security that can survive real attacker behaviour.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Covers prompt injection, tool misuse, and agent behaviour risks in GenAI applications.
OWASP Non-Human Identity Top 10	NHI-03	Sensitive prompts and embedded secrets create NHI exposure and rotation problems.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Context-aware access decisions align with least-privilege enforcement in GenAI systems.

Implement contextual access checks before retrieval and generation to limit exposure at the control point.

Key terms

System prompt: The system prompt is the instruction layer that defines how an LLM should behave, refuse requests, and prioritise rules. In practice, it can also become a sensitive control surface if teams place secrets, policy logic, or hidden routing into it. That makes governance and separation of duties essential.
Retrieval-augmented generation: Retrieval-augmented generation, or RAG, is a pattern where an LLM consults external documents or indexes before answering. It improves relevance, but it also creates a new attack surface if malicious, stale, or overly sensitive content is allowed into the retrieval context.
Context-based access control: Context-based access control evaluates the request, the response, and the data context before allowing access or disclosure. Unlike static role checks, it adapts to sensitivity, intent, and interaction state, which makes it better suited to GenAI systems where the risk changes inside the session.
Prompt injection: Prompt injection is the attempt to manipulate an LLM into following attacker-controlled instructions instead of intended policy. It can be direct or indirect, and it becomes more dangerous when the model is trusted to mediate access to data, tools, or decisions without external enforcement.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance maturity, it is worth exploring.

This post draws on content published by Lasso Security: LLM Security Predictions: What's Coming Over the Horizon in 2025? Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-29.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org