System prompt leakage in LLMs: are your guardrails enough?


NHI Mgmt Group

(@nhi-mgmt-group)


Topic starter 23/06/2026 9:21 pm

TL;DR: System prompt leakage, prompt reverse-engineering, and RAG manipulation are emerging as major LLM security risks in 2025, according to Lasso Security, which also cites a Gartner projection that by 2027 half of enterprise GenAI models will be industry- or function-specific. Guardrails alone are not enough: security has to move into externalised controls, context-aware access, and compliance-ready architecture.

NHIMG editorial — based on content published by Lasso Security: LLM Security Predictions: What's Coming Over the Horizon in 2025?

By the numbers:

Gartner predicts that by 2027, half of GenAI models that enterprises use will be designed for specific industries or business functions.
One study shows that most RAG attacks settle around a 40% success rate, which can rise to 60% if ambiguous answers are counted as successful attacks.
Lasso Security says its RapidClassifier can run custom security policies in under 50 milliseconds.

Questions worth separating out

Q: How should security teams handle system prompts that may contain sensitive data?

A: They should remove credentials, internal rules, and hidden routing logic from prompts and place them in governed external systems.
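As a rough illustration of that answer, the sketch below keeps the system prompt free of secrets and resolves the credential inside the tool handler at call time. It is a minimal example under stated assumptions: the `ORDERS_DB_DSN` variable, the `lookup_order` tool, and the regex patterns are all hypothetical, and a real deployment would more likely read from a secrets vault than from an environment variable.

```python
import os
import re

# Hypothetical patterns for secret-like strings; tune these for your environment.
SECRET_PATTERNS = [
    # key=value or key: value pairs such as "password=..." or "api_key: ..."
    re.compile(r"(?i)\b(api[_-]?key|password|secret|token)\s*[:=]\s*\S+"),
    # credentials embedded in a URL, e.g. scheme://user:pass@host
    re.compile(r"(?i)\b[a-z][a-z0-9+.-]*://[^\s/:@]+:[^\s/@]+@"),
]

def find_secrets(prompt):
    """Return substrings of the prompt that look like embedded secrets."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(prompt))
    return hits

# The prompt carries behaviour only: no credentials, no connection strings.
SYSTEM_PROMPT = "You are a support assistant. Answer using the order-lookup tool."

def lookup_order(order_id):
    """Tool handler: the credential is resolved at call time, outside the prompt."""
    dsn = os.environ.get("ORDERS_DB_DSN")  # hypothetical variable name
    if dsn is None:
        raise RuntimeError("ORDERS_DB_DSN is not configured")
    # A real handler would query the database with `dsn` here.
    return {"order_id": order_id, "status": "unknown"}
```

A check like `find_secrets` could run in CI against every prompt template, so a leaked prompt reveals behaviour but not credentials. Pattern matching of this kind is a heuristic and will miss secrets that do not fit the listed shapes.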

Q: Why do LLM guardrails fail when attackers can reverse-engineer prompts?

A: Guardrails fail because they rely on the model to preserve policy secrecy.

Q: What do security teams get wrong about RAG risk?

A: They often focus on the model and ignore the retrieval layer.
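One way to picture a retrieval-layer control is a filter that runs between the retriever and prompt assembly. The sketch below is illustrative only: the `Chunk` fields, the `TRUSTED_SOURCES` allowlist, and the role names are assumptions, not part of Lasso Security's article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    source: str               # where the chunk was indexed from
    allowed_roles: frozenset  # roles entitled to read the chunk
    score: float              # similarity score reported by the retriever

TRUSTED_SOURCES = {"wiki", "policy-repo"}  # hypothetical allowlist

def filter_retrieved(chunks, user_role, top_k=3):
    """Drop untrusted or unentitled chunks first, then keep the top_k by score."""
    eligible = [
        c for c in chunks
        if c.source in TRUSTED_SOURCES and user_role in c.allowed_roles
    ]
    return sorted(eligible, key=lambda c: c.score, reverse=True)[:top_k]
```

Filtering before the top-k cut matters in this sketch: a planted document with a high similarity score is removed before it can crowd legitimate chunks out of the context window.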

Practitioner guidance

Separate secrets from prompts: Move credentials, connection strings, and internal rules out of system prompts and into secure vaults or external control planes.
Externalise security enforcement: Use external policy systems for allow, deny, and data-filtering decisions so the LLM is not the sole gatekeeper.
Test for prompt reverse-engineering: Red team the application with prompt injection, behavioural probing, and retrieval manipulation scenarios.
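The second and third points above can be sketched together: a policy check that runs outside the model on every response, and a small red-team loop that probes for leakage. This is a minimal sketch, not a description of any vendor's product. The canary string, the email pattern, and the `model` callable (any function that takes a prompt string and returns a response string) are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical marker planted in the system prompt so leaks are detectable.
CANARY = "CANARY-7f3a"
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

@dataclass
class Decision:
    action: str  # "allow", "deny", or "filter"
    text: str    # text that may be returned to the user

def enforce(response):
    """Policy check that runs outside the model, on every response."""
    if CANARY in response:
        # The response contains system prompt material: block it entirely.
        return Decision("deny", "Request blocked by policy.")
    redacted = EMAIL.sub("[redacted email]", response)
    if redacted != response:
        return Decision("filter", redacted)
    return Decision("allow", response)

def red_team(model, probes):
    """Return the probes whose raw responses leaked the canary."""
    return [p for p in probes if CANARY in model(p)]
```

Because `enforce` never asks the model whether a response is safe, a successful prompt injection does not switch the check off. The `red_team` helper only detects verbatim leaks of the canary; paraphrased leaks need behavioural probing on top.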

What's in the full article

Lasso Security's full research covers the operational detail this post intentionally leaves for the source:

How RapidClassifier is positioned to enforce policies in under 50 milliseconds across live GenAI interactions.
The article's detailed breakdown of RAG attack patterns and why retrieval ranking becomes an exploitation point.
Practical examples of context-based access control decisions for user role, query sensitivity, and retrieved document handling.
The vendor's implementation framing for separating secrets, connection strings, and internal rules from system prompts.

👉 Read Lasso Security's analysis of system prompt leakage and RAG risk →


Mr NHI

(@mr-nhi)


25/06/2026 2:37 am

System prompt leakage is a governance failure, not just a prompt hygiene issue. The prompt has become a policy container, a routing layer, and in some cases a repository for sensitive material. Once attackers can infer or extract that content, they are not merely reading text; they are learning how the application makes trust decisions. Prompt content must therefore be treated as governed control logic, not as informal model configuration.

A few things that frame the scale:

Only 20% of organisations have formal processes for offboarding and revoking API keys, and even fewer have procedures for rotating them, according to the Ultimate Guide to NHIs.
91.6% of secrets remain valid five days after the targeted organisation is notified, showing a critical gap in remediation procedures.

A question worth separating out:

Q: How can teams decide whether to use context-based access control for GenAI?

A: Use it when the risk depends on what the model can retrieve or return, not just who the user is. If the same identity can produce different exposure outcomes based on query content, document sensitivity, or model response, then context-based access control is the better fit. It aligns control decisions with the actual interaction boundary.
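A minimal sketch of such a decision, assuming three sensitivity levels and a hypothetical role-to-clearance mapping (neither comes from the source), might look like this. The point it illustrates is that the same role can receive different outcomes depending on the query and on the documents retrieved.

```python
from enum import IntEnum

class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    RESTRICTED = 2

# Hypothetical mapping from user role to the highest level that role may see.
ROLE_CLEARANCE = {
    "contractor": Level.PUBLIC,
    "employee": Level.INTERNAL,
    "security": Level.RESTRICTED,
}

def decide(role, query_level, doc_levels):
    """Return (action, permitted document levels) for one interaction."""
    clearance = ROLE_CLEARANCE.get(role, Level.PUBLIC)  # unknown roles get the minimum
    if query_level > clearance:
        # The question itself is more sensitive than the role allows.
        return "deny", []
    permitted = [lvl for lvl in doc_levels if lvl <= clearance]
    if len(permitted) < len(doc_levels):
        # Some retrieved documents exceed the role's clearance: withhold them.
        return "filter", permitted
    return "allow", permitted
```

For example, an `employee` asking an internal question that retrieves one public and one restricted document gets a `filter` outcome, while the same `employee` asking a restricted question is denied before retrieval results are considered.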

👉 Read our full editorial: System prompt leakage is the next LLM security fault line
