Subscribe to the Non-Human & AI Identity Journal

Notifications
Clear all

System prompt leakage in LLMs: are your guardrails enough?


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 6713
Topic starter  

TL;DR: System prompt leakage, prompt reverse-engineering, and RAG manipulation are emerging as major LLM security risks in 2025, with Gartner projecting that by 2027 half of enterprise GenAI models will be industry- or function-specific, according to Lasso Security. Guardrails alone are not enough because security has to move into externalised controls, context-aware access, and compliance-ready architecture.

NHIMG editorial — based on content published by Lasso Security: LLM Security Predictions: What's Coming Over the Horizon in 2025?

By the numbers:

Questions worth separating out

Q: How should security teams handle system prompts that may contain sensitive data?

A: They should remove credentials, internal rules, and hidden routing logic from prompts and place them in governed external systems.

Q: Why do LLM guardrails fail when attackers can reverse-engineer prompts?

A: Guardrails fail because they rely on the model to preserve policy secrecy.

Q: What do security teams get wrong about RAG risk?

A: They often focus on the model and ignore the retrieval layer.

Practitioner guidance

  • Separate secrets from prompts Move credentials, connection strings, and internal rules out of system prompts and into secure vaults or external control planes.
  • Externalise security enforcement Use external policy systems for allow, deny, and data-filtering decisions so the LLM is not the sole gatekeeper.
  • Test for prompt reverse-engineering Red team the application with prompt injection, behavioural probing, and retrieval manipulation scenarios.

What's in the full article

Lasso Security's full research covers the operational detail this post intentionally leaves for the source:

  • How RapidClassifier is positioned to enforce policies in under 50 milliseconds across live GenAI interactions.
  • The article's detailed breakdown of RAG attack patterns and why retrieval ranking becomes an exploitation point.
  • Practical examples of context-based access control decisions for user role, query sensitivity, and retrieved document handling.
  • The vendor's implementation framing for separating secrets, connection strings, and internal rules from system prompts.

👉 Read Lasso Security's analysis of system prompt leakage and RAG risk →

System prompt leakage in LLMs: are your guardrails enough?

Explore further

View Full Forum →  |  NHI Foundation Course →



   
Quote
Share: