LLM application security needs runtime controls, not guardrails alone

By NHI Mgmt Group Editorial Team | Published 2025-06-03 | Domain: Agentic AI & NHIs | Source: Noma Security

TL;DR: As GenAI moves into production apps, built-in model guardrails and cloud-provider protections lag behind prompt injection, jailbreaks, and data leakage risks, according to Noma Security. The operational gap is no longer theoretical: teams need runtime detection, response, and containment for LLM-driven workflows.

At a glance

What this is: An editorial analysis of why LLM application security now needs runtime controls, given that model guardrails alone cannot keep pace with evolving prompt manipulation and leakage threats.

Why it matters: For IAM and NHI practitioners, LLMs are becoming privileged software actors, so security must govern their inputs, outputs, and connected tools as part of the access model.


Context

LLM application security is the control problem created when generative models are embedded into production systems that handle sensitive data, user requests, and downstream actions. In those environments, the model is not just a chatbot. It becomes an execution layer that can be steered, abused, or made to leak data through prompts, which makes NHI governance part of the security design rather than an afterthought.

The article argues that built-in model guardrails and cloud-provider controls are not enough because they react only after threat patterns have evolved. That is the right framing for IAM and NHI teams: the question is not whether an LLM can be restricted, but whether the surrounding runtime can detect unsafe behavior fast enough to contain impact before data, pricing, or internal workflows are exposed.

Key questions

Q: How should security teams handle prompt injection in production LLM applications?

A: Security teams should treat prompt injection as a runtime control issue, not a content-moderation problem. The practical response is to inspect prompts, retrieved content, and tool outputs for hostile instructions, then block or downgrade unsafe sessions before the model can act on them. The model should never be the only enforcement layer.

Q: What is the difference between model guardrails and runtime AI security controls?

A: Model guardrails are built into the model or provider layer and usually reflect known abuse patterns. Runtime AI security controls sit in the application path and can inspect, mask, block, or reroute risky interactions in real time. For production systems, runtime controls are the layer that reduces blast radius when the model is already in use.

Q: Why do LLM applications create new data leakage risks for identity teams?

A: LLM applications can expose sensitive data when users paste secrets, when agents retrieve privileged context, or when responses echo internal material back to users. That creates an identity problem because the model may handle information it was never meant to disclose. Teams should govern what the model can see and what it can return.

Q: When should organisations add runtime controls to AI applications?

A: Organisations should add runtime controls before production rollout, not after the first abuse case. Once an LLM is handling user input, internal data, or connected tools, it has already become part of the security boundary. Waiting for post-deployment tuning leaves a gap where prompts, outputs, and actions are ungoverned.

Technical breakdown

Why LLM prompt injection bypasses model guardrails

Prompt injection works by placing hostile instructions inside user input, retrieved content, or tool output so the model treats them as higher-priority context. Jailbreaks, crescendo-style multi-turn escalation, GCG (Greedy Coordinate Gradient) adversarial suffixes, and refusal suppression all try to override the model’s intended policy boundary. The core problem is that model behavior is probabilistic, while security policy needs deterministic enforcement. If the model itself is the only control point, attackers can still manipulate outputs, trigger unsafe actions, or surface hidden system details. Runtime inspection is therefore a control layer, not a feature add-on.

Practical implication: Treat prompt-injection defense as a runtime control problem and monitor every LLM input path, not just the final model response.
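To make the idea of inspecting every input path concrete, here is a minimal Python sketch. It is an illustration only, not Noma Security's method: the pattern list, the source labels ("user", "retrieved", "tool") and the function names are all hypothetical, and a handful of regexes is nowhere near sufficient detection on its own. What it shows is the shape of the control: each path is scanned separately, and the verdict is computed deterministically outside the model.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration; real detection needs far broader coverage.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your |the )?(system prompt|hidden instructions)", re.I),
    re.compile(r"you are now (in )?(developer|unrestricted) mode", re.I),
]


@dataclass
class Finding:
    source: str   # which input path carried the text (assumed labels)
    pattern: str  # the regex that matched


def inspect_inputs(inputs: dict[str, str]) -> list[Finding]:
    """Scan every input path, not just the user prompt, for hostile instructions."""
    findings = []
    for source, text in inputs.items():
        for pattern in INJECTION_PATTERNS:
            if pattern.search(text):
                findings.append(Finding(source, pattern.pattern))
    return findings


def decide(findings: list[Finding]) -> str:
    """Return a deterministic verdict computed outside the model."""
    if not findings:
        return "allow"
    # Instructions arriving via retrieved content or tool output are
    # treated as indirect injection and blocked outright.
    if any(f.source != "user" for f in findings):
        return "block"
    return "downgrade"
```

In this sketch a direct attempt in the user prompt downgrades the session, while the same instruction arriving through retrieved content or tool output blocks it. That split is a design choice for the example, not a recommendation from the source.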

How sensitive data leakage happens in LLM workflows

Data leakage in LLM systems can happen when users paste confidential material into prompts, when agents retrieve internal context, or when the model echoes structured secrets back into responses. Unlike classic application flaws, the exposure may be conversational and accidental, which makes it harder to detect with perimeter tools. The article’s point is that anonymization and masking must happen before data leaves the trust boundary and before responses are returned. This is especially relevant where NHI secrets, internal notes, API keys, or backend details can appear in model context or connected tool outputs.

Practical implication: Enforce pre-prompt and pre-response filtering for secrets, tokens, and sensitive business data in every LLM workflow.
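A rough sketch of pre-prompt and pre-response masking follows. The detector set is an assumption chosen for illustration (an AWS-style access key ID, a bearer token, a PEM private key block), and `mask_secrets` and `guarded_call` are hypothetical names. The `model` argument stands in for whatever callable sends text to the LLM.

```python
import re

# Hypothetical detectors for illustration; tune to the secret formats actually in use.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}=*"),
    "private_key_block": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.S,
    ),
}


def mask_secrets(text: str) -> tuple[str, list[str]]:
    """Replace detected secrets with typed placeholders.

    Returns the masked text and the list of secret types that were found.
    """
    found = []
    for name, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            found.append(name)
    return text, found


def guarded_call(prompt: str, model) -> str:
    """Mask before the prompt leaves the trust boundary and again before the response returns."""
    safe_prompt, _ = mask_secrets(prompt)
    response = model(safe_prompt)  # `model` is any callable taking and returning str
    safe_response, _ = mask_secrets(response)
    return safe_response
```

Masking on both sides matters because the two leak paths are independent: a user can paste a key into the prompt, and a tool or retrieval step can introduce one into the response even when the prompt was clean.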

Why runtime detection matters more than static guardrails

Static guardrails are trained or configured against yesterday’s known abuse patterns, but LLM attack techniques evolve quickly. Runtime detection and adaptive response reduce the time between a new prompt pattern appearing and the control reacting to it. In practice, that means using policy enforcement, topic restrictions, anomaly detection, and response shaping together. This is similar to why EDR became necessary alongside endpoint hardening: prevention alone does not cover active abuse. For NHI governance, the same lesson applies to AI agents and model-mediated access paths.

Practical implication: Adopt layered runtime controls that can classify, block, mask, or downgrade risky LLM interactions as they occur.
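One way to picture adaptive response, as opposed to a static per-message filter, is a session-level risk score that accumulates across turns. The sketch below is an assumption-laden illustration: the signal names, weights, thresholds, and decay factor are invented for the example and would need calibration against real traffic. Upstream detectors (topic classifier, injection scanner, output filter) are assumed to supply the signal labels.

```python
from dataclasses import dataclass, field

# Hypothetical weights and thresholds; calibrate against real traffic.
SIGNAL_WEIGHTS = {"off_topic": 1.0, "injection_pattern": 3.0, "secret_in_output": 4.0}
DOWNGRADE_AT = 3.0
BLOCK_AT = 6.0
DECAY = 0.8  # fraction of prior risk carried into the next turn


@dataclass
class Session:
    risk: float = 0.0
    history: list = field(default_factory=list)

    def observe(self, signals: list[str]) -> str:
        """Fold this turn's signals into the accumulated risk and return an action."""
        turn_score = sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in signals)
        self.risk = self.risk * DECAY + turn_score
        if self.risk >= BLOCK_AT:
            action = "block"
        elif self.risk >= DOWNGRADE_AT:
            action = "downgrade"  # e.g. disable tools or force output masking
        else:
            action = "allow"
        # Keep a per-turn record so the escalation can be reviewed later.
        self.history.append((tuple(signals), round(self.risk, 2), action))
        return action
```

Because risk carries over between turns, a gradual escalation that never trips a single-message threshold can still move the session from allow to downgrade to block, while quiet turns let the score decay.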

Threat narrative

Attacker objective: The attacker’s objective is to steer the LLM into exposing data, bypassing controls, or executing actions that create operational, financial, or access-related harm.

Entry occurs when an attacker supplies crafted prompts, malicious retrieved text, or manipulated conversation context to a production LLM interface.
Escalation happens when the model follows the injected instructions instead of the intended policy, revealing internal information or generating unsafe actions.
Impact appears when the LLM leaks sensitive data, authorizes unintended business logic, or becomes a path into connected systems and workflows.

Related breaches:
Moltbook AI agent keys breach: exposed 1.5M AI agent keys.
DeepSeek breach: exposed 1M+ log lines and sensitive secret keys.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

LLM application security is now a runtime governance problem, not a model-tuning problem. The article correctly separates built-in guardrails from operational controls, and that distinction matters. Model updates lag attacker creativity, while production systems need immediate enforcement around prompts, outputs, and connected tools. The practical conclusion is that teams must govern the runtime path, not assume the model will self-defend.

Prompt injection creates an identity boundary problem for AI systems. Once a model can be steered into revealing context or issuing actions, the issue is no longer only content safety. It becomes an access and trust problem because the model is effectively operating with delegated authority. That means NHI governance has to include what the model can see, what it can pass along, and what it can trigger.
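As a sketch of what governing "what it can trigger" might look like, the Python below treats each model-requested tool call as a proposal that a deterministic check must approve. The `AgentIdentity` shape, the tool names, and the scope names are hypothetical and exist only for this example. The point is that the allowlist lives outside the model, so an injected instruction cannot widen it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    allowed_tools: frozenset[str]
    readable_scopes: frozenset[str]


class ToolCallDenied(Exception):
    """Raised when a model-requested action falls outside the agent's grant."""


def authorize_tool_call(agent: AgentIdentity, tool: str, scope: str) -> None:
    """Deny by default: the model's request is only a proposal until this passes."""
    if tool not in agent.allowed_tools:
        raise ToolCallDenied(f"{agent.name} may not call {tool}")
    if scope not in agent.readable_scopes:
        raise ToolCallDenied(f"{agent.name} may not touch scope {scope}")


# Hypothetical agent with a narrow grant.
support_bot = AgentIdentity(
    name="support-bot",
    allowed_tools=frozenset({"search_kb", "create_ticket"}),
    readable_scopes=frozenset({"public_docs", "tickets"}),
)
```

With this grant, `authorize_tool_call(support_bot, "search_kb", "public_docs")` returns without error, while a steered request such as `authorize_tool_call(support_bot, "issue_refund", "billing")` raises `ToolCallDenied` regardless of how persuasive the prompt was.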

Topic guardrails are useful only when they are tied to explicit policy boundaries. Preventing off-topic or malicious conversations is valuable, but policy has to be encoded as enforceable runtime decisions, not just moderation intent. Without that linkage, the control is easy to bypass through context shifting, indirect prompts, or tool chaining. Practitioners should treat topic enforcement as a security policy, not a user-experience filter.

LLM security vendors are converging on the same core control model: detect, classify, and respond in real time. That tells the market where this category is heading. Buyers should expect more emphasis on latency, policy precision, and integration with broader identity and data controls rather than on model-layer assurances alone. The governance question is whether the control sits close enough to the transaction to matter.

AI runtime controls will increasingly be judged by how well they reduce blast radius. The article’s strongest signal is that organizations cannot rely on static protection for a dynamic attack surface. The field is moving toward controls that can mask sensitive content, interrupt unsafe sessions, and preserve operational speed at the same time. Practitioners should prioritize containment over cosmetic safety claims.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant behaviour gap, according to The State of Secrets in AppSec.
For the adjacent risk of AI credential abuse, attackers attempt access to exposed AWS credentials within an average of 17 minutes, which makes runtime containment a governance priority, according to LLMjacking: How Attackers Hijack AI Using Compromised NHIs.

What this signals

Ephemeral model interactions do not eliminate identity risk. When LLMs sit inside production workflows, the security issue shifts from whether a model is safe in isolation to whether the surrounding controls can contain what it can see and do. The control design needs to reflect that reality, especially when runtime abuse can move faster than remediation cycles. The gap is structural, not incidental.

With 6 distinct secrets manager instances on average across organisations, fragmentation is already weakening central visibility, and LLM workflows can widen that gap if secret handling is not centralised. That makes policy consistency harder at exactly the moment when prompt-driven systems need tighter oversight. Teams should assume that distributed control increases exposure unless governance is deliberately simplified.

The next phase of AI security will reward organisations that can connect runtime detection to identity and data controls in the same workflow. That means aligning LLM policy enforcement with secrets handling, access review, and incident response rather than treating AI protection as a standalone layer. Practitioners should prepare for more integrated control stacks, not more isolated tools.

For practitioners

Map LLM trust boundaries before deployment: Document every place where prompts, retrieved content, tool outputs, and model responses cross a sensitive-data boundary. Identify where secrets, internal notes, or API references could be surfaced, then apply filtering and approval controls at those boundaries first.
Deploy runtime prompt-injection detection: Use controls that inspect live conversations for jailbreak patterns, indirect instructions, and policy bypass attempts. Tune them to classify, block, or downgrade risky sessions instead of only logging them after the fact.
Mask secrets before model exposure: Add pre-processing to redact credentials, tokens, certificates, and internal identifiers before they reach the model or leave the response path. This reduces accidental leakage and limits how much privileged context an attacker can harvest.
Tie topic restrictions to enforceable policy: Define what the LLM is allowed to discuss, what it must refuse, and when it must hand off to a human or a safer workflow. Make those rules measurable so they can be audited as part of access governance.
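The last recommendation, making topic rules enforceable and auditable, could be sketched as policy-as-data plus a decision record. Everything here is hypothetical: the topic labels, the three actions, and the record fields are illustrative, and the topic label is assumed to come from an upstream classifier that is not shown.

```python
import json
from datetime import datetime, timezone

# Hypothetical policy for a support assistant.
TOPIC_POLICY = {
    "billing_questions": "allow",
    "product_help": "allow",
    "pricing_exceptions": "handoff",  # route to a human
    "internal_systems": "refuse",
}
DEFAULT_ACTION = "refuse"  # anything not listed is denied


def enforce_topic(topic: str, session_id: str) -> dict:
    """Return an auditable decision record for an already-classified topic."""
    action = TOPIC_POLICY.get(topic, DEFAULT_ACTION)
    return {
        "session_id": session_id,
        "topic": topic,
        "action": action,
        "policy_matched": topic in TOPIC_POLICY,
        "decided_at": datetime.now(timezone.utc).isoformat(),
    }


def to_audit_line(record: dict) -> str:
    """Serialise a decision as one JSON line for an append-only audit log."""
    return json.dumps(record, sort_keys=True)
```

Keeping the policy as a plain table with a deny-by-default fallback makes the rules reviewable alongside other access policies, and the per-decision records give auditors something to measure, such as how often sessions are refused or handed off.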

Key takeaways

LLM application security fails when teams treat model guardrails as the full control stack instead of a starting point.
Runtime inspection and response matter because prompt injection, data leakage, and unsafe tool use happen in live production traffic, not just in theory.
For NHI and IAM teams, the decisive question is whether the runtime can limit what the model can see, say, and trigger.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	NHI-01	Prompt injection and tool misuse map directly to agentic AI abuse paths.
NIST AI RMF		AI governance and continuous monitoring fit the article's runtime control thesis.
NIST CSF 2.0	PR.AA-05	Least-privilege access is critical when LLMs can reveal or trigger sensitive workflows.

Inspect LLM inputs and tool calls for injection patterns and block unsafe agent actions in real time.

Key terms

LLM Application Security: LLM application security is the practice of protecting systems that embed large language models into production workflows. It focuses on preventing prompt abuse, data leakage, unsafe outputs, and misuse of connected tools so the model does not become a control bypass for the broader application.
Prompt Injection: Prompt injection is the act of placing hostile instructions into user input, retrieved content, or tool output so a model follows the attacker’s intent instead of the system’s policy. It is a runtime abuse technique that targets the model’s context handling rather than the underlying code directly.
AI Runtime Security: AI runtime security is the set of controls that inspect, constrain, and respond to model behavior while the application is live. It includes detection, masking, policy enforcement, and response shaping, all aimed at reducing the blast radius of unsafe model interactions.

Deepen your knowledge

LLM application security and runtime control design are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are governing AI-enabled workflows with delegated access, this course helps translate the risk into practical controls.

This post draws on content published by Noma Security: LLM application security and AI Detection and Response. Read the original.
