AI without guardrails expands attack paths for LLM abuse

By NHI Mgmt Group Editorial Team
Published 2024-09-13
Domain: Agentic AI & NHIs
Source: CyberArk

TL;DR: LLM abuse now spans prompt injection, prompt leaking, and jailbreaking, while offensive-capable models lower the barrier to generating malicious code and attack guidance, according to CyberArk. The governance gap is no longer theoretical: identity security and layered controls now matter as much as model safety.

At a glance

What this is: An independent analysis of how unguarded LLMs and offensive AI services expand misuse paths, with prompt injection, prompt leaking, and jailbreaking as the core techniques.

Why it matters: IAM and NHI teams need to treat AI services as identity-bearing systems whose access, output, and tool use can be abused without guardrails.


Context

AI without guardrails is not just a model-safety issue. It is an identity and access problem because an LLM that can answer unrestricted prompts, expose sensitive guidance, or drive tools becomes a high-risk execution surface inside the enterprise. For NHI practitioners, the question is not whether the model is intelligent, but whether its access is bounded, attributable, and revocable.

The article describes a shift from prompt abuse against mainstream chatbots to services explicitly designed for offensive research. That changes the governance baseline because unmanaged AI services can be adopted outside approved control planes, creating shadow AI behaviors with no consistent policy, audit trail, or lifecycle management. The starting assumption in the article is increasingly common: AI capabilities are being treated as openly usable before they are being treated as governable.

Key questions

Q: How should security teams govern AI services that can generate offensive content?

A: Security teams should govern offensive-capable AI services like any other high-risk non-human identity. That means explicit ownership, least privilege, approved use cases, logging of prompts and outputs, and a ban on production influence unless the service is separately reviewed and constrained. If the service can guide attacks, it needs policy enforcement outside the prompt.

Q: Why do LLM jailbreaks create an IAM problem?

A: Jailbreaks matter to IAM because they show that model behavior can be manipulated after authentication. If a user can override safeguards once inside the service, then login alone does not prove safe intent. Teams need authorization controls, usage policy, and monitoring that limit what the model can do even after access is granted.

Q: What is the difference between prompt injection and prompt leaking?

A: Prompt injection tries to change what the model does by hiding malicious instructions in the input. Prompt leaking tries to reveal hidden prompts, examples, or internal instructions that shape the model’s behavior. Both are governance problems because they can cause the model to expose sensitive context or produce unintended outputs.
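
Prompt leaking can be made observable with a canary: a random marker placed in the hidden prompt and searched for in every response. The following is a minimal sketch under stated assumptions, not a method described in the CyberArk article. The function names and the prompt layout are hypothetical, and the check only catches verbatim leaks, so a paraphrased leak would pass unnoticed.

```python
import secrets

def make_canary() -> str:
    """Generate a random marker that should never appear in normal output."""
    return f"CANARY-{secrets.token_hex(8)}"

def build_system_prompt(instructions: str, canary: str) -> str:
    """Embed the canary in the hidden prompt (hypothetical prompt layout)."""
    return f"{instructions}\n[internal marker: {canary}]"

def leaked_canary(model_output: str, canary: str) -> bool:
    """Return True if the model output contains the hidden marker verbatim."""
    return canary in model_output

canary = make_canary()
system_prompt = build_system_prompt("You are a helpdesk assistant.", canary)

# Simulated outputs stand in for real model responses.
safe_output = "Your ticket has been created."
leaky_output = f"My instructions are: {system_prompt}"

print(leaked_canary(safe_output, canary))
print(leaked_canary(leaky_output, canary))
```

A hit on the canary is a signal to block the response and alert, since it means hidden context has crossed into user-visible output.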

Q: Should organisations allow AI tools that can generate attack code?

A: Only if the tool is tightly scoped to an approved security function and separated from general enterprise use. Otherwise, the tool increases the speed and accessibility of offensive tradecraft, which raises misuse risk faster than most teams can monitor it. Strong identity controls, logging, and restricted access are mandatory before any deployment.

Technical breakdown

Prompt injection, prompt leaking, and jailbreaking as control bypasses

These three techniques attack different layers of LLM behavior. Prompt injection disguises malicious instructions inside apparently legitimate input so the model follows the attacker’s intent. Prompt leaking aims to coax out system prompts, examples, or hidden instructions that shape the model’s behavior. Jailbreaking uses structured conversation to override safeguards and remove safety boundaries. In practice, the control failure is not just content moderation. It is the absence of a durable policy layer that can separate approved user intent from adversarial instruction when the model is acting on behalf of a workflow.

Practical implication: Treat model inputs as untrusted and enforce policy outside the prompt whenever the model can influence decisions or actions.
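
Enforcing policy outside the prompt can be sketched as a gate that sits between the model and its tools. In this illustrative example, the role names, tool names, and the `ALLOWED_TOOLS` mapping are all assumptions. The point is that the decision depends only on the authenticated caller and the requested tool, never on what the prompt or the model output claims.

```python
from dataclasses import dataclass

# Hypothetical role-to-tool allow-list. In practice this would live in a
# policy store that the model cannot read or modify.
ALLOWED_TOOLS = {
    "analyst": {"search_docs"},
    "sre": {"search_docs", "create_ticket"},
}

@dataclass(frozen=True)
class ToolRequest:
    caller_role: str  # taken from the authenticated session, not the prompt
    tool: str         # tool name the model asked to invoke

def authorize(request: ToolRequest) -> bool:
    """Allow a tool call only if the caller's role permits that tool."""
    return request.tool in ALLOWED_TOOLS.get(request.caller_role, set())

# An injected instruction may convince the model to ask for "run_shell",
# but the gate denies it because no role is granted that tool.
injected = ToolRequest(caller_role="analyst", tool="run_shell")
legitimate = ToolRequest(caller_role="sre", tool="create_ticket")

print(authorize(injected))
print(authorize(legitimate))
```

Because unknown roles and unknown tools both fall through to a denial, the gate fails closed, which is the behavior the practical implication above calls for.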

Why offensive-capable LLMs change the NHI risk model

An offensive-capable LLM is not simply a chat interface with different content limits. It is a service that compresses attacker research, code generation, and operational guidance into a low-friction workflow. That matters for NHI governance because the service itself may be reachable through ordinary identity providers such as social logins, while the output can accelerate phishing, malware development, or privilege abuse. The risk is less about one dangerous answer and more about scale. The model can repeatedly generate usable attacker artifacts without requiring bespoke expertise from the operator.

Practical implication: Review AI services as potential NHI-adjacent assets and apply the same access, monitoring, and approval discipline used for sensitive automation systems.

Identity security as the first control plane for AI misuse

The article’s strongest operational point is that model safety alone is insufficient. Identity security must anchor the defense because the enterprise needs to know who is using an AI service, what it can access, and whether its outputs can trigger downstream tools. That aligns with Zero Trust thinking: continuously verify, minimize privilege, and assume the service or user may be misused. For AI systems that can generate code, instructions, or actions, authentication without authorization boundaries creates a false sense of control.

Practical implication: Bind AI access to least privilege, logging, and reviewable approvals before allowing any model to influence production or security workflows.
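
Reviewable approvals and logging can be combined in a single wrapper around model-driven actions. This is a rough sketch under assumptions: the `HIGH_RISK_ACTIONS` set, the in-memory `AUDIT_LOG`, the rule that an approver must differ from the requester, and the choice to store a SHA-256 digest of the prompt rather than the raw text are all illustrative, not taken from the article.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional

# Hypothetical actions that change systems and therefore need a second person.
HIGH_RISK_ACTIONS = {"run_code", "read_secret"}

# In-memory stand-in for an append-only audit store.
AUDIT_LOG = []

def invoke(user: str, prompt: str, action: str,
           approver: Optional[str] = None) -> dict:
    """Decide whether an action may run and record the decision either way."""
    needs_approval = action in HIGH_RISK_ACTIONS
    # Assumption: self-approval does not count as step-up approval.
    approved = approver is not None and approver != user
    allowed = (not needs_approval) or approved
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "approver": approver,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "allowed": allowed,
    }
    AUDIT_LOG.append(record)
    return record

low_risk = invoke("alice", "Summarise the open incidents", "search_docs")
unapproved = invoke("alice", "Run the cleanup script", "run_code")
approved = invoke("alice", "Run the cleanup script", "run_code", approver="bob")
```

Denied requests are logged as well as allowed ones, so an investigator can see attempted misuse and not only the actions that succeeded.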

Threat narrative

Attacker objective: The attacker wants scalable, low-skill access to offensive guidance and code generation that reduces the effort needed to execute real attacks.

Entry occurs when a user reaches an unrestricted or offensive-capable LLM through ordinary login methods and begins querying for attack guidance.
Escalation happens when prompt injection, prompt leaking, or jailbreaking removes guardrails and reveals malicious instructions, code, or operational steps.
Impact follows when the generated content is reused to accelerate phishing, malware development, access abuse, or other attack activity.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI guardrails are now an access-control problem, not only a safety problem. Once an LLM can generate prohibited guidance, the issue is no longer limited to content moderation. The enterprise must decide who can invoke the model, what context it can see, and whether its outputs can affect real systems. For NHI governance, that makes policy enforcement outside the prompt the only defensible design.

Offensive-capable AI services create a new shadow AI category. A service that is easy to reach through standard identity providers can still sit outside approved governance if no one owns its approval, logging, or lifecycle. That leaves security teams with AI use that is visible to attackers before it is visible to defenders. Practitioners should treat this as unmanaged execution authority, not as harmless experimentation.

Identity security is the control layer that makes AI defensible. The article correctly centers authenticity and integrity because AI systems cannot be governed by trust in model behavior alone. Strong identity proof, least privilege, and auditability determine whether the model is a bounded assistant or an uncontrolled attack multiplier. Practitioners should place AI services inside the same governance model used for other sensitive non-human identities.

Zero Trust only works here if the model is continuously verified and constrained. A login screen is not a control boundary when the service can generate actionable offensive output on demand. Continuous verification, scoped authorization, and telemetry are necessary if AI is allowed anywhere near code, security tooling, or sensitive data. The practical conclusion is simple: if you cannot constrain the AI, you cannot safely operationalize it.
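
The difference between a login screen and continuous verification can be illustrated with a per-request check. In this sketch the `Session` shape, the scope strings, and the 15-minute re-verification window are assumptions chosen for illustration. Each request is evaluated for freshness and scope instead of relying on the original sign-in.

```python
import time
from dataclasses import dataclass
from typing import Optional

MAX_SESSION_AGE_SECONDS = 900  # hypothetical 15-minute re-verification window

@dataclass(frozen=True)
class Session:
    user: str
    scopes: frozenset   # scopes granted at the last verification
    verified_at: float  # epoch seconds of the last identity verification

def check_request(session: Session, required_scope: str,
                  now: Optional[float] = None) -> bool:
    """Re-evaluate freshness and scope on every request, not only at login."""
    current = time.time() if now is None else now
    fresh = (current - session.verified_at) <= MAX_SESSION_AGE_SECONDS
    return fresh and required_scope in session.scopes

session = Session(user="alice", scopes=frozenset({"model:chat"}),
                  verified_at=1_000.0)

print(check_request(session, "model:chat", now=1_500.0))      # within window
print(check_request(session, "model:chat", now=2_000.0))      # stale session
print(check_request(session, "model:run_code", now=1_500.0))  # scope missing
```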

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
That governance gap is why teams should pair AI access controls with the broader NHI guidance in Top 10 NHI Issues before expanding deployment.

What this signals

Prompt safety will not be enough for enterprise AI programmes. The practical boundary is identity, authorization, and auditability, especially when a model can generate code or operational instructions. As AI services spread through everyday login paths, teams need to treat them as governed execution surfaces rather than isolated applications.

AI abuse creates a new form of trust debt. Once employees or attackers can reach offensive-capable models through ordinary accounts, the organisation inherits risk that cannot be erased by content filters alone. The programme response should align with the OWASP Agentic AI Top 10 and Zero Trust thinking, because the model must be verified and constrained before it is trusted.

The governance signal is structural: 96% of technology professionals identify AI agents as a growing security threat and 66% say the risk is immediate, according to the AI Agents: The New Attack Surface report. That scale means security teams should plan for policy, telemetry, and approval workflows now, not after misuse becomes routine.

For practitioners

Classify AI services as governed identities: Inventory every LLM and AI service in use, including tools reached through Google or GitHub login, and assign an owner, purpose, and approval status. Record whether the service can access sensitive data, generate code, or trigger downstream actions, then review it in the same governance cycle as other non-human identities.
Block unrestricted offensive-use cases: Define explicit policy for attack generation, jailbreak testing, malware assistance, and other prohibited uses, then enforce it with access rules and monitoring outside the prompt. Where the service is designed for red team work, restrict it to approved users, approved environments, and documented use cases.
Apply least privilege to AI influence paths: Separate read-only model access from any path that can change systems, create tickets, run code, or access secrets. Require step-up approval for tool invocation, and log the model input, the action requested, and the identity that authorized it.
Measure whether AI outputs can become attack inputs: Test how easily a model can be pushed into producing malicious code, hidden instructions, or data extraction guidance. Use those findings to set alerting thresholds, user restrictions, and escalation paths for abuse cases that are more operational than theoretical.
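
The inventory step in the first recommendation can be sketched as a simple record plus a gap check. The field names, the example service names, and the gap wording below are hypothetical. A real inventory would likely live in an existing asset or identity governance system and not in a script.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AIService:
    name: str
    owner: Optional[str]
    purpose: Optional[str]
    approved: bool
    login_method: str  # e.g. "Google", "GitHub", "SSO"
    can_access_sensitive_data: bool = False
    can_generate_code: bool = False
    can_trigger_actions: bool = False

def governance_gaps(service: AIService) -> List[str]:
    """List what is missing before the service counts as governed."""
    gaps = []
    if not service.owner:
        gaps.append("no owner")
    if not service.purpose:
        gaps.append("no documented purpose")
    if not service.approved:
        gaps.append("not approved")
    high_impact = (service.can_access_sensitive_data
                   or service.can_generate_code
                   or service.can_trigger_actions)
    if high_impact and not service.approved:
        gaps.append("high-impact capability without approval")
    return gaps

# Hypothetical inventory entries.
governed = AIService("code-assistant", "platform-team", "internal code review",
                     approved=True, login_method="SSO", can_generate_code=True)
shadow = AIService("unvetted-chatbot", None, None,
                   approved=False, login_method="GitHub", can_generate_code=True)

for svc in (governed, shadow):
    print(svc.name, governance_gaps(svc))
```

Services that return a non-empty list are candidates for the shadow AI category described earlier and can be fed into the same review cycle as other non-human identities.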

Key takeaways

Unguarded LLMs turn prompt abuse into a governance problem because model access can produce real offensive output, not just unsafe text.
AI services that are easy to reach through standard logins still require ownership, least privilege, and audit trails when they influence code or security workflows.
Identity security, not prompt safety alone, is the control plane that determines whether AI becomes a bounded helper or an attack multiplier.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	NHI-07	Prompt injection and jailbreaking map to agentic AI misuse and tool abuse.
NIST AI RMF		The article centers governance, accountability, and trust boundaries for AI systems.
NIST Zero Trust (SP 800-207)	PR.AC-4	Continuous verification and least privilege are needed when AI can influence actions.

Apply least privilege and continuous authorization before allowing AI to access tools or data.

Key terms

Prompt Injection: Prompt injection is an attack that inserts malicious instructions into model input so the AI follows the attacker instead of the intended user. It is a control-bypass problem, not just a content problem, because the model may reveal data or perform actions after being steered off policy.
Prompt Leaking: Prompt leaking is the extraction of hidden instructions, examples, or system context that shapes an LLM’s behavior. Security teams care because leaked context can reveal guardrails, internal logic, or sensitive data paths that help an attacker refine later abuse or impersonation attempts.
Jailbreaking: Jailbreaking is the practice of crafting prompts that persuade an AI model to ignore its safeguards and produce restricted outputs. It shows that authentication to the service does not guarantee safe behavior, which is why governance must extend beyond the chat interface.
Shadow AI: Shadow AI is the use of AI services or agents outside approved enterprise governance. It often appears benign at first, but it creates blind spots in ownership, logging, data access, and policy enforcement, which makes incident response and compliance much harder.


This post draws on content published by CyberArk: LLMs Gone Wild: AI Without Guardrails. Read the original.
