By NHI Mgmt Group Editorial Team
Published 2026-04-20
Domain: Agentic AI & NHIs
Source: Lakera

TL;DR: According to Lakera’s overview of 12 LLM security tools, the market is clustering around prompt injection, leakage, hallucination control, and red teaming, even as it acknowledges that AI systems now retrieve data, invoke tools, and act across enterprise workflows. The practical lesson is that protecting model output is not the same as governing what an AI system is allowed to do.


At a glance

What this is: This overview maps 12 LLM security tools and finds the category is still centred on prompt, leakage, and output controls rather than full execution-layer governance.

Why it matters: IAM and security teams need to treat AI systems as governed identities, because tool use, data access, and action scope now matter as much as model quality or prompt safety.


👉 Read Lakera's overview of 12 LLM security tools and agentic AI risk


Context

LLM security tools are usually judged on how well they block unsafe prompts, leakage, or malformed outputs, but that leaves the deeper governance question untouched: what can the system access and what can it do once access is granted? In enterprise environments, an LLM is rarely isolated. It sits inside a workflow that can read data, call tools, and trigger downstream actions, which means model safety and identity safety start to overlap.

That overlap matters for NHI governance, agentic AI oversight, and human IAM programmes alike. If a security stack only inspects text at the boundary, it can miss the execution path, the delegated permissions, and the audit trail that determine whether the system stayed within policy. The right comparison is not just which tool catches more malicious prompts, but which control points constrain the identity behind the model.

The article’s real value is as a market map, not a final control design. It shows a category that is maturing quickly around detection and policy enforcement, while the governance model practitioners actually need still has to be assembled across identity, secrets, runtime controls, and review processes.


Key questions

Q: How should security teams govern LLMs that can call tools and access data?

A: Treat the LLM as a governed identity, not just a model endpoint. Security teams should bind each workflow to a named service account or token, constrain tool scope, and require audit logs for data access and actions. Prompt filtering alone cannot control delegated execution, so the policy must live at the tool and identity layer.

Q: Why do prompt injection defences not solve AI security on their own?

A: Prompt injection defences reduce malicious steering, but they do not stop an authorised model from doing harmful things with connected tools or data. If the underlying identity has broad access, a successful prompt only needs to redirect existing privileges. Real control requires least privilege, telemetry, and execution-time policy enforcement.

Q: What do security teams get wrong about LLM monitoring?

A: They often monitor for bad prompts or unsafe outputs without watching the actions the model attempts to take. The more important signals are reachable tools, accessed datasets, and policy violations during execution. Monitoring has to prove whether the model stayed within its authorised boundary, not just whether it sounded safe.

Q: How do identity controls change when AI systems become part of enterprise workflows?

A: Identity controls must move from human-centric approval logic to machine-centric execution control. That means inventorying the identities behind AI workflows, limiting standing access, and reviewing lifecycle changes whenever a workflow, tool, or data source changes. The model’s behaviour matters less than the authority attached to it.


Technical breakdown

Prompt injection detection versus action control

Prompt injection tools look for malicious text that tries to steer model behaviour, but that only addresses one layer of risk. If the model can also call APIs, retrieve documents, or trigger workflows, then the security boundary moves from input screening to execution control. In practice, the weakness is not only what the prompt says, but whether the system has been authorised to act on it. That is why LLM security cannot stop at content filtering. It has to account for tool permissioning, downstream authorisation, and the separation between what the model can interpret and what it can execute.

Practical implication: align prompt filtering with least-privilege tool access and separate model evaluation from runtime authorisation.
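
One way to picture this separation is a small authorisation gate that sits between the model and its tools. The sketch below is illustrative only: the `WorkflowIdentity` class, the `authorise_tool_call` function, and the tool names are hypothetical and do not come from Lakera's article or any specific product. It assumes each workflow is bound to one identity with an explicit tool allowlist, and shows that the decision depends on granted scope rather than on whether the prompt looked clean.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkflowIdentity:
    """Hypothetical identity bound to one LLM workflow."""
    name: str
    allowed_tools: frozenset = field(default_factory=frozenset)


def authorise_tool_call(identity: WorkflowIdentity, tool: str) -> bool:
    """Allow a tool call only if the tool is in the identity's granted scope.

    The prompt text is deliberately not an input: a prompt filter may run
    earlier, but it does not grant or widen authority here.
    """
    return tool in identity.allowed_tools


# Example: a support assistant that may only search the knowledge base.
support_bot = WorkflowIdentity(
    name="svc-support-assistant",
    allowed_tools=frozenset({"kb_search"}),
)

# Even if an injected prompt persuades the model to request a refund,
# the call is denied because the identity was never granted that tool.
print(authorise_tool_call(support_bot, "kb_search"))     # True
print(authorise_tool_call(support_bot, "issue_refund"))  # False
```

Keeping the prompt out of the function signature is the design point: model evaluation and runtime authorisation stay separate, so a missed injection cannot expand what the identity is allowed to do.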

Why leakage and hallucination controls are not enough

Leakage detection and hallucination monitoring help reduce obvious misuse, but they do not solve the trust problem created when a model sits inside a broader enterprise workflow. A model can produce a safe-looking answer and still trigger an unsafe operation through a connected tool. Likewise, a blocked output does not mean a delegated identity was properly constrained. The architectural lesson is that text safety, data safety, and action safety are related but distinct control layers. Treating them as one control plane creates blind spots in identity governance and incident investigation.

Practical implication: map controls separately for input, data handling, and execution so each failure mode has its own guardrail.
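
As a rough illustration of keeping the three layers distinct, the sketch below runs a separate check for input, data handling, and execution, and reports every layer that failed. The rules, dataset names, and action names are placeholder assumptions chosen for the example, not recommended policies.

```python
import re
from typing import List, NamedTuple


class Request(NamedTuple):
    prompt: str    # text sent to the model
    dataset: str   # data source the workflow wants to read
    action: str    # downstream action the model wants to trigger


# Each layer has its own check. All rules below are placeholder assumptions.
def input_guardrail(req: Request) -> bool:
    return re.search(r"ignore (all )?previous instructions", req.prompt, re.I) is None


def data_guardrail(req: Request) -> bool:
    return req.dataset in {"public_docs", "product_faq"}


def execution_guardrail(req: Request) -> bool:
    return req.action in {"reply", "create_ticket"}


LAYERS = [
    ("input", input_guardrail),
    ("data", data_guardrail),
    ("execution", execution_guardrail),
]


def failed_layers(req: Request) -> List[str]:
    """Run every layer and report each one that failed, without stopping early."""
    return [name for name, check in LAYERS if not check(req)]


# A clean-looking prompt can still fail the execution layer.
req = Request(prompt="Summarise my last order", dataset="product_faq", action="delete_account")
print(failed_layers(req))  # ['execution']
```

Reporting failures per layer, instead of returning a single pass or fail, is what makes the result useful during incident investigation: it shows which guardrail caught the problem and which ones did not see it.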

AI red teaming and monitoring as governance evidence

Red teaming and runtime monitoring are most useful when they produce evidence about how an AI system behaves under pressure, not just whether it passed a test. The important outputs are which tools were reachable, which data was exposed, and what actions were attempted outside expected scope. That evidence is critical for IAM and security teams because it turns AI governance from abstract policy into observable control performance. Without that telemetry, organisations cannot tell whether their AI stack is operating as a supervised application or an over-permissioned identity.

Practical implication: require telemetry that shows tool access, data access, and attempted actions before approving production AI workflows.
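
The telemetry described above could be captured with a record as simple as the one sketched here. The `ToolEvent` fields, the `summarise` output keys, and the approval rule in `ready_for_production` are all assumptions made for illustration; a real programme would define its own schema and thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToolEvent:
    """One hypothetical telemetry record for an attempted tool call."""
    workflow: str
    tool: str
    dataset: Optional[str]
    in_policy: bool   # was the call within the workflow's approved scope?
    executed: bool    # did the call actually run?


def summarise(events: List[ToolEvent]) -> dict:
    """Collapse raw events into the evidence a reviewer would ask for."""
    return {
        "tools_executed": sorted({e.tool for e in events if e.executed}),
        "datasets_touched": sorted({e.dataset for e in events if e.executed and e.dataset}),
        "out_of_scope_attempts": sorted({e.tool for e in events if not e.in_policy}),
        "out_of_scope_executed": sorted({e.tool for e in events if e.executed and not e.in_policy}),
    }


def ready_for_production(events: List[ToolEvent]) -> bool:
    """Assumed rule: telemetry exists and no out-of-scope call ever executed."""
    return bool(events) and not summarise(events)["out_of_scope_executed"]


events = [
    ToolEvent("support-assistant", "kb_search", "product_faq", in_policy=True, executed=True),
    ToolEvent("support-assistant", "issue_refund", None, in_policy=False, executed=False),
]
summary = summarise(events)
print(summary["out_of_scope_attempts"])  # ['issue_refund']
print(ready_for_production(events))      # True
```

Note that an empty event list fails the check in this sketch: the absence of telemetry is treated as missing evidence, not as a clean result.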


Threat narrative

Attacker objective: The attacker aims to turn model interaction into operational access, then use that access to extract data or trigger actions the enterprise never intended to allow.

  1. Entry occurs when an attacker steers an LLM through prompt injection or abuses a connected interface that accepts untrusted input and influences model decisions.
  2. Escalation follows when the model has access to tools, data sources, or APIs that let it move beyond text generation into operational actions or data retrieval.
  3. Impact appears when the model leaks information, triggers unintended downstream actions, or enables broader compromise through delegated access and unsafe automation.



NHI Mgmt Group analysis

Execution-layer control is the missing category in most LLM security stacks. The article catalogues prompt injection, leakage, hallucination, and red-teaming tools, but those controls mostly inspect what the model says. In enterprise deployments, the harder problem is what the model can do after it receives a prompt. That is an identity and authorisation problem, not just a content problem. Practitioners should treat execution scope as the control plane that determines whether AI security is real or cosmetic.

Model safety and identity safety are converging into the same governance problem. Once an LLM can invoke tools, retrieve data, or act in workflows, the old separation between application security and identity governance breaks down. Service accounts, API keys, and delegated tokens become the mechanisms through which model behaviour becomes business impact. The implication is that IAM teams can no longer review AI systems as if they were passive apps; they must govern them as active identities with bounded authority.

AI red teaming is only useful when it proves privilege boundaries, not just prompt resistance. A system can withstand thousands of prompt variants and still be dangerously over-permissioned. That is why this category needs evidence about reachable tools, accessible data, and attempted actions outside policy. Practitioners should stop asking whether a model is hard to confuse and start asking whether it can do anything harmful even when confused.

Runtime policy enforcement is becoming the real differentiator in AI security governance. Static filters are easy to compare, but they do not tell you whether the system can respect task boundaries during execution. This is where NHI governance, secrets management, and identity lifecycle controls become central to AI security design. The field is moving toward policy at the point of action, and practitioners should evaluate tools on that basis.

Named concept: execution-layer blind spot. The article surfaces a recurring failure mode where organisations secure prompts and outputs but leave the action path under-governed. That blind spot matters because the model’s real risk emerges after interpretation, when tools, data, and credentials are in play. The implication is that teams must rethink AI controls as identity controls first and content controls second.

From our research:

  • 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
  • Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
  • For the broader operating model behind this gap, see the OWASP Agentic AI Top 10, which covers the execution and tool-use risks that prompt filters do not address.

What this signals

Execution-layer governance is now the programme-level question. Teams that only measure prompt safety will miss the real control gap, which sits where the model touches tools, data, and credentials. The useful next step is to classify each AI workflow by the identities it can reach and the actions it can trigger, then decide which ones are safe enough for production.

AI security will increasingly look like identity security. As more model workflows rely on service accounts and delegated access, the boundary between IAM, secrets management, and AI governance narrows. Practitioners should expect control reviews to shift from model quality checks toward authorisation scope, logging, and lifecycle ownership.

Access review cycles alone will not be enough if the workflow can act faster than governance can observe. Organisations need operational telemetry that shows what the model touched and what it attempted in real time. That is the difference between a controllable AI programme and an unbounded automation layer.
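
The classification step suggested in this section can be sketched as a simple tiering function. The action labels, the `privileged_identity` flag, and the three tiers are hypothetical choices for the example; the point is only that the tier is derived from what the workflow can reach and trigger, not from model quality.

```python
# Hypothetical action labels that change state outside the model.
STATE_CHANGING_ACTIONS = {"send_email", "issue_refund", "delete_record"}


def classify_workflow(actions, privileged_identity):
    """Return an assumed risk tier for one AI workflow.

    actions: set of action names the workflow can trigger
    privileged_identity: True if it runs under an identity with broad access
    """
    changes_state = bool(set(actions) & STATE_CHANGING_ACTIONS)
    if changes_state and privileged_identity:
        return "high"
    if changes_state or privileged_identity:
        return "medium"
    return "low"


workflows = {
    "faq-assistant": ({"kb_search"}, False),
    "billing-agent": ({"kb_search", "issue_refund"}, True),
}
for name, (actions, privileged) in workflows.items():
    print(name, classify_workflow(actions, privileged))
# faq-assistant low
# billing-agent high
```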


For practitioners

  • Map every model-connected tool to an identity owner. Document which service account, API key, or delegated token each LLM workflow uses, and require a named owner for each identity. This makes tool reachability auditable before the model is allowed into production.
  • Separate prompt controls from execution permissions. Treat prompt injection filters as input hygiene, not authority. Enforce policy at the point where the model requests data, calls tools, or triggers workflows, so text manipulation cannot become uncontrolled action.
  • Require telemetry for reachable tools and accessed data. Record which tools were callable, which datasets were touched, and which actions were attempted outside policy. Without that evidence, AI monitoring cannot support review, incident response, or compliance.
  • Test AI workflows against least-privilege failure cases. Red-team the system for overbroad access, unsafe downstream actions, and credential leakage, not just prompt jailbreaks. The goal is to prove that a confused model still cannot cross defined boundaries.
  • Review AI lifecycle governance alongside NHI controls. Align onboarding, access changes, and offboarding for model-connected identities with the same discipline used for workloads and secrets. AI systems should not outlive their authorised scope or retain stale access after workflow changes.
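
The first and last items in the list above lend themselves to a simple inventory check. The sketch below is a minimal example under stated assumptions: the `AgentIdentity` fields, the credential names, and the two findings are hypothetical, and the credential field holds a reference to the account, never the secret itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class AgentIdentity:
    """Hypothetical inventory entry for one model-connected identity."""
    workflow: str
    credential: str           # account or token name, not the secret value
    owner: Optional[str]      # named human owner, if any
    workflow_changed: date    # last change to the workflow, its tools, or its data
    access_reviewed: date     # last access review for this identity


def inventory_findings(identities: List[AgentIdentity]) -> List[Tuple[str, str]]:
    """Flag identities with no owner or with access not reviewed since a change."""
    findings = []
    for ident in identities:
        if not ident.owner:
            findings.append((ident.workflow, "no named owner"))
        if ident.access_reviewed < ident.workflow_changed:
            findings.append((ident.workflow, "access not reviewed since last change"))
    return findings


inventory = [
    AgentIdentity("support-assistant", "svc-support", "a.owner@example.com",
                  workflow_changed=date(2026, 3, 1), access_reviewed=date(2026, 3, 15)),
    AgentIdentity("report-builder", "svc-reports", None,
                  workflow_changed=date(2026, 4, 1), access_reviewed=date(2026, 1, 10)),
]
for workflow, issue in inventory_findings(inventory):
    print(f"{workflow}: {issue}")
# report-builder: no named owner
# report-builder: access not reviewed since last change
```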

Key takeaways

  • LLM security tools are still strongest on prompt and output risk, but enterprise exposure now lives in the execution path.
  • The evidence points to a widening governance gap: most organisations see AI agents act beyond scope, yet only about half can track and audit the data those agents access.
  • Practitioners should evaluate AI security through identity, tool reach, and lifecycle ownership, not just through filters that inspect model text.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A1 | Prompt injection and tool misuse are central to the article's tool comparison.
OWASP Non-Human Identity Top 10 | NHI-03 | Delegated AI access depends on secrets, tokens, and runtime privilege scope.
NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | The article's core issue is whether AI workflows have access beyond their intended scope.

Apply least-privilege review to each AI-connected identity and verify access at every workflow change.
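
A least-privilege review of this kind can be approximated by comparing what an identity has been granted with what telemetry shows it actually used. The function name, the scope names, and the two output buckets below are assumptions for illustration only.

```python
def review_access(granted, observed):
    """Compare granted scopes with scopes actually observed in use.

    remove_candidates: granted but never used, so likely safe to revoke
    investigate: used without a matching grant, which should not be possible
    """
    granted, observed = set(granted), set(observed)
    return {
        "remove_candidates": sorted(granted - observed),
        "investigate": sorted(observed - granted),
    }


review = review_access(
    granted={"kb_search", "crm_read", "crm_write"},
    observed={"kb_search", "crm_read"},
)
print(review)  # {'remove_candidates': ['crm_write'], 'investigate': []}
```

Running the same comparison after each workflow change keeps the review tied to the moment scope is most likely to drift.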


Key terms

  • Execution-layer governance: The set of controls that governs what an AI system can actually do after it receives input. It covers tool access, data access, and downstream actions, which means security must move beyond prompt filtering and into identity, authorisation, and runtime policy enforcement.
  • Prompt injection: A technique that manipulates an AI system’s instructions by embedding malicious text in user input or retrieved content. In practice, it matters because the attack can redirect a model that already has access to tools or data, turning text manipulation into operational misuse.
  • Delegated identity: An identity or token that allows an AI system to act inside enterprise systems on behalf of a workflow or application. The security problem is not the model itself, but the authority attached to it, which must be bounded, logged, and reviewed like any other high-risk machine identity.
  • Tool reachability: The set of tools, APIs, and data sources an AI workflow can access during execution. This is a core governance concept because a model may appear harmless in text output while still being able to reach privileged systems and perform actions outside policy.

What's in the full article

Lakera's full article covers the operational detail this post intentionally leaves for the source:

  • Per-tool feature breakdowns for prompt injection, leakage detection, hallucination handling, and red teaming across the listed products.
  • Vendor-specific implementation notes on integrating security controls into LLM workflows and APIs.
  • Examples of the threat categories each tool claims to address, including direct and indirect prompt injection.
  • The article's own framing of how its tool set maps to practical LLM security use cases.

