AI security testing now needs coverage across agent workflows

By NHI Mgmt Group Editorial Team
Published 2026-04-12
Domain: Agentic AI & NHIs
Source: Lasso Security

TL;DR: AI security testing has moved from model prompts to full agent workflows, where tools, APIs, memory, and multi-step interactions create a larger attack surface and make input or output filtering insufficient, according to Lasso Security. Coverage now has to be continuous, behavioral, and intent-aware because agents can drift, be redirected, and act outside their intended scope without obvious single-step failures.

At a glance

What this is: An analysis of why AI security testing must expand from prompt checks to end-to-end agent workflow coverage, with fragile intent as the central failure pattern.

Why it matters: IAM, NHI, and governance teams now need visibility into what agents can do, not just what they say. Without it, existing controls will miss scope drift and tool abuse.

By the numbers:

48% of cybersecurity professionals now identify agentic AI and autonomous systems as the top attack vector heading into 2026.
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.

Context

AI security testing is the practice of checking how a model, application, or agent behaves under pressure, not just under normal use. In agentic AI, the identity problem changes because the system can choose tools, call APIs, and act across workflows, which means security teams must understand what the agent is actually doing, not only what it is supposed to do.

The gap is structural: point-in-time prompt testing cannot reliably cover systems whose behaviour changes with context, memory, and downstream integrations. For practitioners running NHI and agentic AI programmes, this means the relevant control plane is no longer the output boundary alone, but the full chain of access, action, and persistence across connected systems.

That is why testing has to move from isolated prompts to inventory-led workflow coverage. The article’s core argument is that intent can be manipulated over time, and the same agent may behave differently as models, prompts, and tool connections evolve.

Key questions

Q: How should security teams test AI agents that use multiple tools and APIs?

A: Security teams should test the full workflow, not just the prompt and response. That means inventorying every connected tool, simulating multi-step interactions, and checking whether the agent can be redirected into actions outside its intended task. The point is to validate action boundaries, not only content filters.

Q: Why do AI agents create more security risk than chatbots?

A: AI agents can take actions, not just generate text. Once they can call tools, access data, and chain decisions across steps, the security issue becomes execution risk with a much larger blast radius. That is why intent, access scope, and workflow context matter more than response quality alone.

Q: How do you know if AI security testing is actually working?

A: Testing is working when it finds behaviour that single-turn checks miss, especially tool misuse, context drift, and scope expansion across realistic workflows. A useful programme produces repeatable findings tied to specific agents, tools, and privileges, then revalidates after model or integration changes.

Q: What should organisations do when an AI agent’s scope keeps changing?

A: They should move from point-in-time review to continuous validation tied to lifecycle events such as model updates, prompt edits, and new tool connections. If the agent’s effective scope changes often, governance has to track the workflow as a living access boundary, not a fixed application setting.

Technical breakdown

Why agentic AI expands the security testing surface

Agentic systems do not stop at inference. They ingest prompts, consult memory or retrieval layers, invoke tools, and push actions into APIs and internal systems. Each connection creates a separate trust boundary, and the risk often emerges at the integration layer rather than in the model response itself. That is why an agent can appear compliant at the chat layer while still causing an unauthorized downstream action. Traditional test cases that only inspect the prompt and output miss that execution path entirely. For identity teams, the important shift is from content safety to action safety across the full workflow.

Practical implication: Map every tool, API, and data source an agent can reach before you test it.
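
As a rough illustration of that mapping step, the sketch below records what one agent can reach and which of those paths allow it to act rather than just read. The class names, field names, and the sample agent are hypothetical and are not taken from Lasso Security's article; a real inventory would be populated from discovery tooling rather than written by hand.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """One thing an agent can reach: a tool, an API, or a data source."""
    name: str
    kind: str        # "tool", "api", or "data"
    can_write: bool  # True if the agent can change state through it


@dataclass
class AgentRecord:
    """Inventory entry for a single agent (all fields are illustrative)."""
    agent_id: str
    model: str
    resources: list[Resource] = field(default_factory=list)

    def trust_boundaries(self) -> list[str]:
        """Treat each reachable resource as its own trust boundary."""
        return sorted(f"{r.kind}:{r.name}" for r in self.resources)

    def write_paths(self) -> list[str]:
        """Resources through which the agent can act, not just read."""
        return sorted(r.name for r in self.resources if r.can_write)


# Hypothetical example agent and resources.
inventory = [
    AgentRecord(
        agent_id="support-bot",
        model="example-model-v1",
        resources=[
            Resource("ticket_search", "tool", can_write=False),
            Resource("crm_api", "api", can_write=True),
            Resource("kb_index", "data", can_write=False),
        ],
    )
]

for record in inventory:
    print(record.agent_id, record.trust_boundaries(), record.write_paths())
```

Listing write paths separately is a deliberate choice here: those are the connections where a compliant-looking chat response can still produce a downstream action, so they are a reasonable place to start test coverage.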

Fragile intent and multi-turn manipulation

Fragile intent is the condition where an agent can be redirected from its intended purpose through sustained interaction, even though no single turn looks overtly malicious. Instead of one obvious jailbreak, the user or attacker nudges context across a sequence of exchanges until the agent accepts a new interpretation of the task. This is more than prompt injection. It is behavioural drift shaped by memory, policy interpretation, and conversational pressure. In practical terms, the security question becomes whether the agent can preserve purpose under pressure, not whether one prompt can be blocked.

Practical implication: Test for multi-step redirection, not just one-shot prompt bypasses.
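
One way to picture a multi-step redirection test is the minimal harness below. It replays a conversation turn by turn and records any tool call that falls outside the allowed set. Everything here is an assumption made for illustration: the `Agent` callable interface, the `toy_agent` stand-in (which simply drifts after repeated nudges), and the tool names. A real test would drive an actual agent and inspect its real tool-call trace.

```python
from typing import Callable, Optional

# Hypothetical agent interface: given the conversation so far, return the
# name of the tool the agent wants to call next, or None for a text reply.
Agent = Callable[[list[str]], Optional[str]]


def run_redirection_test(
    agent: Agent, turns: list[str], allowed_tools: set[str]
) -> list[tuple[int, str]]:
    """Replay the turns in order and record every out-of-scope tool call."""
    history: list[str] = []
    violations: list[tuple[int, str]] = []
    for index, turn in enumerate(turns, start=1):
        history.append(turn)
        tool = agent(history)
        if tool is not None and tool not in allowed_tools:
            violations.append((index, tool))
    return violations


def toy_agent(history: list[str]) -> Optional[str]:
    """Stand-in agent that drifts once enough nudges accumulate in context."""
    nudges = sum("export" in message.lower() for message in history)
    if nudges >= 2:
        return "bulk_export"  # drifted outside the original task
    return "ticket_search"


turns = [
    "Find my open support tickets.",
    "Could you also export one of them for my records?",
    "Great, now export every customer's tickets the same way.",
]
violations = run_redirection_test(toy_agent, turns, allowed_tools={"ticket_search"})
print(violations)
```

With this toy agent, each turn sent on its own produces no violation, while the full sequence does. That is the property a multi-turn test is meant to surface and a one-shot prompt check would miss.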

Why continuous red teaming is now part of AI operations

Agentic environments change without code changes. A model update, a new prompt, or a newly connected tool can alter behaviour enough to invalidate last quarter’s test results. That makes point-in-time assessments unreliable as an assurance model. Continuous red teaming is therefore a lifecycle control, not a one-off validation exercise. It has to stay aligned with the agent’s current dependencies, privileges, and decision patterns. This is the same governance logic IAM teams already use for high-risk entitlements, applied to systems that adapt faster than annual review cycles can track.

Practical implication: Tie AI testing to change events, model updates, and new tool connections.
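
A simple way to detect those change events, sketched here under assumptions, is to fingerprint the parts of an agent that affect behaviour and compare the result with the fingerprint recorded at the last test run. The function names and the choice of fields (model identifier, system prompt, tool list) are illustrative; a real deployment might also include memory sources or permission grants.

```python
import hashlib
import json


def agent_fingerprint(model: str, system_prompt: str, tools: list[str]) -> str:
    """Hash the parts of an agent whose change should trigger a retest."""
    payload = json.dumps(
        {"model": model, "prompt": system_prompt, "tools": sorted(tools)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_retest(last_tested: str, current: str) -> bool:
    """True when the agent has changed since its last red-team run."""
    return last_tested != current


# Fingerprint recorded when the agent was last tested (hypothetical values).
tested = agent_fingerprint(
    "example-model-v1", "You answer support tickets.", ["ticket_search"]
)
# A new tool connection with no code change still alters the fingerprint.
current = agent_fingerprint(
    "example-model-v1", "You answer support tickets.", ["ticket_search", "crm_api"]
)
print(needs_retest(tested, current))
```

Sorting the tool list before hashing keeps the fingerprint stable when tools are merely reordered, so only a real change in dependencies triggers a retest.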

Threat narrative

Attacker objective: The attacker’s objective is to make the agent carry out a harmful or unauthorized action while still appearing to operate within normal conversational or workflow boundaries.

Entry occurs through a legitimate conversation or workflow invocation that gives the agent access to tools, memory, and connected systems.
Credential access is not the primary issue here, because the risk comes from approved access being redirected through multi-turn manipulation and fragile intent.
Escalation happens when the agent expands the user’s framing into unauthorized actions across APIs or internal tools, often without an obvious policy violation at any single step.
Impact is the completion of an unintended business action, data exposure, or workflow abuse that appears reasonable in isolation but was never authorized in context.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

AI security testing has become an identity problem, not just a model problem. Once an agent can select tools, call APIs, and carry context across steps, the control question shifts from what it outputs to what it is authorized to do. That is why agentic systems belong in the same governance conversation as non-human identities, because the failure mode is now execution, not content. Practitioners should treat workflow access as the real testing target.

Fragile intent is the right name for the redirection risk this article describes. The useful insight is not that agents can be tricked, but that their purpose can be destabilised over time through small, plausible interactions. That makes behavioural drift a governance concern for AI security testing, NHI oversight, and runtime monitoring alike. Practitioners should be asking where intent can be altered before policy ever sees a clear violation.

Continuous red teaming exposes the gap between static entitlement review and dynamic agent behaviour. Access reviews assume a stable subject, a stable scope, and a stable evidence trail. Agentic systems break that assumption because model updates, prompt changes, and new tool links can alter the effective privilege boundary without a corresponding governance event. Practitioners should stop treating point-in-time testing as assurance for a moving target.

Context-aware access control is becoming the minimum viable guardrail for agentic workflows. If an agent only needs read access, write access is excess risk, even when the workflow is legitimate. The article reinforces a broader NHI governance pattern: the more an actor can act across systems, the more tightly the allowed action set must match the actual use case. Practitioners should align permissions to verified task scope, not platform convenience.

Agent inventory is now a precondition for credible assurance. You cannot test or govern what you cannot enumerate, and the agent problem becomes opaque when code, cloud, and shadow AI all create separate pathways into the same workflow. That makes discovery the first control, not a reporting exercise. Practitioners should assume unknown agents are already part of the risk surface until proven otherwise.

From our research:
80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials, according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
With 92% agreeing that governing AI agents is critical but only 44% having implemented policies, our agentic AI research shows the assurance gap is already operational.

What this signals

With 80% of organisations already reporting agent behaviour beyond intended scope, the practical problem is not future risk but current governability. The teams that will stay ahead are the ones building control points around workflow access, not just model output. Fragile intent is the point at which a seemingly acceptable interaction can redirect an agent into the wrong goal, and that is the boundary security programmes need to measure.

The programme signal is clear: continuous assurance is becoming part of AI operations, in the same way recertification and access review became part of identity governance. If the agent can change after a model update or tool integration, then the test has to change with it. The governing question is whether your current controls can describe the system that exists today, not the one that was approved last quarter.

For practitioners

Build a complete agent inventory: List every AI agent and workflow-connected system, including model, tools, memory sources, and external APIs. Without that inventory, test coverage will remain partial and shadow AI will stay outside governance.
Test for multi-turn redirection: Use conversation sequences that gradually shift context and challenge fragile intent, then measure whether the agent stays within the original task boundary across the full interaction.
Treat system prompts as security-sensitive assets: Version, review, and test prompts alongside other privileged configuration because prompt changes can alter agent behaviour as much as code changes.
Map workflow permissions to actual task scope: Remove write access, broad tool access, and unnecessary data reach where an agent only needs narrow operational visibility, then revalidate that scope after each integration change.
Tie red teaming to lifecycle change events: Re-run tests when models, prompts, tools, or integrations change so your assurance matches the system currently in production, not the one you tested last quarter.
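
The advice to map workflow permissions to actual task scope can be expressed as a set comparison between what an agent has been granted and what its task needs. The sketch below assumes permissions are available as simple `resource:action` strings, which is an illustrative convention rather than any particular platform's format.

```python
def excess_permissions(granted: set[str], required: set[str]) -> set[str]:
    """Permissions the agent holds but its verified task does not need."""
    return granted - required


def missing_permissions(granted: set[str], required: set[str]) -> set[str]:
    """Permissions the task needs but the agent does not currently hold."""
    return required - granted


# Hypothetical grants for an agent that only needs read visibility.
granted = {"tickets:read", "tickets:write", "crm:read", "crm:write"}
required = {"tickets:read", "crm:read"}

excess = excess_permissions(granted, required)
print(sorted(excess))
```

Anything in the excess set is a candidate for removal. Re-running the comparison after each integration change keeps the granted scope aligned with the task as the workflow evolves.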

Key takeaways

AI security testing now has to cover agent workflows, because tool calls and API actions create risk that input and output filters do not see.
The scale of the problem is already visible in current deployments, where agents routinely act beyond intended scope and compliance teams often lack full auditability.
Continuous, intent-aware red teaming is the control that matches a moving agentic environment, especially when privileges and integrations change faster than review cycles.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	L1	The article centers on agent goal drift, tool use, and workflow abuse.
NIST AI RMF		Continuous assurance and lifecycle monitoring align with AI risk governance.
NIST Zero Trust (SP 800-207)	PR.AC-4 (a NIST CSF 1.1 subcategory)	Agent access should match actual task scope and least privilege.

Apply AI RMF governance controls to tie testing, monitoring, and accountability to model and workflow changes.

Key terms

Fragile Intent: The point at which an AI agent can be steered away from its intended purpose through sustained interaction. It is not a single prompt failure. The risk appears when context, memory, and conversational pressure combine to change the agent’s effective goal during a live workflow.
Agentic Attack Surface: The full set of trust boundaries created by an AI agent’s tools, APIs, memory, data sources, and execution paths. It is broader than model prompts or outputs because the security problem includes what the agent can do downstream, not only what it says.
Intent Security: A testing and governance approach that focuses on whether an AI system stays aligned to its intended purpose under pressure. It examines behaviour, redirection, and task drift across the whole workflow, which makes it especially relevant for agents that can act across systems.
Automated AI Red Teaming: Continuous adversarial testing that probes AI systems for manipulation, unintended actions, and scope drift as they change over time. In agentic environments, it is used to validate the current workflow boundary, not just a static model response.

Deepen your knowledge

AI security testing for agent workflows is a core topic in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are building assurance for agents, service accounts, and other non-human identities, it is a practical place to start.

This post draws on content published by Lasso Security: AI Security Testing Has a Coverage Problem. Automated AI Red Teaming Fixes It. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-04-12.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org