AI agent guardrails are shifting from chat to multimodal action

By NHI Mgmt Group Editorial Team
Published 2025-09-24
Domain: Breaches & Incidents
Source: Enkrypt AI

TL;DR: Agentic, multimodal AI changes the risk model because systems can now act across text, images, and voice. According to Enkrypt AI, a single poisoned input or spoofed instruction can trigger approvals, transfers, or policy decisions. The governance question is no longer model quality alone but whether runtime controls can constrain an action before it causes damage that leaves no audit trail.

At a glance

What this is: Enkrypt AI’s announcement argues that multimodal AI agent risk now centres on preventing harmful actions before they execute across text, image, and voice channels.

Why it matters: For IAM, IGA, PAM, and NHI teams, the practical issue is that policy and approval logic must now account for agentic systems that can trigger real-world actions, not just generate content.


Context

Agentic multimodal AI is different from traditional chat AI because the system can interpret inputs and then take actions, including approvals, transfers, and policy updates. That shifts the identity problem from output moderation to action governance, where the important question is which controls stop an agent before it can execute a harmful decision.

For identity programmes, this is the point where NHI governance and AI governance overlap. If an AI system can see, decide, and do, then the organisation needs controls for scope, authorisation, auditability, and least privilege that apply to machine behaviour in motion, not just to credentials at rest. The article’s starting position is typical for the current market: security teams are still catching up to the operational reality of agentic systems.

Key questions

Q: How should security teams govern AI agents that can act across text, image, and voice?

A: Security teams should govern those agents as runtime identities with constrained action rights, not as passive content systems. The control model needs input filtering, policy enforcement before execution, and explicit approval boundaries for high-risk actions. If the agent can trigger transfers, account changes, or compliance decisions, its permissions must be scoped like any other privileged identity.

Q: Why do multimodal AI agents create more risk than text-only assistants?

A: Multimodal agents expand the attack surface because instructions can arrive through images, audio, or documents that the system interprets as context for action. That means a spoofed voice or poisoned screenshot can influence tool use and approval behaviour. The risk is amplified when the agent is connected to business systems with real execution authority.

Q: How can organisations tell whether agent guardrails are actually working?

A: They should test whether harmful inputs are blocked before execution, whether policy violations are logged with a clear decision path, and whether high-risk actions still require the intended approval gate. If the agent can complete a transfer or account change without those controls firing, the guardrails are cosmetic rather than effective.

Q: Should AI model safety scores be used as the main approval criterion for deployment?

A: No. Safety scores are useful, but they only measure one part of the risk picture. Deployment approval should also examine tool access, delegated authority, workflow sensitivity, and audit requirements. A model can look safe in isolation and still become unacceptable once it is connected to privileged systems.
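The three checks named in the question above on whether guardrails are working (blocked before execution, logged, approval gate fired) can be expressed as a small test helper. This is a minimal sketch, not a description of any vendor's tooling: the `Outcome` record and the `guardrail_failures` function are hypothetical names, and the sketch assumes the agent pipeline can report these three facts for each test input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """What a (hypothetical) agent pipeline reported for one test input."""
    executed: bool            # did the action actually run?
    logged: bool              # was a decision record written?
    approval_requested: bool  # did the human approval gate fire?


def guardrail_failures(outcome: Outcome, harmful: bool, high_risk: bool) -> list[str]:
    """Return the checks that failed for one test case (empty list = pass)."""
    failures = []
    if harmful and outcome.executed:
        failures.append("harmful input was not blocked before execution")
    if harmful and not outcome.logged:
        failures.append("policy violation was not logged")
    if high_risk and outcome.executed and not outcome.approval_requested:
        failures.append("high-risk action ran without the approval gate")
    return failures


# A pipeline whose guardrails are cosmetic fails all three checks.
cosmetic = Outcome(executed=True, logged=False, approval_requested=False)
for failure in guardrail_failures(cosmetic, harmful=True, high_risk=True):
    print(failure)
```

Running such a helper against poisoned screenshots, manipulated PDFs, and spoofed audio gives a per-channel view of which controls actually fire.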

Technical breakdown

Why multimodal input changes agent risk

Multimodal agents can process text, images, and voice in the same decision loop, which expands the attack surface beyond prompt injection alone. A poisoned screenshot, a manipulated PDF, or a spoofed voice command can become an instruction that the model treats as actionable context. The technical issue is not just content generation, but context contamination across modalities. Once that contaminated context reaches tool selection or approval logic, the agent may act on it without a human seeing the original manipulation.

Practical implication: govern every input channel that can influence agent decisions, not just text prompts.
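One way to picture channel-level governance is to tag every input with its modality and trust status before it reaches the agent's decision logic. The sketch below is illustrative only: `AgentInput`, `screen_input`, and the phrase list are hypothetical, and the rule that only authenticated text may carry instructions is an assumption made for the example, not something the source prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    AUDIO = "audio"
    DOCUMENT = "document"


@dataclass(frozen=True)
class AgentInput:
    modality: Modality
    content: str         # extracted text, e.g. OCR output or a transcript
    authenticated: bool  # True only if the sender was verified out of band


# Hypothetical phrases for illustration; a real filter would be far richer.
SUSPICIOUS_PHRASES = ("ignore previous instructions", "approve the transfer")


def screen_input(item: AgentInput) -> str:
    """Classify an input as 'instruction', 'data_only', or 'blocked'."""
    lowered = item.content.lower()
    suspicious = any(phrase in lowered for phrase in SUSPICIOUS_PHRASES)
    if suspicious and not item.authenticated:
        return "blocked"
    # Assumption: only authenticated text may carry instructions.
    # Images, audio, and documents are treated as context, never commands.
    if item.modality is Modality.TEXT and item.authenticated:
        return "instruction"
    return "data_only"
```

The design choice here is that content from non-text channels can inform the agent but cannot authorise an action on its own.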

Guardrails and policy engines at runtime

Real-time guardrails sit between agent intent and execution. They inspect proposed actions against policy, risk rules, and compliance constraints before the action completes. In an agentic system, this matters more than post hoc review because the harm often occurs at the point of execution, not after the model responds. A policy engine can block a transfer, an account change, or a non-compliant decision even when the agent has technically arrived at that action through a valid internal chain of reasoning.

Practical implication: place enforcement at the decision boundary, not only in downstream audit and review.
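Enforcement at the decision boundary can be sketched as a check that every proposed action must pass before it runs. The policy table, action names, and the 1000.0 approval threshold below are hypothetical values chosen for illustration; the sketch assumes a default-deny posture for any action the policy does not list.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ProposedAction:
    name: str      # e.g. "transfer_funds"
    amount: float  # 0.0 for non-monetary actions
    agent_id: str


# Hypothetical policy table: action name -> (permitted, approval threshold).
POLICY = {
    "read_balance": (True, None),
    "transfer_funds": (True, 1000.0),
    "change_account_owner": (False, None),
}


def evaluate(action: ProposedAction) -> Decision:
    """Check a proposed action against policy; unknown actions are denied."""
    permitted, threshold = POLICY.get(action.name, (False, None))
    if not permitted:
        return Decision.DENY
    if threshold is not None and action.amount >= threshold:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW


def execute(action: ProposedAction, run) -> str:
    """Call run(action) only when policy allows it; return the decision."""
    decision = evaluate(action)
    if decision is Decision.ALLOW:
        run(action)
    return decision.value
```

Because `execute` is the only path to `run`, an action the agent reached through apparently valid reasoning still cannot complete unless the policy permits it.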

Why leaderboard-style evaluation is only part of the control model

A safety leaderboard helps compare models on behavioural risk, but it does not replace environment-specific governance. Model-level benchmarks tell you something about baseline safety characteristics, yet enterprise exposure depends on how the model is connected to tools, data, approvals, and sensitive workflows. An agent that scores well in isolation can still become dangerous once it can access finance systems, compliance workflows, or customer records. That is why model evaluation and access governance must be treated as separate controls.

Practical implication: evaluate model risk and runtime access risk independently before approving deployment.
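Treating the two evaluations as separate controls might look like the following sketch, in which a deployment is approved only when both gates pass independently. The `DeploymentRequest` fields and the 0.8 score threshold are assumptions for illustration; the source does not specify a scoring scale or an approval workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentRequest:
    model_safety_score: float        # hypothetical 0..1 benchmark score
    requested_tools: frozenset[str]  # tools the agent wants to call
    approved_tools: frozenset[str]   # tools signed off by access governance


MIN_SAFETY_SCORE = 0.8  # hypothetical threshold


def model_gate(req: DeploymentRequest) -> bool:
    """Model-level check: is the baseline safety score acceptable?"""
    return req.model_safety_score >= MIN_SAFETY_SCORE


def access_gate(req: DeploymentRequest) -> bool:
    """Access-level check: is every requested tool already approved?"""
    return req.requested_tools <= req.approved_tools


def approve(req: DeploymentRequest) -> bool:
    """Both gates must pass; a high score never offsets excess access."""
    return model_gate(req) and access_gate(req)
```

A request with a 0.95 score that asks for an unapproved finance tool is rejected by `approve`, which mirrors the point that a model can look safe in isolation and still be unacceptable once connected to privileged systems.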

Threat narrative

Attacker objective: The attacker wants the agent to convert manipulated input into an executable business action that the organisation treats as legitimate.

Entry occurs when a poisoned screenshot, PDF, or spoofed voice message reaches the agent through a trusted input channel.
Escalation occurs when the contaminated input influences tool selection or approval logic, causing the agent to take an unauthorised or non-compliant action.
Impact occurs when the agent completes a transfer, account change, or policy decision that creates real operational loss and weakens auditability.

Moltbook AI agent keys breach — Moltbook breach exposed 1.5M AI agent keys.
AI LLM hijack breach — attackers used stolen AWS access keys to hijack Anthropic LLM models on Bedrock.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Multimodal AI guardrails are now an identity control, not just a model control. Once an agent can act on text, images, and voice, the security problem becomes whether the organisation can constrain what that identity is allowed to do at runtime. That means the governance boundary shifts from content safety to action authorisation, with IAM, PAM, and NHI controls all intersecting at the execution point. Practitioners should treat multimodal guardrails as part of identity enforcement, not an optional add-on.

Action-before-audit is the new failure mode in agentic AI. The article describes harm that can cascade into transfers, account changes, and unlogged policy decisions, which is exactly what conventional review cycles are too late to catch. This is a structural problem for governance programmes that assume evidence will exist after the fact. Practitioners need to recognise that auditability is no longer a retrospective control if the system can complete the decision before review begins.

Model reputation and operational safety are different control questions. A leaderboard can tell you something about risk posture, but it does not answer whether the model is safe in a finance workflow, a compliance workflow, or a customer account context. That distinction matters because enterprise risk is created by the combination of model behaviour, connected tools, and delegated authority. The practitioner conclusion is simple: model selection without runtime governance is incomplete.

Agentic AI is forcing NHI governance to absorb a new trust boundary. The same security logic that once applied to service accounts and API keys now has to extend to systems that can infer, select, and execute actions across modalities. That does not make the identity problem disappear into AI governance. It makes NHI governance the foundation for proving where the machine identity ends and the business decision begins. Practitioners should rethink access, approval, and accountability as one control plane.

From our research:
98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
That visibility gap is why practitioners should also review OWASP NHI Top 10 before expanding agentic access.

What this signals

Agentic AI governance is moving from policy intent to execution control. The market is converging on the reality that approval rules, audit logging, and input filtering must operate at runtime, because post hoc review does not contain a system that can act immediately. Practitioners should expect security architecture to shift toward decision-boundary enforcement, especially where multimodal inputs can influence privileged workflows.

With 96% of technology professionals identifying AI agents as a growing security threat, the governance gap is no longer speculative. The immediate question for security teams is whether their existing IAM, PAM, and NHI controls can prove who authorised the action, what input influenced it, and whether the output stayed inside scope. That is a programme design problem, not just a model-selection problem.

Multimodal agent control will become a cross-functional identity issue. Compliance, legal, security, and platform teams will need a shared view of what an agent is allowed to see, decide, and do. The organisations that move first will be the ones that treat agent identity as part of enterprise access architecture rather than as an AI side project.

For practitioners

Define runtime action boundaries: Enumerate which actions an AI agent may propose, which it may execute, and which require human approval before completion. Tie those boundaries to business processes such as payments, account updates, and policy changes, then test them against real multimodal inputs.
Treat every input channel as policy-relevant: Extend control coverage to screenshots, PDFs, audio, and copied text that can influence agent behaviour. Validate that harmful instructions embedded in those channels are blocked before they reach tool-use or decision logic.
Separate model evaluation from access governance: Use model safety scoring for selection, but require a distinct approval path for tool access, sensitive data access, and workflow execution. A safer model is still unsafe if its delegated identity is too broad.
Log the decision path, not just the final output: Capture the input modality, policy checks, tool calls, and approval outcomes so investigators can reconstruct how the agent reached an action. Without that chain, incident review becomes guesswork.
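The last recommendation above, logging the decision path, can be sketched as a structured record that captures the input modality, policy checks, tool calls, and approval outcome together. The `DecisionRecord` class and its field names are hypothetical; a real deployment would follow whatever schema its logging platform expects.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """One audit entry describing how an agent reached an action."""
    agent_id: str
    input_modality: str   # e.g. "image", "audio", "text"
    proposed_action: str  # e.g. "transfer_funds"
    policy_checks: list = field(default_factory=list)  # dicts: check, passed
    tool_calls: list = field(default_factory=list)     # tool names, in order
    approval_outcome: str = "not_required"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialise the full decision path as one JSON line."""
        return json.dumps(asdict(self), sort_keys=True)


# Example: an image-driven transfer that failed a policy check.
record = DecisionRecord("agent-7", "image", "transfer_funds")
record.policy_checks.append({"check": "amount_limit", "passed": False})
record.approval_outcome = "denied"
print(record.to_json())
```

Writing one such line per decision, rather than only the final output, lets an investigator see which input channel and which failed check sat behind a blocked or completed action.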

Key takeaways

Agentic, multimodal AI changes the control problem from content moderation to action authorisation at runtime.
Poisoned inputs, spoofed instructions, and unlogged decisions create a practical governance gap that traditional review cycles cannot close on their own.
Security teams should separate model safety, delegated access, and approval logic before expanding agentic deployment.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		The article concerns multimodal agent behaviour, tool use, and runtime guardrails.
NIST AI RMF		The post is about managing risk for AI systems that influence business actions.
OWASP Non-Human Identity Top 10	NHI-01	Agent identities and delegated access create NHI-style privilege and governance exposure.

Inventory agent identities, constrain delegated permissions, and review lifecycle controls before production use.

Key terms

Agentic AI: AI that can choose actions and execute them through tools or workflows rather than only generating content. In practice, agentic AI creates an identity problem because the system may hold delegated access, make runtime decisions, and trigger business actions that require governance, logging, and approval controls.
Multimodal AI: AI that can process more than one input type, such as text, images, and audio, within the same decision flow. For security teams, multimodal capability matters because malicious instructions can hide inside non-text channels and still influence the agent's behaviour and downstream tool use.
Runtime Guardrail: A control that evaluates a model's proposed action before it executes. Unlike after-the-fact monitoring, runtime guardrails can block risky tool calls, enforce policy, and reduce the chance that an agent completes a harmful or non-compliant action while still inside a live session.
Delegated Identity: An identity that acts with permissions assigned on behalf of a person, system, or workflow. In agentic environments, delegated identity becomes high risk when the scope is broader than the task, because the system can use those privileges autonomously or with minimal oversight.
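The point about a delegated identity whose scope is broader than the task can be checked mechanically by comparing granted permissions with what the task needs. The scope strings and function names below are hypothetical, and the sketch assumes permissions can be represented as a flat set of scope names.

```python
def excess_scopes(granted: set[str], required: set[str]) -> set[str]:
    """Scopes the identity holds but the task does not need."""
    return granted - required


def missing_scopes(granted: set[str], required: set[str]) -> set[str]:
    """Scopes the task needs but the identity does not hold."""
    return required - granted


# Hypothetical agent that only needs to read payment data.
granted = {"payments:read", "payments:write", "accounts:admin"}
required = {"payments:read"}
print(sorted(excess_scopes(granted, required)))
```

Any non-empty result from `excess_scopes` marks privileges the agent could use autonomously without the task requiring them, which is the condition the definition flags as high risk.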

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity lifecycle are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or programme maturity, it is worth exploring.

This post draws on content published by Enkrypt AI: Enkrypt AI Recognized as a Gartner Cool Vendor in AI Security 2025. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-24.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org