How should security teams govern AI agents that run long, multi-step workflows?

Why This Matters for Security Teams

Long-running AI agents are not just another application workload. They are autonomous actors that can chain tools, pursue goals, and accumulate privilege across many steps, which makes static RBAC a poor fit. Governance has to focus on what the agent is trying to do at runtime, not only on a pre-approved role. That is why current guidance is moving toward intent-based authorisation, durable execution, and verifiable identity for the workload itself, as reflected in the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework.

This matters because agent failures are not limited to broken prompts. They involve state loss, hidden retries, tool misuse, and actions that outlive the original user intent. NHIMG research shows how quickly exposed secrets are exploited in the real world: in the AI LLM hijack breach and similar incidents, credential exposure turns into rapid abuse. The OWASP NHI Top 10 likewise highlights why agent identity and secret handling must be treated as first-class controls. In practice, many security teams discover the problem only after an agent has already executed an unauthorised side task or accessed data outside its intended scope.

How It Works in Practice

For multi-step workflows, governance should start with workload identity and per-task authority. The agent should present a cryptographic identity for the workload, then receive just-in-time credentials only for the specific step it is authorised to perform. That means short-lived tokens, narrow scopes, and automatic revocation when the task ends. Static service accounts and long-lived API keys are poor choices because autonomous systems are not predictable in the same way as human-operated jobs. CSA frames this clearly in the CSA MAESTRO agentic AI threat modeling framework, while NIST zero trust guidance supports evaluating every request at the moment it is made.
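As a rough illustration of per-step, just-in-time credentials, the sketch below issues a short-lived token bound to one workload and one narrow scope, and revokes it when the step ends. The `CredentialBroker` class, the scope strings, and the 60-second lifetime are all hypothetical; a real deployment would delegate this to a secrets manager or workload identity platform rather than an in-memory dictionary.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class StepCredential:
    token: str
    workload_id: str
    scope: str         # one narrow permission, e.g. "tickets:read" (illustrative)
    expires_at: float  # epoch seconds


class CredentialBroker:
    """Hypothetical in-memory broker, for illustration only."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._active: Dict[str, StepCredential] = {}

    def issue(self, workload_id: str, scope: str,
              now: Optional[float] = None) -> StepCredential:
        # Mint a random token that is valid for a single scope and a short TTL.
        now = time.time() if now is None else now
        cred = StepCredential(secrets.token_urlsafe(32), workload_id,
                              scope, now + self.ttl_seconds)
        self._active[cred.token] = cred
        return cred

    def is_valid(self, token: str, scope: str,
                 now: Optional[float] = None) -> bool:
        # Valid only if the token is known, the scope matches, and it has not expired.
        now = time.time() if now is None else now
        cred = self._active.get(token)
        return cred is not None and cred.scope == scope and now < cred.expires_at

    def revoke(self, token: str) -> None:
        # Called when the step finishes, whether it succeeded or failed.
        self._active.pop(token, None)
```

The `now` parameter exists only so expiry can be exercised deterministically; the point of the design is that a leaked token is useless outside its single scope and its short lifetime.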

Use durable execution so each step, retry, and tool call is recorded as an event, not a transient process state.

Issue ephemeral secrets per workflow stage, not broad credentials for the full agent lifecycle.

Evaluate policy at runtime with context such as target data, tool sensitivity, user intent, and step history.

Require human or policy approval for privilege escalation, external sharing, or irreversible actions.
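The durable execution item above can be sketched as an append-only event log that records every start, failure, and completion, so that a retry resumes from recorded history instead of repeating finished steps. This is a minimal sketch under stated assumptions: `EventLog` and `run_workflow` are hypothetical names, the log lives in memory, and a production system would use a workflow engine or a persistent store.

```python
import json
from typing import Callable, Dict, List, Set


class EventLog:
    """Append-only log; a hypothetical stand-in for a durable store."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def append(self, workflow_id: str, step: str, kind: str, detail: str = "") -> None:
        # Each event is serialised on write so it cannot be mutated afterwards.
        self._lines.append(json.dumps(
            {"workflow": workflow_id, "step": step, "kind": kind, "detail": detail}))

    def events(self, workflow_id: str) -> List[dict]:
        return [e for e in map(json.loads, self._lines) if e["workflow"] == workflow_id]

    def completed_steps(self, workflow_id: str) -> Set[str]:
        return {e["step"] for e in self.events(workflow_id) if e["kind"] == "completed"}


def run_workflow(log: EventLog, workflow_id: str,
                 steps: Dict[str, Callable[[], str]]) -> None:
    """Run steps in order, skipping any step the log already shows as completed."""
    done = log.completed_steps(workflow_id)
    for name, action in steps.items():
        if name in done:
            log.append(workflow_id, name, "skipped", "already completed")
            continue
        log.append(workflow_id, name, "started")
        try:
            result = action()
        except Exception as exc:
            # Record the failure before propagating so the retry is visible in the log.
            log.append(workflow_id, name, "failed", repr(exc))
            raise
        log.append(workflow_id, name, "completed", result)
```

Calling `run_workflow` again with the same `workflow_id` after a failure re-runs only the steps that never completed, and the log shows the failed attempt and the retry as separate events.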

That runtime evaluation can be implemented with policy-as-code patterns such as OPA or Cedar, but the key is consistency, not the specific engine. NHI governance also needs an audit trail that can reconstruct exactly which actions were authorised, replayed, or aborted. NHIMG’s Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs is useful here because agent credentials must be treated as managed identities, not incidental application secrets. These controls tend to break down when an agent spans multiple vendors or ephemeral compute environments, because identity continuity and event retention become inconsistent across boundaries.
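To make the runtime evaluation idea concrete, here is a plain-Python sketch of a per-step decision function. It is not OPA or Cedar syntax; it only mirrors the kind of rule those engines would express. The intent names, tool names, data classes, and the step budget of 20 are assumptions invented for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"


# Hypothetical mapping from the user's declared intent to the tools it justifies.
ALLOWED_TOOLS_BY_INTENT: Dict[str, Set[str]] = {
    "summarise_tickets": {"ticket_search", "ticket_read"},
    "notify_customer": {"ticket_read", "send_email"},
}
MAX_STEPS = 20  # illustrative budget on how long one intent may keep running


@dataclass
class StepRequest:
    tool: str
    data_class: str        # "public", "internal", or "restricted" (assumed labels)
    declared_intent: str
    irreversible: bool = False
    external_share: bool = False
    history: List[str] = field(default_factory=list)  # tools already called


def evaluate(req: StepRequest) -> Decision:
    # 1. The tool must be justified by the intent the user actually expressed.
    if req.tool not in ALLOWED_TOOLS_BY_INTENT.get(req.declared_intent, set()):
        return Decision.DENY
    # 2. Step history bounds how far the agent can drift from the original task.
    if len(req.history) >= MAX_STEPS:
        return Decision.DENY
    # 3. Sensitive, irreversible, or outward-facing actions need explicit approval.
    if req.irreversible or req.external_share or req.data_class == "restricted":
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

The function is evaluated for every step rather than once per session, which is the property that matters regardless of which policy engine ends up hosting the rules.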

Common Variations and Edge Cases

Tighter step-level control often increases operational overhead, so organisations have to balance safety against workflow latency and developer friction. That tradeoff is real, especially for agents that perform many small actions in quick succession. Best practice is still evolving and there is no universal standard yet, so teams should classify workflows by blast radius rather than applying one governance pattern everywhere.

Low-risk agents may only need scoped JIT credentials and logging, while agents that touch finance, customer data, or production controls need stronger approval gates, richer audit history, and stricter separation of duties. High-autonomy systems also need special handling when they can self-assign subgoals or branch into parallel tool use, because failures may happen after a valid first step and before a risky second step. That is where the Top 10 NHI Issues and the NIST Cybersecurity Framework 2.0 remain useful as operational references for accountability, detection, and recovery. When agents can move across long-lived sessions, shared toolchains, or loosely governed MCP connections, the model breaks down because the organisation can no longer prove which action belonged to which intent.
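One way to picture classification by blast radius is a small function that maps a workflow's attributes to the controls it should carry. The sketch below follows the tiers described above; the domain labels, the control names, and the `per_step_policy_check` control for high-autonomy agents are illustrative assumptions, not an established taxonomy.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set

# Domains the text treats as high impact (labels are illustrative).
HIGH_IMPACT_DOMAINS = frozenset({"finance", "customer_data", "production"})


@dataclass(frozen=True)
class WorkflowProfile:
    name: str
    domains: FrozenSet[str]
    can_self_assign_subgoals: bool = False
    parallel_tool_use: bool = False


def required_controls(profile: WorkflowProfile) -> Set[str]:
    # Baseline for every agent, including low-risk ones.
    controls = {"scoped_jit_credentials", "logging"}
    # Workflows touching high-impact domains get the stronger tier.
    if profile.domains & HIGH_IMPACT_DOMAINS:
        controls |= {"approval_gates", "rich_audit_history", "separation_of_duties"}
    # High-autonomy agents are re-checked at each step (assumed control name).
    if profile.can_self_assign_subgoals or profile.parallel_tool_use:
        controls.add("per_step_policy_check")
    return controls
```

Keeping the mapping in one place makes the tradeoff between safety and latency explicit: teams can review which workflows pay for which controls instead of applying the heaviest pattern everywhere.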

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Covers agent misuse and unsafe tool execution in autonomous workflows.
CSA MAESTRO		Provides threat modeling for multi-step agentic workflows and control mapping.
NIST AI RMF		Supports governance, mapping, and measurement for autonomous AI risk.

Define ownership, monitor behavior, and measure agent risk across the full workflow lifecycle.
