Teams should block agent activity at the stage where the risk appears. Prompts need input filtering, tool calls need execution checks, and outputs need disclosure review. That staged approach prevents a single control from being asked to do three different jobs and missing all three.
Why This Matters for Security Teams
Blocking agent activity is not a single-policy problem. Autonomous systems can shift from benign text generation to tool use, data retrieval, and external actions in a single workflow, which means the right control point depends on the stage where harm becomes possible. Current guidance suggests teams should stop treating prompts, tool invocations, and disclosures as the same risk surface. That is especially important because agent behaviour is goal-driven, not fixed by a human session. The OWASP Agentic AI Top 10 and NIST AI Risk Management Framework both point toward context-aware controls rather than broad perimeter blocking. NHI Management Group’s Ultimate Guide to NHIs notes that 90% of IT leaders say properly managing NHIs is essential for successful zero trust, which reinforces the need to place controls where the action happens. In practice, many security teams discover agent misuse only after a tool call or data exfiltration has already occurred, rather than catching it through deliberately placed controls.
How It Works in Practice
The practical answer is to map each control to the action that creates risk. Prompt filtering should focus on malicious instructions, prompt injection, and unsafe context entering the model. Tool-call controls should inspect the agent’s intended action before execution, including destination, parameters, and data sensitivity. Output review should check what the agent is about to reveal, commit, transmit, or trigger downstream. That staged design aligns with the OWASP Top 10 for Agentic Applications 2026 and the CSA MAESTRO agentic AI threat modeling framework, both of which treat agent workflows as chained decision points rather than one monolithic request.
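The three checkpoints described above can be sketched as separate functions, each looking only at what its stage can see. This is a minimal illustration, not a production filter: the marker strings, the allowlisted destination, the tool name `read_ticket`, and the classification labels are all hypothetical values chosen for the example, and real deployments would likely use far richer detection than substring matching.

```python
from dataclasses import dataclass, field

# Hypothetical values, chosen only for illustration.
INJECTION_MARKERS = ("ignore previous instructions", "disregard your system prompt")
ALLOWED_DESTINATIONS = {"tickets.internal.example"}
ALLOWED_PARAMS = {"read_ticket": {"ticket_id"}}
BLOCKED_CLASSIFICATIONS = {"restricted"}
SECRET_MARKERS = ("AKIA", "-----BEGIN PRIVATE KEY-----")


@dataclass
class Decision:
    allowed: bool
    stage: str
    reason: str = ""


@dataclass
class ToolCall:
    name: str
    destination: str
    params: dict = field(default_factory=dict)
    data_classification: str = "internal"


def check_prompt(text: str) -> Decision:
    """Stage 1: stop untrusted input that tries to steer the agent."""
    lowered = text.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            return Decision(False, "prompt", f"injection marker: {marker!r}")
    return Decision(True, "prompt")


def check_tool_call(call: ToolCall) -> Decision:
    """Stage 2: inspect tool, destination, parameters, and data sensitivity."""
    allowed_params = ALLOWED_PARAMS.get(call.name)
    if allowed_params is None:
        return Decision(False, "tool", f"unknown tool: {call.name}")
    if call.destination not in ALLOWED_DESTINATIONS:
        return Decision(False, "tool", f"destination not allowed: {call.destination}")
    unexpected = set(call.params) - allowed_params
    if unexpected:
        return Decision(False, "tool", f"unexpected parameters: {sorted(unexpected)}")
    if call.data_classification in BLOCKED_CLASSIFICATIONS:
        return Decision(False, "tool", f"classification blocked: {call.data_classification}")
    return Decision(True, "tool")


def check_output(text: str) -> Decision:
    """Stage 3: review what the agent is about to disclose."""
    for marker in SECRET_MARKERS:
        if marker in text:
            return Decision(False, "output", "possible secret in output")
    return Decision(True, "output")
```

Keeping the stages as separate functions means each one returns a decision tagged with its own stage, so a blocked request can be traced to the point where the risk appeared rather than to a single opaque filter.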
Teams usually get better results when they combine control placement with workload identity and just-in-time authority. For example, a policy can allow the agent to read a ticket, but only issue a short-lived token when the request matches an approved task, trust level, and environment. That is closer to the logic described in the NIST AI Risk Management Framework than to classic role-based IAM. It also matches NHI realities documented in NHI Management Group’s Analysis of Claude Code Security, where the main issue is not just identity presence but when and how authority is granted.
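A rough sketch of the just-in-time pattern in the ticket example follows. It assumes a hypothetical in-memory set of approved (task, trust level, environment) combinations and an assumed five-minute lifetime; a real system would delegate issuance to its identity provider or secrets manager rather than minting tokens locally.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical approved combinations of task, trust level, and environment.
APPROVED_REQUESTS = {
    ("read_ticket", "high", "production"),
    ("read_ticket", "medium", "staging"),
}
TOKEN_TTL_SECONDS = 300  # assumed lifetime, not a recommendation


@dataclass(frozen=True)
class ScopedToken:
    value: str
    scope: str
    expires_at: float

    def is_valid(self, now: Optional[float] = None) -> bool:
        """Return True while the token has not yet expired."""
        current = time.time() if now is None else now
        return current < self.expires_at


def issue_token(task: str, trust_level: str, environment: str,
                now: Optional[float] = None) -> Optional[ScopedToken]:
    """Issue a short-lived token only for an approved request; otherwise None."""
    if (task, trust_level, environment) not in APPROVED_REQUESTS:
        return None
    issued_at = time.time() if now is None else now
    return ScopedToken(
        value=secrets.token_urlsafe(32),
        scope=task,  # the token is scoped to the single approved task
        expires_at=issued_at + TOKEN_TTL_SECONDS,
    )
```

The point of the sketch is that authority is decided per request from current context, and that it lapses on its own, instead of being attached permanently to the agent's role.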
- Block at prompt ingress when untrusted input can steer the agent into unsafe intent.
- Block at tool execution when the action exceeds task scope, data classification, or environment trust.
- Block at output and side-effect stages when disclosure, write, or send actions violate policy.
- Use real-time policy evaluation so the decision reflects current context, not a pre-baked role.
These controls tend to break down in multi-agent systems with shared memory and chained tools because one agent can inherit or amplify another agent’s authority faster than the policy layer can re-evaluate.
Common Variations and Edge Cases
Tighter blocking often increases operational overhead, requiring organisations to balance safety against workflow friction. That tradeoff is real, especially when agents support developers, analysts, or customer operations and false positives can stall legitimate work. There is no universal standard for exactly where every block should sit, so current guidance suggests matching the checkpoint to the most likely failure mode and the lowest-cost place to stop it.
Some environments need multiple block points for the same action. For example, a code assistant may need prompt filtering for malicious instructions, tool-call gating for repository writes, and output review for secret disclosure. Highly regulated workflows may also require compensating controls such as human approval for irreversible actions. Where agents operate with persistent memory, vendor plugins, or external browsing, the safest design is usually layered rather than singular. NHIMG’s Moltbook AI agent keys breach shows why static trust assumptions fail once agent credentials or tool access become reusable across tasks. In practice, the hard cases are autonomous workflows that mix internal data, external tools, and chained actions, because a single block point rarely sees the full blast radius.
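As one illustration of a compensating control for the code assistant case, the sketch below gates repository actions and holds irreversible ones until a human approves them. The action names and the `approved_by` parameter are hypothetical, introduced only for this example, and the default-deny behaviour for unknown actions is a design assumption rather than a requirement from any of the frameworks cited.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


# Hypothetical action names for a code assistant.
READ_ONLY_ACTIONS = {"read_file", "list_branches"}
REVERSIBLE_WRITES = {"open_pull_request", "push_branch"}
IRREVERSIBLE_ACTIONS = {"force_push", "delete_branch"}


def gate_repository_action(action: str, approved_by: Optional[str] = None) -> Verdict:
    """Decide whether a repository action may run at the tool-call stage."""
    if action in READ_ONLY_ACTIONS or action in REVERSIBLE_WRITES:
        return Verdict.ALLOW
    if action in IRREVERSIBLE_ACTIONS:
        # Irreversible actions run only once a named human has approved them.
        return Verdict.ALLOW if approved_by else Verdict.NEEDS_APPROVAL
    # Anything not explicitly listed is denied by default.
    return Verdict.DENY
```

A gate like this would sit alongside prompt filtering and output review rather than replace them, since it only sees the requested action and not the input that led to it or the content the agent returns.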
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A2 | Covers prompt injection and unsafe agent actions at the right control point. |
| CSA MAESTRO | TA-3 | Maps agent workflows to stage-based threat controls and execution gates. |
| NIST AI RMF | | Supports contextual, risk-based decisions for autonomous AI systems. |
Place checks at prompt, tool, and output stages instead of relying on one blanket filter.
Reviewed and updated by the NHIMG editorial team on June 9, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org