Should organisations use one control layer for all agentic AI risks?

Why This Matters for Security Teams

A single control layer sounds efficient, but agentic AI attacks do not happen in one place. Prompt injection, retrieval poisoning, tool abuse, and unsafe output each require different detection points and different enforcement logic. That is why current guidance recommends layering controls across the full agent workflow rather than trusting one gate to do all the work. NHI Management Group’s AI Agents: The New Attack Surface report illustrates the gap: 80% of organisations say their AI agents have already acted beyond intended scope, and only 44% have implemented policies to govern them.

Practitioners should treat the question as an architecture issue, not a tool choice. The risk is not just malicious prompts. It is an autonomous system that can chain actions, call tools, retrieve data, and generate outputs faster than a human review loop can react. The OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both point toward risk-specific controls, not a universal catch-all. In practice, many security teams discover this only after an agent has already used a “safe” pathway to reach an unsafe outcome.

How It Works in Practice

The practical answer is to separate controls by stage and then link them into one policy chain. Prompt controls reduce instruction abuse at the input boundary. Retrieval controls govern what the agent can search, fetch, or ground on. Tool governance constrains which actions the agent can execute, under what conditions, and with which credentials. Output filtering reduces harmful or policy-violating responses before they leave the system.
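The stage separation above can be sketched as a chain of independent checks, each owning one failure mode. This is a minimal illustration under stated assumptions, not a reference implementation: the `AgentRequest` shape, the check functions, the allowed prefixes and tool names, and the string-matching rules are all hypothetical placeholders. Real prompt and output controls would need far more than keyword matching.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class AgentRequest:
    # Hypothetical request shape; a real agent carries much more context.
    prompt: str
    retrieved_sources: List[str] = field(default_factory=list)
    tool: Optional[str] = None
    output: str = ""


@dataclass
class Decision:
    allowed: bool
    stage: str
    reason: str = ""


# Each check returns None to pass, or a reason string to deny.
Check = Callable[[AgentRequest], Optional[str]]


def check_prompt(req: AgentRequest) -> Optional[str]:
    # Placeholder rule: a naive marker for instruction override attempts.
    if "ignore previous instructions" in req.prompt.lower():
        return "possible instruction override"
    return None


def check_retrieval(req: AgentRequest) -> Optional[str]:
    # Placeholder rule: only sources under an assumed internal prefix.
    allowed_prefixes = ("kb://internal/",)
    for src in req.retrieved_sources:
        if not src.startswith(allowed_prefixes):
            return f"source outside allowed scope: {src}"
    return None


def check_tool(req: AgentRequest) -> Optional[str]:
    # Placeholder rule: an assumed allowlist of low-risk tools.
    allowed_tools = {"search_tickets", "read_document"}
    if req.tool is not None and req.tool not in allowed_tools:
        return f"tool not permitted: {req.tool}"
    return None


def check_output(req: AgentRequest) -> Optional[str]:
    # Placeholder rule: block obvious secret material in the response.
    if "BEGIN PRIVATE KEY" in req.output:
        return "secret material in output"
    return None


POLICY_CHAIN: List[Tuple[str, Check]] = [
    ("prompt", check_prompt),
    ("retrieval", check_retrieval),
    ("tool", check_tool),
    ("output", check_output),
]


def evaluate(req: AgentRequest) -> Decision:
    # Run the stages in order and stop at the first control that denies.
    for stage, check in POLICY_CHAIN:
        reason = check(req)
        if reason is not None:
            return Decision(False, stage, reason)
    return Decision(True, "all")
```

Because each stage returns its own reason, a denial identifies which control fired, which is the property a single catch-all gate cannot provide.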

That separation matters because each stage fails differently. A well-filtered output does not stop a poisoned document from steering the model. A strict prompt guardrail does not stop a mis-scoped tool token from triggering an unwanted API call. For agent builders, the emerging best practice is runtime policy evaluation with workload identity, short-lived credentials, and explicit approval gates for high-risk actions. Frameworks such as the CSA MAESTRO agentic AI threat modeling framework and MITRE ATLAS adversarial AI threat matrix are useful because they force teams to model abuse paths across the whole chain, not just one layer.

That is also consistent with NHIMG’s analysis in OWASP NHI Top 10 and Ultimate Guide to NHIs — Key Challenges and Risks, where identity, permissions, and secrets handling are treated as part of the same control plane. In practice, the policy chain should be evaluated at request time, with logs that show what the agent tried to do, which context it used, and which control stopped it. These controls tend to break down in highly dynamic multi-agent workflows because one agent can inherit trust from another and bypass the intended decision point.
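As a rough sketch of the logging requirement described above, each request-time decision could be written as one structured record that captures the attempted action, the context used, and the control that stopped it. The field names here are assumptions chosen for illustration, and `delegation_chain` is a hypothetical field added to make inherited trust between agents visible in the log.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def audit_record(
    agent_id: str,
    attempted_action: str,
    context_sources: List[str],
    stopped_by: Optional[str] = None,
    reason: str = "",
    delegation_chain: Optional[List[str]] = None,
) -> str:
    """Return one JSON line describing a single policy decision."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        # What the agent tried to do.
        "attempted_action": attempted_action,
        # Which context it used.
        "context_sources": list(context_sources),
        # Which control stopped it; None means every stage passed.
        "decision": "allow" if stopped_by is None else "deny",
        "stopped_by": stopped_by,
        "reason": reason,
        # Hypothetical: the upstream agents this request was delegated through.
        "delegation_chain": list(delegation_chain or []),
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(audit_record(
        agent_id="agent-a",
        attempted_action="deploy_code",
        context_sources=["kb://internal/runbooks"],
        stopped_by="tool",
        reason="high-impact action without approval",
        delegation_chain=["agent-b"],
    ))
```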

Use separate policy checks for prompts, retrieval, tools, and outputs.

Issue short-lived secrets and revoke them after the task completes.

Bind tool access to workload identity rather than static user roles.

Require runtime evaluation for high-impact actions, not pre-approved blanket access.
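The credential and approval practices listed above might look like the following in code. This is a sketch under assumptions, not a real secrets-manager API: `ScopedCredential`, `issue_credential`, the `HIGH_IMPACT_TOOLS` set, and the five-minute default lifetime are all hypothetical. A production system would delegate issuance and revocation to its identity provider or vault.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of actions that need an explicit approval at runtime.
HIGH_IMPACT_TOOLS = {"deploy_code", "initiate_payment"}


@dataclass
class ScopedCredential:
    token: str
    workload_id: str   # bound to the workload identity, not a user role
    tool: str          # valid for one tool only
    expires_at: float  # epoch seconds
    revoked: bool = False

    def is_valid(self, workload_id: str, tool: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return (
            not self.revoked
            and workload_id == self.workload_id
            and tool == self.tool
            and now < self.expires_at
        )

    def revoke(self) -> None:
        # Call when the task completes, rather than waiting for expiry.
        self.revoked = True


def issue_credential(
    workload_id: str,
    tool: str,
    ttl_seconds: int = 300,
    approved: bool = False,
    now: Optional[float] = None,
) -> ScopedCredential:
    # High-impact actions are evaluated at request time, never pre-approved.
    if tool in HIGH_IMPACT_TOOLS and not approved:
        raise PermissionError(f"{tool} requires explicit approval")
    now = time.time() if now is None else now
    return ScopedCredential(
        token=secrets.token_urlsafe(32),
        workload_id=workload_id,
        tool=tool,
        expires_at=now + ttl_seconds,
    )
```

Binding the credential to both a workload and a single tool means a leaked token cannot be replayed by a different agent or against a different action, and revoking it at task completion shortens exposure further than expiry alone.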

Common Variations and Edge Cases

Tighter control chains often increase latency and operational overhead, so organisations must balance containment against user experience and release speed. There is no universal standard for this yet, especially for agents that operate across multiple models, plugins, and SaaS systems. The right answer depends on whether the agent only drafts content, retrieves internal data, or can take external actions such as ticket changes, code deployment, or payment initiation.

One important edge case is that output filtering alone may look effective in testing but still leave the upstream attack surface open. Another is retrieval-heavy agents, where the main risk is not the final answer but the model being misled by tainted context. For those systems, organisations should prioritise access scoping, provenance checks, and data minimisation before they focus on response moderation. NIST’s AI governance guidance and the NIST Cybersecurity Framework 2.0 both support this layered approach.
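For retrieval-heavy agents, the ordering above (access scoping and provenance first, moderation later) could be sketched as a filter applied before any document reaches the model. Everything named here is an assumption for illustration: the `Document` shape, the `TRUSTED_SOURCES` set, the idea that a SHA-256 digest was recorded at ingestion time, and truncation as a crude stand-in for data minimisation.

```python
import hashlib
from dataclasses import dataclass
from typing import List

# Hypothetical allowlist of sources the agent may ground on.
TRUSTED_SOURCES = {"kb://internal/policies", "kb://internal/runbooks"}


@dataclass(frozen=True)
class Document:
    source: str
    content: str
    sha256: str  # digest assumed to be recorded when the document was ingested


def digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def verify(doc: Document) -> bool:
    # Access scoping: reject anything outside the trusted sources.
    if doc.source not in TRUSTED_SOURCES:
        return False
    # Provenance: reject content that changed since ingestion.
    return digest(doc.content) == doc.sha256


def minimise(content: str, max_chars: int = 500) -> str:
    # Crude data minimisation: pass only a bounded slice to the model.
    return content[:max_chars]


def build_context(docs: List[Document], max_chars: int = 500) -> List[str]:
    # Only verified documents reach the model, and only in minimised form.
    return [minimise(d.content, max_chars) for d in docs if verify(d)]
```

A tampered or out-of-scope document is dropped before generation, so the protection does not depend on the output filter catching its downstream effects.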

NHIMG research on AI LLM hijack breach and Top 10 NHI Issues reinforces a simple point: the more autonomy an agent has, the less defensible a one-layer control strategy becomes. The practical design goal is not maximum restriction everywhere, but control placement that matches the specific failure mode. That is where most implementations get the design right in theory and wrong in production, especially when teams assume a single “AI firewall” can cover all agent behaviour.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Agentic threats split across prompts, tools, retrieval, and outputs.
CSA MAESTRO	T1	MAESTRO models agent workflows as distinct trust and abuse boundaries.
NIST AI RMF	GOVERN	AI RMF governance supports risk-based controls instead of one universal layer.

Assign control owners, review risk per agent stage, and enforce runtime accountability.

