How can organisations tell whether their NHI controls are keeping up with AI agents?

Why This Matters for Security Teams

AI agents change the test for NHI control maturity because they do not behave like static service accounts. They can request tools dynamically, chain actions, and operate across systems faster than human review cycles can keep up. That means the real question is not whether an identity exists, but whether the organisation can prove what the agent was allowed to do, at the moment it tried to do it.

When controls are still built around periodic reviews and manual log correlation, they often miss the actual failure mode: an agent with broad standing access and no clear revocation path. NHI programmes that look good on paper can still fail when auditors ask for a clean trail from request to scope to revocation. NHI Management Group’s AI Agents: The New Attack Surface report notes that only 52% of companies can track and audit the data their AI agents access, a strong signal that visibility is lagging behind agent behaviour. Guidance in the OWASP Agentic AI Top 10 also reflects this shift toward runtime risk rather than static entitlement review. In practice, many security teams discover the gap only after an agent has already accessed the wrong tool, not during design.

How It Works in Practice

The fastest way to assess whether NHI controls are keeping up is to follow one agent action end to end. Start with the request: can the team identify who or what initiated it, what intent was declared, and which policy decided the outcome? Then verify whether the granted scope was minimal, time-bound, and tied to a workload identity rather than a reusable secret. For agentic systems, current guidance suggests using runtime authorisation and ephemeral credentials instead of assuming a pre-approved role will remain appropriate for the whole task.
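To make the end-to-end trace concrete, here is a minimal sketch of a request-time decision that yields a narrow, time-bound grant. It is an illustration under stated assumptions, not a reference implementation: the `POLICY` table, the `Grant` type, the `authorise` and `is_valid` helpers, the SPIFFE-style ID, and the 300-second TTL are all hypothetical and do not come from any named framework.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical policy table: (workload identity, declared intent) -> allowed tools.
POLICY = {
    ("spiffe://example.org/agent/billing", "reconcile-invoices"): {"ledger.read"},
}


@dataclass(frozen=True)
class Grant:
    workload_id: str      # who or what initiated the request
    intent: str           # the intent declared with the request
    tool: str             # the single tool this grant covers
    expires_at: datetime  # hard expiry, so the grant is never standing access


def authorise(workload_id: str, intent: str, tool: str,
              now: datetime, ttl_seconds: int = 300) -> Optional[Grant]:
    """Evaluate policy at request time; return a short-lived grant or None."""
    allowed = POLICY.get((workload_id, intent), set())
    if tool not in allowed:
        return None
    return Grant(workload_id, intent, tool, now + timedelta(seconds=ttl_seconds))


def is_valid(grant: Grant, now: datetime) -> bool:
    """A grant is usable only strictly before its expiry."""
    return now < grant.expires_at


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    grant = authorise("spiffe://example.org/agent/billing",
                      "reconcile-invoices", "ledger.read", now)
    print(grant)
```

The design choice worth noting is that the grant carries the identity, the intent, and the expiry together, so a reviewer does not need a second system to learn why the access existed or when it ended.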

Practically, this means asking for evidence in four layers:

Workload identity: is the agent authenticated as a cryptographic workload, such as via SPIFFE or OIDC, rather than a shared token?

Decision policy: was access evaluated at request time with policy as code, such as OPA or Cedar, based on context and intent?

Credential lifetime: were secrets issued just in time, with short TTLs and automatic revocation after task completion?

Auditability: can the team reconstruct which tool was used, what data was touched, and whether the access was rescinded cleanly?
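The four layers above can be turned into a simple evidence check for a single agent action. The sketch below is hedged: the `ActionEvidence` fields, the `missing_layers` helper, and the 900-second TTL threshold are assumptions chosen for illustration, and a real programme would populate them from its own identity provider, policy engine, and secrets manager.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed threshold for a "short" credential lifetime; tune to local policy.
MAX_TTL_SECONDS = 900


@dataclass
class ActionEvidence:
    workload_id: Optional[str]            # e.g. a SPIFFE ID or OIDC subject
    policy_decision_id: Optional[str]     # ID of the request-time policy evaluation
    credential_ttl_seconds: Optional[int] # lifetime of the issued credential
    revoked: bool                         # was the credential rescinded after the task?
    tool: Optional[str]                   # which tool was used
    data_touched: Optional[List[str]]     # which data was accessed (may be an empty list)


def missing_layers(ev: ActionEvidence) -> List[str]:
    """Return the evidence layers that cannot be demonstrated for this action."""
    gaps = []
    if not ev.workload_id:
        gaps.append("workload identity")
    if not ev.policy_decision_id:
        gaps.append("decision policy")
    ttl = ev.credential_ttl_seconds
    if ttl is None or ttl > MAX_TTL_SECONDS or not ev.revoked:
        gaps.append("credential lifetime")
    if not ev.tool or ev.data_touched is None:
        gaps.append("auditability")
    return gaps
```

An empty result means all four layers have evidence for that action; any returned name marks where a reviewer would have to fall back on manual log correlation.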

That pattern aligns with the operating model described in OWASP NHI Top 10 and the NIST AI Risk Management Framework, both of which push organisations toward traceable governance rather than trust in static entitlements. The important operational test is whether the control plane can answer those questions without stitching together multiple logs by hand. These controls tend to break down in multi-agent pipelines with shared tool brokers, because scope changes and handoffs make ownership and revocation difficult to prove.

Common Variations and Edge Cases

Tighter controls often increase operational overhead, so organisations have to balance assurance against developer friction and agent throughput. That tradeoff is especially real in environments where agents act as orchestrators across SaaS, internal APIs, and code execution tools, because each additional approval step can slow legitimate work.

Best practice is evolving for delegated and multi-agent workflows, and there is no universal standard for this yet. Some teams will use a brokered model where the agent never holds long-lived secrets directly, while others will give each task a dedicated, short-lived token with narrow claims. The right choice depends on whether the risk is credential theft, unauthorised tool chaining, or silent overreach into sensitive data. The control should still answer the same core questions: who requested access, what was granted, why it was granted, and how it was revoked.
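As one illustration of the brokered model, the sketch below keeps long-lived secrets inside a broker and gives the agent only a task ID. Everything here is hypothetical: `ToolBroker`, `TaskScope`, and their methods are invented for this example and do not correspond to any specific product or standard. The point is only that the same object can answer who requested access, what was granted, why, and how it was revoked.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple


@dataclass
class TaskScope:
    requester: str                    # who requested access
    reason: str                       # why it was granted
    tools: FrozenSet[str]             # what was granted
    revoked_by: Optional[str] = None  # how it was revoked (None while active)


class ToolBroker:
    """Hypothetical broker: the agent never sees the underlying secrets."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = secrets       # tool name -> long-lived secret, held only here
        self._tasks: Dict[str, TaskScope] = {}
        self.audit: List[Tuple[str, str, str]] = []

    def open_task(self, task_id: str, requester: str, reason: str,
                  tools: Iterable[str]) -> None:
        self._tasks[task_id] = TaskScope(requester, reason, frozenset(tools))

    def call(self, task_id: str, tool: str) -> str:
        scope = self._tasks.get(task_id)
        allowed = (scope is not None and scope.revoked_by is None
                   and tool in scope.tools)
        # Every attempt is recorded, including denials.
        self.audit.append((task_id, tool, "allow" if allowed else "deny"))
        if not allowed:
            raise PermissionError(f"{tool} is not permitted for task {task_id}")
        # A real broker would use self._secrets[tool] to invoke the tool here.
        return f"{tool}: ok"

    def close_task(self, task_id: str, revoked_by: str) -> None:
        self._tasks[task_id].revoked_by = revoked_by

    def explain(self, task_id: str) -> dict:
        s = self._tasks[task_id]
        return {"who": s.requester, "what": sorted(s.tools),
                "why": s.reason, "revoked_by": s.revoked_by}
```

A per-task token model would move the scope into signed claims instead of broker state, but the `explain` output a reviewer needs would look the same.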

That is also where vendor dashboards can create false confidence. If the evidence lives only in agent logs, prompt traces, and point-in-time access records, the programme may appear governed while still being unable to prove containment. The practical benchmark is whether a reviewer can verify the whole lifecycle from one place, not whether the organisation has accumulated more telemetry. The Ultimate Guide to NHIs and the CSA MAESTRO agentic AI threat modeling framework both reinforce that point by treating identity, policy, and runtime behaviour as one control surface.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Agentic runtime misuse is the core issue when access outpaces static roles.
CSA MAESTRO	M2	MAESTRO covers agent identity, tool access, and runtime governance gaps.
NIST AI RMF	GOVERN	AI RMF governance is needed to assign accountability for autonomous agent behaviour.

Test agent decisions at runtime and bind every tool call to intent and least privilege.
