Why do AI governance controls fail when they are only documentary?

Why This Matters for Security Teams

Documentary controls fail because governance that exists only in policies cannot prove what live systems actually did. For AI governance, that gap is especially dangerous: an approved risk register does not show whether an agent was constrained, logged, or able to access sensitive data at runtime. Current guidance from the NIST AI Risk Management Framework and the EU AI Act points toward operational evidence, not paper compliance.

That distinction matters for NHI security as well, because the same failure pattern appears when organisations treat identity controls as documentation instead of enforceable runtime logic. NHIMG research on Ultimate Guide to NHIs — Regulatory and Audit Perspectives and Top 10 NHI Issues shows how auditability, rotation, and visibility break down when controls are not tied to live systems. In practice, many security teams encounter control failure only after an auditor, incident responder, or regulator asks for production evidence that never existed.

How It Works in Practice

AI governance becomes operational only when every key control can be observed, tested, and reconstructed from system behaviour. That means linking policy to technical enforcement: who approved the model or agent, what it could access, what prompts or actions were logged, whether human review was required, and what happened when a threshold was exceeded. Documentary controls stop at intent; operational controls prove execution.

For autonomous workloads, this usually requires a control stack that includes:

runtime logging of prompts, tool calls, and outputs with retention aligned to risk

policy-as-code checks that evaluate requests at the moment of action

clear human oversight triggers for high-impact or sensitive decisions

incident response playbooks that can isolate an agent, revoke access, and preserve evidence
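As a rough illustration of the first three items in that stack, the sketch below evaluates a request at the moment of action, escalates high-impact or sensitive requests to a human, and writes an audit record for every decision. It is a minimal sketch under stated assumptions, not a reference implementation: the `ActionRequest` fields, the `ALLOWED_TOOLS` policy, the agent and tool names, and the three decision labels are all hypothetical, and a real deployment would send records to tamper-resistant storage instead of an in-memory list.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ActionRequest:
    """Hypothetical shape of an agent action awaiting a policy decision."""
    agent_id: str
    tool: str
    data_classification: str  # assumed labels: "public", "internal", "sensitive"
    impact: str               # assumed labels: "low", "high"


# Hypothetical policy: which tools each agent identity may call.
ALLOWED_TOOLS = {"billing-agent": {"read_invoice", "issue_refund"}}


def evaluate(request: ActionRequest) -> str:
    """Return "deny", "require_human_review", or "allow" for one request."""
    # Unknown agents and unlisted tools are denied by default.
    if request.tool not in ALLOWED_TOOLS.get(request.agent_id, set()):
        return "deny"
    # Human oversight trigger for high-impact or sensitive actions.
    if request.impact == "high" or request.data_classification == "sensitive":
        return "require_human_review"
    return "allow"


def decide_and_log(request: ActionRequest, audit_log: list) -> str:
    """Evaluate the request and append a JSON audit record for the decision."""
    decision = evaluate(request)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        **asdict(request),
    }
    audit_log.append(json.dumps(record, sort_keys=True))
    return decision
```

The design point is that the decision and its evidence are produced in the same call, so a denied or escalated action cannot exist without a matching record for an assessor to inspect.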

The most useful evidence is the kind an assessor can verify from production systems, not from slide decks. The NIST Cybersecurity Framework 2.0 reinforces this by framing governance as a measurable capability, while NHIMG’s Ultimate Guide to NHIs — Lifecycle Processes for Managing NHIs shows why lifecycle evidence matters for identities that act without a human in the loop. Where teams need a concrete security signal, NHIMG research reports that the average time to remediate a leaked secret is 27 days, despite strong confidence in secrets management capabilities, which illustrates the same gap between belief and operational proof. These controls tend to break down when agents and models are spread across multiple SaaS, cloud, and orchestration layers because no single team can reliably reconstruct the full decision trail.

Common Variations and Edge Cases

Tighter governance often increases engineering and compliance overhead, requiring organisations to balance assurance against speed and system complexity. Best practice is evolving, and there is no universal standard for how much evidence is enough for every AI use case, especially when lower-risk internal tools are governed differently from customer-facing or regulated workflows.

One common edge case is the “policy-only” programme that has good documentation but weak telemetry. Another is a heavily automated environment where evidence exists in logs, but the logs are incomplete, fragmented, or impossible to correlate across model, agent, and data platforms. A third is the hybrid case where some controls are automated and others rely on manual review, creating inconsistent artefacts for auditors.
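The second edge case, logs that exist but cannot be correlated, can be tested for directly. The sketch below groups events from different platforms by a shared trace identifier and reports which decision trails are missing a stage. It assumes every platform can emit a `trace_id` and a `stage` field; those field names, the `REQUIRED_STAGES` list, and the `<uncorrelated>` bucket are hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical stages an assessor would expect to see for one agent action.
REQUIRED_STAGES = ("policy_decision", "model_call", "data_access")


def find_gaps(events):
    """Return {trace_id: [missing stages]} for trails that cannot be fully reconstructed."""
    stages_by_trace = defaultdict(set)
    for event in events:
        # Events without a trace ID cannot be tied to any decision trail.
        trace_id = event.get("trace_id") or "<uncorrelated>"
        stages_by_trace[trace_id].add(event["stage"])

    gaps = {}
    for trace_id, stages in stages_by_trace.items():
        missing = [stage for stage in REQUIRED_STAGES if stage not in stages]
        if missing:
            gaps[trace_id] = missing
    return gaps
```

Run against a sample of production events, an empty result suggests the trail can be rebuilt end to end for that sample, while any entry points to the platform whose telemetry is absent or unlinked.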

For that reason, teams should treat documentation as supporting material, not the control itself. The stronger test is whether the organisation can demonstrate that an AI system or agent was constrained in real time, using live artefacts that map to governance intent and operational practice. NHIMG’s The State of Non-Human Identity Security reinforces the broader issue: confidence is often far higher than visibility, and controls that cannot be observed rarely survive first contact with a real review.
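One way to apply that test is to list each control alongside the artefacts that support it and flag any control backed only by documents. The following is a minimal sketch of that idea; the `Artefact` type, the `"document"` and `"runtime"` labels, and the control identifiers in the usage example are hypothetical and would need to match an organisation's own control register.

```python
from dataclasses import dataclass

RUNTIME = "runtime"  # assumed label for evidence produced by live systems


@dataclass(frozen=True)
class Artefact:
    """Hypothetical evidence item linked to one control."""
    control_id: str
    kind: str  # assumed labels: "document" or "runtime"


def documentary_only(control_ids, artefacts):
    """Return the controls that have no runtime artefact behind them."""
    with_runtime = {a.control_id for a in artefacts if a.kind == RUNTIME}
    return [c for c in control_ids if c not in with_runtime]
```

For example, with controls `["AI-01", "AI-02"]`, a policy document for each, and a runtime log export only for `AI-01`, the function returns `["AI-02"]`, which is the control most likely to fail when production evidence is requested.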

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack surface, the NIST AI RMF provides the risk management structure, and the EU AI Act defines the regulatory obligations.

Framework	Control / Reference	Relevance
EU AI Act		Requires operational proof of governance, oversight, and logging for AI systems.
NIST AI RMF		Emphasises measurable AI governance, accountability, and risk treatment.
OWASP Agentic AI Top 10		Agentic systems need runtime controls because behaviour is dynamic and non-deterministic.

Convert AI governance into testable controls with evidence, monitoring, and escalation paths.

