Why do generative and agentic AI create problems for traditional model risk management?

Traditional model risk management assumes stable inputs, stable outputs, and a bounded decision path that can be validated before deployment. Generative and agentic systems can retrieve data, combine tools, and initiate actions during runtime, which means their behaviour may change in production. That makes explanation useful, but not sufficient for governance.

Why Traditional Model Risk Management Breaks for Autonomous AI

Traditional model risk management works best when a system’s inputs, outputs, and decision path stay stable enough for pre-deployment validation. Generative and agentic AI do not stay in that box. They can call tools, retrieve live data, chain prompts, and initiate actions at runtime, which means the effective risk surface changes after approval. That is why explanation alone is not sufficient for governance. Current guidance in both the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework pushes teams toward runtime controls, not just documentation.

NHIMG research shows why this matters operationally: SailPoint’s report, AI Agents: The New Attack Surface, found that 80% of organisations said their AI agents had already acted beyond intended scope. That is not a minor tuning issue; it is evidence that the old assumption of bounded behaviour no longer holds. In practice, many security teams encounter the failure only after an agent has already accessed data or taken an action that no pre-launch review anticipated.

How It Works in Practice

For agentic systems, the question is no longer only “is the model accurate?” but “what is the agent authorised to do right now, with this identity, in this context?” That is where static RBAC starts to fail. A role can describe a human job function, but it cannot fully describe an autonomous workload that makes task-by-task decisions. Better practice is emerging around intent-based authorisation, short-lived JIT credentials, and workload identity, so the agent receives only the minimum access needed for a specific action.

That means separating the model from the agent’s execution identity. The model may generate a plan, but the agent should present cryptographic workload identity, such as SPIFFE or OIDC-backed tokens, and request access through policy-as-code at runtime. This is the direction reflected in the CSA MAESTRO agentic AI threat modeling framework and in NHIMG’s OWASP NHI Top 10, both of which treat identity and tool access as first-class attack surfaces.

Issue short-lived secrets per task, not long-lived API keys that survive across sessions.
Evaluate policy at request time, using context such as task, data sensitivity, and destination system.
Revoke access automatically when the task completes or the agent’s intent changes.
Log every tool call and data access so audit can reconstruct what the agent actually did.

This model is stronger because it assumes the agent may be autonomous, goal-driven, and capable of chaining actions that were never explicitly scripted. These controls tend to break down when agents are allowed to keep persistent credentials across multiple systems, because the same identity can then be reused for lateral movement and privilege escalation.
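The four controls listed above (per-task secrets, request-time policy, automatic revocation, and full logging) can be sketched as a small in-memory credential broker. This is a minimal illustration, not a reference implementation: the `Broker` class, the `POLICY` table, and the task, system, and agent names are all hypothetical, and a real deployment would back them with a secrets manager, a policy engine, and verified workload identity.

```python
import secrets
from dataclasses import dataclass, field

# Hypothetical policy table: (task, destination system) -> highest
# data sensitivity the agent may touch for that task.
POLICY = {
    ("summarise_tickets", "helpdesk"): "internal",
    ("reconcile_invoices", "ledger"): "confidential",
}
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2}


@dataclass
class Grant:
    agent_id: str
    task: str
    destination: str
    expires_at: float


@dataclass
class Broker:
    ttl_seconds: float = 60.0
    grants: dict = field(default_factory=dict)     # token -> Grant
    audit_log: list = field(default_factory=list)  # one entry per decision

    def _log(self, now, event, agent_id, task, destination):
        self.audit_log.append({"time": now, "event": event, "agent": agent_id,
                               "task": task, "destination": destination})

    def request_access(self, agent_id, task, destination, sensitivity, now):
        """Evaluate policy at request time; return a short-lived token or None."""
        ceiling = POLICY.get((task, destination))
        rank = SENSITIVITY_RANK.get(sensitivity)
        if ceiling is None or rank is None or rank > SENSITIVITY_RANK[ceiling]:
            self._log(now, "deny", agent_id, task, destination)
            return None
        token = secrets.token_urlsafe(16)
        self.grants[token] = Grant(agent_id, task, destination,
                                   now + self.ttl_seconds)
        self._log(now, "issue", agent_id, task, destination)
        return token

    def use(self, token, task, destination, now):
        """Allow a tool call only for the task and system the token was issued for."""
        grant = self.grants.get(token)
        ok = (grant is not None and now < grant.expires_at
              and grant.task == task and grant.destination == destination)
        agent_id = grant.agent_id if grant else "unknown"
        self._log(now, "use_allowed" if ok else "use_denied",
                  agent_id, task, destination)
        return ok

    def complete_task(self, token, now):
        """Revoke the credential as soon as the task finishes."""
        grant = self.grants.pop(token, None)
        if grant is not None:
            self._log(now, "revoke", grant.agent_id, grant.task, grant.destination)
```

In this sketch a token is bound to one task and one destination system, so reusing it elsewhere is denied and logged rather than silently succeeding. Timestamps are passed in explicitly (`now`) only to keep the example deterministic; a production broker would read a trusted clock.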

Common Variations and Edge Cases

Tighter runtime controls often increase operational overhead, so organisations have to balance speed against containment. That tradeoff is real, especially where agents need broad access to support automation. Best practice is evolving, and there is no universal standard for this yet, but the direction is clear: reduce standing privilege and prefer ephemeral access over durable entitlements.

Edge cases appear in multi-agent workflows, developer copilots, and outsourced automation, where one agent may trigger another and inherit access indirectly. In those environments, the risk is not just model drift but permission drift. A single agent can become a broker for secrets, data, and actions unless every hop is checked against policy. NHIMG’s Top 10 NHI Issues is useful here because it frames credentials, lifecycle, and auditability as ongoing controls, not one-time setup. For broader governance mapping, the NIST AI Risk Management Framework and Ultimate Guide to NHIs — Regulatory and Audit Perspectives help teams connect technical controls to accountability and audit.
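One way to make "every hop is checked against policy" concrete is to treat a delegation chain as an ordered list of agents and allow an action only if every agent in the chain is independently permitted to perform it. The sketch below assumes a hypothetical `Hop` record and illustrative permission strings; it shows the intersection idea only and is not drawn from any of the frameworks named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    agent_id: str
    granted: frozenset  # permissions this agent's own policy allows


def effective_permissions(chain):
    """Intersect permissions across every hop so access never widens downstream."""
    if not chain:
        return frozenset()
    effective = chain[0].granted
    for hop in chain[1:]:
        effective = effective & hop.granted
    return effective


def authorise(chain, permission):
    """Permit the action only if every agent in the chain holds the permission."""
    return permission in effective_permissions(chain)


# Hypothetical chain: a planner agent triggers a developer copilot.
chain = [
    Hop("planner", frozenset({"read:tickets", "read:crm"})),
    Hop("copilot", frozenset({"read:tickets", "write:repo"})),
]
```

With this chain, `authorise(chain, "read:tickets")` is true, while `authorise(chain, "write:repo")` is false because the planner that initiated the work never held that permission. Taking the intersection means a downstream agent cannot act as a broker for access the upstream agent lacked, and an upstream agent cannot borrow the downstream agent's wider entitlements.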

The practical takeaway is that traditional model risk management still matters, but it is incomplete on its own when the system can decide, retrieve, and act. For agentic AI, governance has to follow the execution path, not just the model artefact.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Agentic apps create runtime tool and auth risks beyond static model review.
CSA MAESTRO	TA-1	MAESTRO focuses on threat modeling autonomous agent behaviour and access.
NIST AI RMF	GOVERN	AI RMF GOVERN covers accountability for changing AI behaviour in production.

Assign ownership, policy, and review gates for agent actions before deployment and during operation.
