On-premises AI systems combine business context, privileged data access, and runtime decision-making in one execution path. When an AI component can read internal information or trigger actions, ordinary workload permissions are no longer enough. Teams must govern both the identity that calls the system and the permissions the AI can exercise.
Why This Matters for Security Teams
On-premises AI changes the identity problem because the system is no longer just a passive model endpoint. It often sits next to sensitive internal data, inherits broad network reach, and can be wired into ticketing, data, or operational workflows. That means the blast radius is driven by both who can invoke the system and what the AI can do once invoked, which is why standard workload permissions are usually too coarse.
Current guidance suggests treating these systems as high-risk identity spines rather than ordinary applications. The practical issue is not only model misuse, but also secret exposure, over-privileged service accounts, and weak revocation. NHI Management Group’s Ultimate Guide to NHIs notes that 97% of NHIs carry excessive privileges, which aligns with what teams see when AI services are dropped into existing enterprise permissions without redesign. The identity surface expands further when prompts, tools, and downstream actions all share one runtime path, a pattern reflected in the OWASP Non-Human Identity Top 10. In practice, many security teams discover the problem only after an internal AI has already been granted access that was never intended for autonomous use, because the access was inherited rather than deliberately designed.
How It Works in Practice
Security teams need to separate three layers: the human or service that requests the AI, the workload identity of the AI runtime itself, and the permissions the AI may exercise for a specific task. That is where static IAM breaks down. An on-premises AI system rarely follows a fixed access pattern, so role assignments designed for human users or conventional services do not map cleanly to agent-like behaviour.
Best practice is evolving toward runtime authorization with short-lived credentials. Instead of long-lived API keys or broad service account rights, teams should issue just-in-time access with tight time-to-live values, then revoke access automatically after task completion. Where the environment supports it, workload identity mechanisms such as SPIFFE or signed OIDC assertions provide cryptographic proof of what the workload is, not just a secret it holds. That identity should then be evaluated against policy at request time, using context such as data sensitivity, destination system, tool call type, and whether the request is human-approved.
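The just-in-time pattern described above can be sketched in a few lines. This is a minimal, in-memory illustration rather than a production design: `CredentialBroker`, `Credential`, `run_task`, and the scope strings are hypothetical names introduced here, and a real deployment would delegate issuance and revocation to a secrets manager or workload identity provider.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    token: str
    workload_id: str
    scope: str
    expires_at: float


class CredentialBroker:
    """Hypothetical in-memory broker used only to illustrate TTL and revocation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._active = {}

    def issue(self, workload_id, scope, ttl_seconds=60.0):
        # Each credential is bound to one workload, one scope, and a short lifetime.
        cred = Credential(
            token=secrets.token_urlsafe(32),
            workload_id=workload_id,
            scope=scope,
            expires_at=self._clock() + ttl_seconds,
        )
        self._active[cred.token] = cred
        return cred

    def is_valid(self, token, scope):
        cred = self._active.get(token)
        if cred is None:
            return False
        if self._clock() >= cred.expires_at:
            # Expired credentials are dropped on first check.
            self._active.pop(token, None)
            return False
        return cred.scope == scope

    def revoke(self, token):
        self._active.pop(token, None)


def run_task(broker, workload_id, scope, task):
    """Issue a credential for one task and revoke it when the task ends."""
    cred = broker.issue(workload_id, scope, ttl_seconds=30.0)
    try:
        return task(cred)
    finally:
        # Revocation runs even if the task raises.
        broker.revoke(cred.token)
```

The `try`/`finally` in `run_task` is the part that matters: revocation is tied to task completion rather than left to the TTL alone, so the TTL acts as a backstop if the runtime crashes before cleanup.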
Practically, this means:
- Binding each AI runtime to a unique workload identity rather than a shared service account.
- Using policy-as-code for real-time authorization decisions instead of static allowlists.
- Issuing ephemeral secrets only when the agent needs them, then revoking them immediately after use.
- Logging tool calls and data access separately so investigators can reconstruct both intent and effect.
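As a rough illustration of the policy-as-code and logging points in the list above, the sketch below evaluates each request against declarative rules with a default-deny fallback and writes tool calls and data access to separate loggers. The rule format, sensitivity levels, logger names, and the `Request` fields are all assumptions made for this example; real deployments typically use a dedicated policy engine.

```python
import json
import logging
from dataclasses import dataclass

# Separate loggers so tool invocations (intent) and data access (effect)
# can be routed and retained independently. Names are hypothetical.
tool_log = logging.getLogger("ai.tool_calls")
data_log = logging.getLogger("ai.data_access")

SENSITIVITY_RANK = {"public": 0, "internal": 1, "restricted": 2}

# Hypothetical rules: a request is allowed if any rule is satisfied.
POLICY = [
    {"action": "read", "max_sensitivity": "internal", "needs_approval": False},
    {"action": "read", "max_sensitivity": "restricted", "needs_approval": True},
    {"action": "write", "max_sensitivity": "internal", "needs_approval": True},
]


@dataclass(frozen=True)
class Request:
    workload_id: str
    tool: str
    action: str            # "read" or "write"
    data_sensitivity: str  # key of SENSITIVITY_RANK
    human_approved: bool = False


def authorize(req):
    """Return True only if some rule permits the request; deny by default."""
    rank = SENSITIVITY_RANK.get(req.data_sensitivity)
    if rank is None:
        return False
    for rule in POLICY:
        if rule["action"] != req.action:
            continue
        if rank > SENSITIVITY_RANK[rule["max_sensitivity"]]:
            continue
        if rule["needs_approval"] and not req.human_approved:
            continue
        return True
    return False


def call_tool(req, tool_fn):
    """Authorize at request time, then log the call and the access separately."""
    allowed = authorize(req)
    tool_log.info(json.dumps({
        "workload": req.workload_id, "tool": req.tool,
        "action": req.action, "allowed": allowed,
    }))
    if not allowed:
        raise PermissionError(f"{req.tool}:{req.action} denied for {req.workload_id}")
    result = tool_fn()
    data_log.info(json.dumps({
        "workload": req.workload_id, "tool": req.tool,
        "sensitivity": req.data_sensitivity,
    }))
    return result
```

Because the decision is computed per request from context, tightening a rule takes effect on the next call without reissuing roles.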
NHI Management Group’s Top 10 NHI Issues and 52 NHI Breaches Analysis both reinforce a consistent pattern: excessive privilege and poor lifecycle control are the recurring failure modes. The same lesson shows up in NIST Cybersecurity Framework 2.0, which emphasizes governance, access control, and continuous monitoring as linked functions rather than separate checkboxes. These controls tend to break down when on-premises AI is integrated into legacy network zones that assume stable, human-shaped access patterns, because runtime decisions become too dynamic for pre-approved entitlements.
Common Variations and Edge Cases
Tighter identity controls often increase operational overhead, requiring organisations to balance stronger containment against deployment speed and developer friction. That tradeoff becomes sharper on-premises, where teams may want local inference for data residency or latency reasons, but the same local placement can make privilege sprawl easier if every internal system trusts the AI by default.
There is no universal standard for this yet, but current guidance suggests several edge cases deserve special handling. If the AI is read-only, the risk is lower, though sensitive retrieval still demands strong authentication, auditing, and secret isolation. If the system can write to databases, send messages, or trigger workflows, it should be treated more like a privileged operator than a simple application. If multiple agents or toolchains are chained together, the identity problem compounds because one agent can inherit or relay rights from another.
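One way to limit the relay problem in chained agents is to attenuate rights at every hop, so a downstream agent can never hold more than its caller. The sketch below assumes scopes are plain strings of the form `verb:resource`; `Grant`, `delegate`, and `is_privileged` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    holder: str
    scopes: frozenset
    chain: tuple = ()  # holders that relayed this grant, oldest first


def delegate(parent, child, requested):
    """Give the child only the scopes it requested that the parent actually holds."""
    granted = parent.scopes & frozenset(requested)
    return Grant(holder=child, scopes=granted, chain=parent.chain + (parent.holder,))


def is_privileged(grant, verbs=("write:", "send:", "trigger:")):
    """Flag grants that can change state, which warrant operator-level controls."""
    return any(scope.startswith(verbs) for scope in grant.scopes)


planner = Grant("planner", frozenset({"read:tickets", "write:tickets"}))
summarizer = delegate(planner, "summarizer", {"read:tickets", "write:db"})
# summarizer holds only "read:tickets": "write:db" was never held by planner.
```

Recording the `chain` alongside the scopes keeps the delegation path available to investigators when an action needs to be traced back through several agents.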
This is where agentic governance and NHI governance overlap. The Ultimate Guide to NHIs — Key Challenges and Risks and the OWASP NHI Top 10 point to the same practical conclusion: identity must be enforced per runtime, per task, and per action. For teams operating regulated or high-value environments, that usually means combining ephemeral credentials, explicit human approval for sensitive actions, and continuous policy evaluation rather than relying on one broad on-prem trust zone.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | NHI-03 | Agentic systems need short-lived, task-scoped access instead of broad standing rights. |
| CSA MAESTRO | | MAESTRO addresses governance for autonomous tool use and agent decision paths. |
| NIST AI RMF | GOVERN | The GOVERN function supports accountability for AI systems with privileged internal access. |
Replace standing access with runtime-approved, ephemeral permissions tied to each agent task.