On-premises AI infrastructure needs identity and runtime guardrails

By NHI Mgmt Group Editorial Team | Published 2025-09-30 | Domain: Agentic AI & NHIs | Source: Aqua Security

TL;DR: AI workloads running on premises inherit container, Kubernetes, and supply chain risks while adding prompt injection, data leakage, and GPU abuse paths, according to Aqua Security. The security gap is not the model alone but the identity, policy, and runtime controls around it, especially where AI components touch sensitive data and business transactions.

At a glance

What this is: An Aqua Security analysis of why on-premises AI workloads need identity-aware governance, runtime controls, and supply chain scrutiny from day one.

Why it matters: AI security now intersects with NHI, autonomous behaviour, and human access decisions inside the same application stack, and existing IAM patterns do not cover those combined risks.

By the numbers:

500 of the world’s largest enterprises.

👉 Read Aqua Security's analysis of securing on-premises AI infrastructure

Context

On-premises AI security is the problem of governing model behaviour, data access, and runtime execution inside the same environment that already carries container and Kubernetes risk. The primary issue is that AI components can act on sensitive business context while still relying on ordinary infrastructure controls that were never designed for model prompts, tool use, or GPU consumption.

That creates an identity governance problem as much as a platform problem. The organisation has to know which components are allowed to read data, which services can invoke tools, and which workloads can consume scarce compute without overrunning policy boundaries.

Key questions

Q: How should teams secure AI workloads running on premises?

A: Start by governing the full request path, not just the model. Security teams should authenticate the user, validate the gateway policy, control what context is attached, scan the workload supply chain, and enforce runtime guardrails on prompts, responses, and tool calls. That combination keeps AI execution within an auditable boundary.

Q: Why do on-premises AI systems create new identity and access risks?

A: Because they combine business context, privileged data access, and runtime decision-making in one execution path. When an AI component can read internal information or trigger actions, ordinary workload permissions are no longer enough. Teams must govern both the identity that calls the system and the permissions the AI can exercise.

Q: What breaks when AI workloads are protected only by container security?

A: Container security reduces infrastructure exposure, but it does not stop prompt injection, jailbreaks, or unsafe tool use inside the workload. If the runtime can still access sensitive context or invoke downstream actions, the model can be manipulated after deployment. Governance has to continue where the container ends and execution begins.

Q: How do security teams decide whether an AI workload is ready for production?

A: Use a governance test, not a marketing test. The workload is ready only if its models, dependencies, data sources, runtime controls, and resource limits are known, approved, and continuously monitored. If any of those elements are opaque, the deployment is still experimental from a security perspective.
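
The governance test described in this answer can be expressed as a simple gap check. The sketch below is illustrative only: the element and state names mirror the sentence above, while the `evidence` structure and both function names are assumptions introduced here, not something defined in the Aqua Security analysis.

```python
from __future__ import annotations

# The five elements and three states named in the answer above.
ELEMENTS = ("models", "dependencies", "data_sources", "runtime_controls", "resource_limits")
STATES = ("known", "approved", "monitored")


def readiness_gaps(evidence: dict[str, set[str]]) -> list[str]:
    """Return every element/state pair that lacks evidence; an empty list means ready."""
    return [
        f"{element}: not {state}"
        for element in ELEMENTS
        for state in STATES
        if state not in evidence.get(element, set())
    ]


def is_production_ready(evidence: dict[str, set[str]]) -> bool:
    """A workload passes only when no element is opaque in any state."""
    return not readiness_gaps(evidence)
```

A workload whose models are known and approved but not monitored would return a single gap, which keeps it in the experimental category until monitoring evidence exists.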

Technical breakdown

Why on-premises AI security depends on the application stack

On-premises AI is not a standalone model problem. It is a chain of client authentication, API gateway policy, context enrichment, inference execution, and compute scheduling. Each layer can widen the blast radius if it trusts the next layer too much. When context services attach account data or business rules, they turn model input into governed transaction data. That means security failures can appear in ordinary places such as an unscanned image, a weak gateway policy, or an overly permissive service account. The model may be the visible layer, but the real control surface is the surrounding stack.

Practical implication: Map every AI transaction to the identity and policy checks that happen before the model sees the request.
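
As a minimal sketch of that mapping, the Python below models one AI transaction passing through client authentication, gateway policy, and context enrichment before inference. Every name in it (`Request`, `GATEWAY_POLICY`, `CONTEXT_ACL`, the role and route strings) is hypothetical and not taken from the Aqua Security analysis. A real deployment would most likely delegate these checks to its identity provider and API gateway.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical policy tables; real systems would load these from an IdP or gateway.
GATEWAY_POLICY = {"analyst": {"/chat"}, "service-desk": {"/chat", "/summarise"}}
CONTEXT_ACL = {
    "analyst": {"business_rules"},
    "service-desk": {"business_rules", "account_data"},
}


@dataclass
class Request:
    user: Optional[str]       # authenticated principal, None if unauthenticated
    role: str
    route: str
    wanted_context: set[str]  # context types the application asks to attach
    trail: list[str] = field(default_factory=list)  # checkpoints passed so far


def authorise(req: Request) -> set[str]:
    """Run the pre-model checkpoints in order and return the context that may be attached."""
    if req.user is None:
        raise PermissionError("client not authenticated")
    req.trail.append("authn")

    if req.route not in GATEWAY_POLICY.get(req.role, set()):
        raise PermissionError(f"gateway policy denies {req.route} for {req.role}")
    req.trail.append("gateway")

    # Attach only the context this role is allowed to use; silently drop the rest.
    allowed = req.wanted_context & CONTEXT_ACL.get(req.role, set())
    req.trail.append("context")
    return allowed
```

The `trail` list is the point of the exercise: it records which checks ran before the model saw anything, so each transaction can be tied back to its identity and policy decisions.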

How prompt injection and jailbreaks become governance failures

Prompt injection works by manipulating the model into treating attacker-controlled text as instruction or context. A jailbreak is similar, but its purpose is to push the model beyond its intended policy boundary. These are not simply content problems. They are governance failures when the model can reach tools, data, or outputs without a reliable enforcement layer. In on-premises environments, the danger increases because the AI workload often sits close to internal systems and can interact with privileged data paths. If the runtime does not inspect prompts, outputs, and tool calls, the organisation loses control over what the AI can actually do.

Practical implication: Enforce runtime guardrails that inspect model inputs, outputs, and tool use before sensitive actions are completed.
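
A rough sketch of such a guardrail follows. It assumes a `model` callable that returns a text response plus an optional tool name. The regular expressions and tool names are placeholders chosen for illustration, and pattern matching of this kind is far too weak to serve as a real defence against prompt injection. The sketch only shows where the three inspection points sit relative to the model call.

```python
import re

# Illustrative patterns only; a production guardrail needs much broader detection.
INJECTION_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
SENSITIVE_OUTPUT = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]  # SSN-shaped strings
ALLOWED_TOOLS = {"lookup_order", "get_balance"}  # hypothetical tool names


def check_prompt(prompt: str) -> bool:
    return not any(p.search(prompt) for p in INJECTION_PATTERNS)


def check_output(text: str) -> bool:
    return not any(p.search(text) for p in SENSITIVE_OUTPUT)


def check_tool_call(tool: str, caller_scopes: set) -> bool:
    # The tool must be globally allowed AND within the calling identity's scopes.
    return tool in ALLOWED_TOOLS and tool in caller_scopes


def guarded_call(prompt: str, model, caller_scopes: set) -> str:
    """Inspect the prompt, the requested tool, and the output before releasing a result.

    `model` is any callable returning (text, tool_name_or_None). The tool is
    only checked here; the caller would execute it after this function passes.
    """
    if not check_prompt(prompt):
        return "blocked: prompt"
    text, tool = model(prompt)
    if tool is not None and not check_tool_call(tool, caller_scopes):
        return "blocked: tool"
    if not check_output(text):
        return "blocked: output"
    return text
```

Checking the tool against both a global allowlist and the caller's scopes reflects the article's point that the permissions the AI can exercise should be governed alongside the identity that invoked it.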

Why AI workloads inherit container and supply chain risk

AI applications built on containers and Kubernetes inherit the same weaknesses as other cloud native systems, but with higher stakes. A misconfigured cluster, an unscanned image, or an unchecked SDK can become the entry point for data exposure or compute abuse. Supply chain risk is especially important because external models, libraries, and AI components are frequently introduced into local environments with incomplete review. Once those dependencies are trusted, they can affect both data handling and system behaviour. In practice, AI security starts with knowing what was deployed, where it came from, and whether it was authorised for the workload it now serves.

Practical implication: Tie AI security reviews to image scanning, dependency inventory, and workload authorization before production rollout.
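
One way to picture that review gate is a pre-rollout check that compares a workload against approved image digests and a dependency inventory. The sketch below is an assumption-laden illustration: the `Workload` fields, the `review` function, and the example digest and package strings are all hypothetical, and in practice the scan result and approvals would come from a scanner and registry rather than in-memory sets.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    image_digest: str             # digest of the image being deployed
    dependencies: frozenset[str]  # e.g. "package==version" strings
    scanned: bool                 # whether an image scan result exists


def review(workload: Workload, approved_digests: set[str], approved_deps: set[str]) -> list[str]:
    """Return the findings that should block rollout; an empty list means cleared."""
    findings = []
    if not workload.scanned:
        findings.append("image not scanned")
    if workload.image_digest not in approved_digests:
        findings.append("image digest not approved")
    # Anything not in the inventory is treated as unreviewed, not as safe.
    for dep in sorted(workload.dependencies - approved_deps):
        findings.append(f"unreviewed dependency: {dep}")
    return findings
```

Treating an unlisted dependency as a finding rather than a pass follows from the paragraph above: the question is whether a component was authorised for this workload, not merely whether it is known to be bad.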

NHI Mgmt Group analysis

On-premises AI security is now an identity governance problem, not just a model safety problem. The article shows that useful AI requires context, permissions, and policy-linked execution, which means security teams are governing a transaction path rather than a static application. That changes how access, data exposure, and runtime trust need to be understood across the stack. Practitioners should treat AI workloads as governed execution environments, not isolated experimentation zones.

Prompt injection is a control-boundary failure when AI can act on internal context. A model that can read account data, invoke tools, or trigger workflows is no longer just answering questions. It is participating in business action, which means a successful injection can redirect an authorised path into an unauthorised one. The implication is that policy enforcement must sit at runtime, where the request, context, and action can still be checked.

Container and Kubernetes controls remain necessary, but they are no longer sufficient on their own. The article correctly links AI risk to unscanned images, misconfigured clusters, and supply chain exposure, but the deeper issue is that AI amplifies those weaknesses by adding dynamic behaviour and sensitive context. A platform can be secure enough for ordinary workloads and still be unsafe for AI. Practitioners need to evaluate AI deployments as a distinct workload class with tighter governance expectations.

Secret and SDK visibility will determine whether AI adoption stays governable. The article points to hidden AI use inside applications and external components brought into local systems without enough scrutiny. That creates a discovery problem before it becomes a policy problem. If teams cannot see where AI components live, they cannot govern which identities, data sources, and compute resources those components are allowed to touch.

Runtime protection is the difference between controlled AI and merely hosted AI. The article makes a strong case that visibility into prompts, responses, and GPU use is essential once AI moves into production. The practitioner lesson is that build-time hygiene cannot absorb runtime misuse on its own. Governance must continue after deployment, because the risk emerges when the workload starts making decisions inside live business flows.

From our research:
69% of organisations now have more machine identities than human ones, according to The Critical Gaps in Machine Identity Management report.
66% say their current tooling is not adequate to manage the scale of machine identities they now have.
NHI Lifecycle Management Guide helps teams connect discovery, rotation, and offboarding into one governance model.

What this signals

Runtime governance is becoming the control plane for AI security. As on-premises AI moves into business workflows, teams will need to watch for hidden tool use, opaque context attachment, and resource abuse that never appears in a traditional IAM review. This is where identity, workload, and application security start to converge in practice.

Ephemeral AI behaviour changes the review model. A workload that can read context, call tools, and complete a transaction within one session does not fit controls built around periodic certification alone. Teams should prepare for stronger runtime telemetry, tighter policy enforcement, and more direct linkage between workload identity and action logging.
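
To make the linkage between workload identity and action logging concrete, here is a small sketch of a structured log record. The field names (`workload_id`, `session_id`, `decision`, and so on) are assumptions chosen for illustration, not a schema from the source article or from any particular logging product.

```python
import json
from datetime import datetime, timezone


def action_record(workload_id: str, session_id: str, action: str,
                  target: str, allowed: bool) -> str:
    """Serialise one AI action as a JSON line tied to the workload identity."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),  # UTC timestamp of the decision
        "workload_id": workload_id,                    # which machine identity acted
        "session_id": session_id,                      # groups actions within one session
        "action": action,                              # e.g. "tool_call" or "context_read"
        "target": target,                              # tool or data source touched
        "decision": "allow" if allowed else "deny",
    }, sort_keys=True)
```

Emitting one record per action, rather than one per session, is what lets a reviewer reconstruct what a short-lived workload did between periodic certifications.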

AI workload sprawl will expose inventory gaps first. If organisations cannot see every model, SDK, container, and embedded AI function, they cannot govern the identities or data paths those components use. The operational response is to treat discovery as a security control, not a housekeeping exercise.

For practitioners

Inventory every AI workload and dependency: Create a complete register of models, SDKs, containers, gateway policies, and downstream services before production use. Treat hidden AI inside existing applications as in-scope because unmanaged components are where governance breaks first.
Bind AI access to explicit policy checkpoints: Require authentication and authorization at the client, gateway, and context service layers so that business data is only attached when the requesting identity is allowed to use it.
Inspect prompts, responses, and tool calls at runtime: Place guardrails where the workload actually executes so that jailbreak attempts, sensitive-data leakage, and unsafe tool invocation can be blocked before the action completes.
Harden the container and supply chain path: Scan images, verify dependencies, and restrict unreviewed SDKs or external components from entering the environment without approval. AI workloads inherit the same build risks as other cloud native systems, but the impact is broader.
Control GPU consumption as a governed resource: Apply quotas and monitoring to prevent a single workload from consuming disproportionate GPU capacity. Resource abuse is an operational security issue when AI workloads share infrastructure with critical services.
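
The GPU quota item in the list above can be sketched as an admission check plus a monitoring threshold. The namespaces, quota values, and function names below are hypothetical. On Kubernetes the same idea is usually expressed declaratively, for example with a ResourceQuota on a GPU extended resource, rather than in application code.

```python
from __future__ import annotations

# Hypothetical per-namespace GPU quotas, counted in whole devices.
GPU_QUOTAS = {"inference": 4, "training": 8}


def admit_gpu_request(namespace: str, requested: int, in_use: dict[str, int]) -> bool:
    """Admit the request only if the namespace stays within its quota."""
    if requested <= 0:
        raise ValueError("requested GPUs must be positive")
    quota = GPU_QUOTAS.get(namespace, 0)  # unknown namespaces get no GPUs
    return in_use.get(namespace, 0) + requested <= quota


def over_threshold(in_use: dict[str, int], threshold: float = 0.8) -> list[str]:
    """List namespaces using more than `threshold` of their quota, for alerting."""
    return sorted(
        ns for ns, quota in GPU_QUOTAS.items()
        if quota and in_use.get(ns, 0) / quota > threshold
    )
```

Defaulting unknown namespaces to a quota of zero means a workload that was never inventoried cannot consume shared GPU capacity by accident.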

Key takeaways

On-premises AI introduces security risk through the surrounding identity, context, and runtime stack, not only through the model itself.
Prompt injection, supply chain weakness, and GPU misuse become harder to contain when AI workloads sit close to sensitive internal systems.
Teams need continuous visibility, runtime guardrails, and workload inventory before AI can be treated as a governed production service.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		AI runtime misuse and guardrails map to agentic AI control boundaries.
OWASP Non-Human Identity Top 10	NHI-03	AI workloads rely on machine identities, secrets, and service permissions.
NIST Zero Trust (SP 800-207)	PR.AC-4 (NIST CSF v1.1)	Continuous verification fits context-aware AI request paths and internal services.

Inventory workload identities and rotate credentials that allow AI systems to reach data or tools.

Key terms

AI workload identity: The identity assigned to a model, service, or application component so it can authenticate to data sources, tools, and platforms. In AI environments, this identity often becomes the real control point because the model’s behaviour only matters when it can reach something useful.
Prompt injection: A manipulation technique that uses crafted input to alter how an AI system interprets instructions or context. In production, it becomes a governance problem when the injected content can influence actions, data access, or tool calls inside a business process.
Runtime guardrail: A control that checks AI inputs, outputs, or actions while the workload is running. It is more than content filtering because it is meant to stop unsafe behaviour at the moment it would affect data, tools, or users.
Context service: A service that gathers internal information, such as account data or business rules, and attaches it to an AI request. It improves usefulness, but it also expands the trust boundary because the model receives more sensitive context than a public assistant would.

Deepen your knowledge

NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or governance in your organisation, it is worth exploring.

This post draws on content published by Aqua Security: Secure AI Infrastructure On-Premises from Day One. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-09-30.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org