Runtime protection for OpenAI AgentKit and AI agent guardrails

By NHI Mgmt Group Editorial Team | Published 2025-11-04 | Domain: Agentic AI & NHIs | Source: Zenity

TL;DR: AgentKit makes AI agents easier to build, but according to Zenity it also expands the attack surface through connectors, workflows, prompt injection, and credential leakage. The core issue is that probabilistic guardrails do not hold when agent behavior must be enforced deterministically.

At a glance

What this is: A vendor analysis of runtime protection for OpenAI AgentKit. Its key finding is that native guardrails can miss prompt injection, secret exposure, and unsafe agent actions.

Why it matters: IAM, NHI, and AI governance teams need controls that enforce policy at runtime when agents can reach tools, tokens, and data in real workflows.

Context

Runtime enforcement is the missing layer when AI agents can act on connectors, workflows, and embedded credentials faster than humans can review their outputs. In practice, the governance gap is not whether an agent can be built, but whether its actions can be constrained when prompt injection, token leakage, or unsafe tool use appears during execution.

This is an identity problem as much as a model problem. Once an agent can invoke tools and handle secrets inside enterprise systems, policy has to operate at the point of action, not after the fact. That is why the article belongs in the AI agent identity and NHI security conversation, and not only in a discussion of product features.

Key questions

Q: How should security teams govern AI agents that can call tools and access data?

A: Treat the agent as a governed identity with explicit task scope, not as a chat surface with broad implicit trust. Security teams should constrain connector permissions, separate data access from response generation, and require runtime enforcement before tool use or disclosure occurs. If the agent can reach secrets or regulated data, policy must operate at the execution layer, not only in review or logging.

Q: Why do native guardrails fail against prompt injection in AI agents?

A: Native guardrails often classify text rather than control execution, so they can miss attacks that manipulate the agent’s next action instead of its visible output. Prompt injection, encoded instructions, and multi-turn coercion exploit that gap. The practical answer is to enforce policy deterministically at runtime so the agent cannot carry out an unsafe tool call even when the text looks benign.

Q: What breaks when AI agents reuse broad OAuth scopes and tokens?

A: Broad scopes turn the agent into a high-blast-radius identity that can expose data, invoke tools, or move into systems it was never meant to touch. Reused tokens and over-scoped grants also make revocation and audit harder because the access looks legitimate from the outside. Teams should assume every extra scope increases the number of ways an agent can be induced to do harm.

Q: Who is accountable when an AI agent leaks secrets or violates policy?

A: Accountability sits with the organisation that defined the agent’s permissions, workflows, and enforcement model, not with the model itself. If runtime controls are absent, the failure is governance design, not just user behaviour. Teams should assign an owner for agent policy, connector scope, and incident response so there is a clear path from risk detection to containment.

Technical breakdown

Why native agent guardrails miss real attack patterns

Native guardrails usually score or classify content after it is generated, which is too late when the agent has already reasoned its way into a risky action. Prompt injection, multilingual obfuscation, encoded payloads, and buried instructions exploit the gap between text inspection and execution control. In AgentKit-style environments, the attacker is not only trying to change the answer but to alter the agent’s next tool call, data access, or response path. Soft controls struggle because they interpret language, while the real security decision is about whether the agent is allowed to proceed. That is a governance mismatch, not just a detection problem.

Practical implication: teams need controls that evaluate execution intent before tool use, not only content after generation.
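
As a rough illustration of evaluating the planned action before tool use, the sketch below places a deny-by-default check in front of every tool call. It is not Zenity's or AgentKit's implementation. The names ToolCall, TASK_SCOPE, decide, and execute, the agent ID, and the tool names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCall:
    """The action an agent plans to take (hypothetical shape, not an AgentKit type)."""
    agent_id: str
    tool: str
    args: dict[str, Any] = field(default_factory=dict)


class PolicyViolation(Exception):
    """Raised when a planned tool call falls outside the agent's task scope."""


# Hypothetical task scope: agent -> tool name -> predicate over the call arguments.
TASK_SCOPE: dict[str, dict[str, Callable[[dict[str, Any]], bool]]] = {
    "support-agent": {
        "search_tickets": lambda args: True,
        "send_email": lambda args: str(args.get("to", "")).endswith("@example.com"),
    }
}


def decide(call: ToolCall) -> tuple[bool, str]:
    """Deny by default; allow only tools and arguments inside the task scope."""
    check = TASK_SCOPE.get(call.agent_id, {}).get(call.tool)
    if check is None:
        return False, f"{call.agent_id} may not call {call.tool}"
    if not check(call.args):
        return False, f"arguments to {call.tool} are out of scope"
    return True, "allowed"


def execute(call: ToolCall, tools: dict[str, Callable[..., Any]]) -> Any:
    """Run the tool only after the policy decision has been made."""
    allowed, reason = decide(call)
    if not allowed:
        raise PolicyViolation(reason)
    return tools[call.tool](**call.args)
```

The decision depends only on the agent, the tool, and the arguments, so the wording of the prompt that led to the call cannot change the outcome.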

Why connectors turn AI agents into a broader identity surface

Connectors extend an agent’s reach into SaaS, cloud, and internal data sources, which means every connector inherits the security posture of the underlying identity and authorization model. If OAuth scopes are too broad, if tokens are reused too widely, or if access is not constrained to task scope, the agent can expose data even when the model itself is behaving as designed. AgentKit lowers the barrier to deployment, but it also multiplies the number of identity relationships that security teams must govern. The hard part is not just making the agent work. It is making sure the connected identities do not create unbounded privilege chains.

Practical implication: review connector scopes, token handling, and downstream privileges as part of the agent security boundary.
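
A connector review of this kind can start from a simple inventory. The sketch below assumes each grant is recorded with the scopes it was given and the scopes the workflow actually needs, then reports excess scopes and tokens shared across agents. The ConnectorGrant shape, the scope names, and the token IDs are illustrative assumptions, not fields from any real connector API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectorGrant:
    """One connector grant as recorded in an inventory (hypothetical shape)."""
    agent_id: str
    connector: str
    token_id: str
    granted: frozenset[str]   # scopes the token actually carries
    required: frozenset[str]  # scopes the workflow needs


def over_scoped(grants: list[ConnectorGrant]) -> dict[tuple[str, str], list[str]]:
    """Map (agent, connector) to the scopes granted beyond what the task requires."""
    findings: dict[tuple[str, str], list[str]] = {}
    for grant in grants:
        extra = grant.granted - grant.required
        if extra:
            findings[(grant.agent_id, grant.connector)] = sorted(extra)
    return findings


def reused_tokens(grants: list[ConnectorGrant]) -> dict[str, list[str]]:
    """Map each token ID used by more than one agent to those agents."""
    owners: defaultdict[str, set[str]] = defaultdict(set)
    for grant in grants:
        owners[grant.token_id].add(grant.agent_id)
    return {tid: sorted(agents) for tid, agents in owners.items() if len(agents) > 1}


# Illustrative inventory; every value here is made up for the example.
GRANTS = [
    ConnectorGrant("support-agent", "drive", "tok-1",
                   frozenset({"files.read", "files.write"}), frozenset({"files.read"})),
    ConnectorGrant("billing-agent", "crm", "tok-1",
                   frozenset({"contacts.read"}), frozenset({"contacts.read"})),
]
```

With this inventory, over_scoped(GRANTS) flags the write scope on the drive connector and reused_tokens(GRANTS) flags tok-1 as shared by two agents.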

Deterministic enforcement is different from probabilistic safety

The article contrasts model-driven guardrails with rule-based runtime protection. Probabilistic controls estimate whether an output looks safe, while deterministic controls enforce a policy boundary that either allows or blocks the action. That distinction matters in regulated workflows, where a single unsafe response can leak secrets, violate policy, or trigger an inappropriate downstream operation. In identity terms, this is the difference between trusting the model to behave and constraining the identity to a permitted action set. For agentic systems, enforcement has to survive adversarial prompting, not just normal usage.

Practical implication: align agent governance to explicit policy enforcement, especially where sensitive data or compliance obligations are involved.
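
The contrast can be made concrete with a small sketch. The risk score stands in for a classifier's output and no real model is called. The permitted action set, the resource names, and the "review" outcome are assumptions made for illustration.

```python
# Hypothetical permitted action set for one agent identity: (resource, operation).
PERMITTED_ACTIONS = frozenset({
    ("tickets", "read"),
    ("tickets", "comment"),
    ("crm", "read"),
})


def probabilistic_gate(risk_score: float, threshold: float = 0.5) -> bool:
    """Allow when the estimated risk is below the threshold.

    risk_score is a placeholder for a classifier output; an obfuscated
    payload may simply score low and pass.
    """
    return risk_score < threshold


def deterministic_gate(resource: str, operation: str) -> bool:
    """Allow only actions in the permitted set; everything else is blocked."""
    return (resource, operation) in PERMITTED_ACTIONS


def decide(risk_score: float, resource: str, operation: str) -> str:
    """The rule decides first; the score can only downgrade 'allow' to 'review'."""
    if not deterministic_gate(resource, operation):
        return "block"
    return "allow" if probabilistic_gate(risk_score) else "review"
```

In this sketch, a request to read secrets is blocked even with a low risk score of 0.1, because the action is outside the permitted set regardless of how benign the text looked to the classifier.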

NHI Mgmt Group analysis

Runtime AI agent protection is becoming an identity control, not a model feature. AgentKit lowers the threshold for deploying agents, but the governance problem shifts to controlling what the agent can reach, invoke, and reveal at runtime. That makes the control plane part of identity security, because the agent now operates with tool access, data access, and policy exposure that resemble a non-human identity. Practitioners should treat runtime protection as an identity boundary that decides whether execution is allowed.

Probabilistic guardrails fail because they were designed for classification, not enforcement. This article exposes the gap between telling a system what looks risky and stopping it from acting on that risk. Prompt injection and obfuscated payloads do not need to defeat every model judgment if they can reach a tool call before enforcement occurs. The implication is that governance based on post-generation review cannot be the primary control for agentic systems.

Agentic intent is the right control concept for runtime security. The article’s central insight is that teams must understand what the agent is trying to do, not only what it says. That is a more useful security abstraction than plain prompt filtering because it connects language, tool use, and downstream effect. Practitioners should anchor policy around intended action paths, not isolated text patterns.

Ephemeral trust assumptions: access decisions made at build time do not survive runtime agent behaviour. Access scoping that looks adequate on paper can collapse once the agent starts chaining connectors, reusing tokens, or adapting its response path mid-session. That assumption was designed for predictable workflows with stable execution paths. It fails when the actor can decide its own next move in response to manipulated input, and practitioners need to rethink whether provisioning-time controls still describe the real risk boundary.

Agent governance now spans discovery, posture, detection, and prevention. The article makes clear that runtime protection is only one layer in a broader lifecycle that includes visibility into deployed agents, control over their configurations, and blocking unsafe actions as they occur. That reinforces the view that AI agent security cannot be solved with a single point product. Practitioners should evaluate the whole lifecycle, not only the inline control.

From our research:
98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
For a broader control model, see OWASP Agentic AI Top 10 and map runtime enforcement to the agentic attack paths it documents.

What this signals

Runtime enforcement is quickly becoming the dividing line between agent adoption and agent governability. If organisations continue to scale AI agents without execution-layer policy, they will accumulate more identities that can call tools than they can safely observe. The practical signal is simple: inventory connectors, reduce scopes, and make sure every agent has an owner before deployment becomes normalised.

The governance gap is widening faster than most programmes can measure it, especially where teams treat agents as application features rather than identities. Security leaders should expect the next maturity jump to come from policy enforcement, auditability, and containment, not from better prompt hygiene alone.

Agentic intent: when runtime controls can distinguish between a harmless reply and a risky action path, security teams finally gain a control concept that matches how agents behave in production. That shift aligns well with OWASP Agentic AI Top 10 and the enforcement thinking behind NIST AI Risk Management Framework.

For practitioners

Map every agent connector to its underlying identity and scope: Inventory which tokens, OAuth grants, API keys, and service permissions each agent can reach, then reduce scopes to the smallest task boundary that still allows the workflow to function.
Test guardrails against prompt injection and obfuscation: Run adversarial exercises that include multi-turn prompt injection, encoded text, foreign-language payloads, and hidden instructions to see whether the control blocks execution or only flags it.
Separate output safety from action authorization: Do not rely on the agent’s response text as a proxy for safety. Require policy decisions that evaluate whether the next tool call, data access, or external action is permitted before it executes.
Review secret exposure paths inside agent workflows: Check where API keys, credentials, and access tokens can surface in responses, logs, or downstream integrations, then block disclosure before the output leaves the agent boundary.
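
For the secret-exposure check, a minimal sketch of an outbound filter is shown below. It scans the text and any Base64 runs that decode to UTF-8, so a trivially encoded secret is also caught. The three patterns are illustrative and far from exhaustive, and the function names are hypothetical. The sample key in the comment is the placeholder access key ID used in AWS documentation.

```python
import base64
import re

# Illustrative patterns only; a real deployment needs a much broader set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # e.g. AKIAIOSFODNN7EXAMPLE
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{20,}"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

# Runs long enough to plausibly be Base64-encoded content.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")


def _views(text: str) -> list[str]:
    """Return the text plus any Base64 runs that decode cleanly to UTF-8."""
    views = [text]
    for run in B64_RUN.findall(text):
        try:
            views.append(base64.b64decode(run, validate=True).decode("utf-8"))
        except ValueError:  # covers binascii.Error and UnicodeDecodeError
            continue
    return views


def find_secrets(text: str) -> list[str]:
    """Return the sorted names of patterns found in the text or its decoded views."""
    hits = {
        name
        for view in _views(text)
        for name, pattern in SECRET_PATTERNS.items()
        if pattern.search(view)
    }
    return sorted(hits)


class DisclosureBlocked(Exception):
    """Raised when an outbound message contains a suspected secret."""


def release(text: str) -> str:
    """Return the text unchanged if clean; otherwise block it at the boundary."""
    hits = find_secrets(text)
    if hits:
        raise DisclosureBlocked(", ".join(hits))
    return text
```

The same find_secrets check could be applied to log lines and connector payloads as well as responses, since the text names all three as exposure paths.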

Key takeaways

AgentKit reduces friction for building AI agents, but it also expands the identity and attack surface that security teams must govern.
Prompt injection, secret exposure, and unsafe tool use are control failures when runtime policy cannot stop execution before harm occurs.
Practitioners should move from text-based guardrails to deterministic, policy-driven enforcement at the point where agents act.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		AgentKit risk centers on prompt injection, tool misuse, and runtime enforcement gaps.
NIST AI RMF		Runtime agent governance depends on clear oversight, accountability, and policy enforcement.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Agent connector scopes and access boundaries are core access control issues.

Review agent entitlements against least-privilege access and reduce unnecessary downstream scope.

Key terms

Agentic Intent: The action an AI agent is attempting to carry out, including tool use, data access, and downstream effects. In security terms, intent matters because a harmless-looking prompt can still drive a risky execution path if the runtime control only inspects text and not the planned action.
Runtime Protection: A control layer that evaluates and enforces policy while an AI agent is operating, not after the fact. It is designed to block unsafe tool calls, data leakage, or unauthorized outputs at the moment they would occur, which is essential when decisions and actions happen in the same session.
Connector Scope: The set of systems, data sources, and permissions an agent can reach through integrations such as APIs, SaaS connectors, and OAuth grants. Narrow scope reduces blast radius, while broad scope increases the chance that a manipulated agent can move from conversation into unauthorized action.
Deterministic Enforcement: A policy model that produces a clear allow or block decision based on defined rules rather than probabilistic model judgment. For AI agents, deterministic enforcement is valuable because it can stop risky execution even when language is obfuscated, multi-turn, or designed to evade classifier-based guardrails.

This post draws on content published by Zenity: Closing the Guardrail Gap, runtime protection for OpenAI AgentKit.
