Generated code execution

A runtime model where the agent writes code from a tool schema and the environment executes it inside a sandbox. This compresses repeated actions into one execution block, which improves efficiency but requires stronger governance around what the code can do once it starts running.

Expanded Definition

Generated code execution is a runtime pattern in which an AI agent produces executable code from a tool schema, then the environment runs that code inside a sandbox. It sits between simple function calling and full autonomous execution: the agent is not merely selecting actions one at a time but synthesising code that can chain multiple actions in one block.
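To make the contrast with function calling concrete, the sketch below shows one generated block chaining several tool calls while each call is still recorded. Everything here is hypothetical: the tools `get_ticket` and `close_ticket`, the helper `make_logged_tools`, and the `GENERATED` string standing in for agent output. It runs the block in-process with `exec` purely for illustration; an emptied `__builtins__` is not a security boundary, and a real deployment would run the block inside the sandbox.

```python
from typing import Any, Callable


def make_logged_tools(tools: dict[str, Callable[..., Any]], audit: list[dict]) -> dict[str, Callable[..., Any]]:
    """Wrap each tool so every call is appended to the audit list."""
    def wrap(name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        def logged(*args: Any, **kwargs: Any) -> Any:
            result = fn(*args, **kwargs)
            audit.append({"tool": name, "args": args, "kwargs": kwargs, "result": result})
            return result
        return logged
    return {name: wrap(name, fn) for name, fn in tools.items()}


# Hypothetical tools that the schema exposes to the agent.
def get_ticket(ticket_id: int) -> dict:
    return {"id": ticket_id, "status": "open"}


def close_ticket(ticket_id: int) -> str:
    return f"ticket {ticket_id} closed"


# Stand-in for agent-written code: six tool calls compressed into one block.
GENERATED = """
results = [close_ticket(t) for t in (1, 2, 3) if get_ticket(t)["status"] == "open"]
"""

audit: list[dict] = []
# Only the wrapped tools are visible to the block; builtins are emptied for
# illustration only and do NOT make exec() safe.
namespace: dict[str, Any] = {
    "__builtins__": {},
    **make_logged_tools({"get_ticket": get_ticket, "close_ticket": close_ticket}, audit),
}
exec(GENERATED, namespace)
```

With plain function calling, the same work would take six separate round trips between the agent and the environment. Here it is a single execution, which is why the per-call audit trail matters more, not less.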

In NHI and agentic AI security, the distinction matters because code generation expands the agent’s effective authority. A tool schema defines the allowed interface, while the sandbox defines the containment boundary. That boundary must be treated as a security control, not just an engineering convenience. Guidance across the industry is still evolving, especially on how much code should be allowed to persist, what network access it should retain, and how to audit generated logic after execution. The most mature practice is to constrain the runtime, validate the schema, and log every input, output, and side effect in a way that supports later review. For broader governance context, the NIST Cybersecurity Framework 2.0 remains useful for mapping these controls to risk management outcomes.
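As a rough illustration of "constrain the runtime and log every input, output, and side effect", the sketch below runs generated Python in a child process with an allowlisted environment, a throwaway working directory, and a timeout, then returns an audit record. The function name `run_generated_code`, the `ENV_ALLOWLIST` values, and the record fields are assumptions made for this example. A child process on its own is not a sandbox: filesystem and network isolation would still need to come from a container, VM, or similar control.

```python
import hashlib
import os
import subprocess
import sys
import tempfile
import time

# Assumption: only these variables are passed to the child, so API keys and
# tokens in the parent environment are not inherited.
ENV_ALLOWLIST = ("PATH", "LANG")


def run_generated_code(source: str, timeout_s: float = 5.0) -> dict:
    """Run agent-written Python in a child process and return an audit record."""
    child_env = {k: os.environ[k] for k in ENV_ALLOWLIST if k in os.environ}
    started = time.time()
    with tempfile.TemporaryDirectory() as workdir:  # ephemeral working directory
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", source],  # -I: isolated mode
                cwd=workdir,
                env=child_env,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
            status, stdout, stderr = proc.returncode, proc.stdout, proc.stderr
        except subprocess.TimeoutExpired:
            status, stdout, stderr = "timeout", "", ""
    return {
        "source_sha256": hashlib.sha256(source.encode()).hexdigest(),  # what ran
        "exit_status": status,
        "stdout": stdout,
        "stderr": stderr,
        "duration_s": round(time.time() - started, 3),
    }
```

Hashing the source gives reviewers a stable identifier for the exact code that ran, which helps when the generated logic itself is not retained after execution.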

The most common misapplication is to treat generated code execution as a harmless prompt response, which in practice means allowing sandboxed code to inherit broad credentials or network reach.

Examples and Use Cases

Implementing generated code execution rigorously often introduces latency and review overhead, requiring organisations to weigh automation speed against the cost of tighter containment.

  • An agent writes a short Python routine to transform multiple log files, then runs it in a restricted sandbox with no outbound internet access.
  • A support automation agent generates code to query internal systems through approved APIs, using a schema that blocks file writes and shell escapes.
  • An analyst agent creates a one-off reconciliation script for incident triage, with execution limited to ephemeral storage and read-only service credentials.
  • A developer workflow uses generated code to combine several approved tools into a single transaction, reducing repeated tool calls while preserving audit logging.
  • Security teams compare execution traces against the findings in Analysis of Claude Code Security to understand how agent-written code behaves in constrained environments.
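One way to approximate the "blocks file writes and shell escapes" idea from the second example is a static pre-check on the generated source before it reaches the sandbox. The sketch below uses Python's standard `ast` module; the block lists and the function name `policy_violations` are illustrative assumptions, not a complete policy. Static checks of this kind are easy to evade, so they complement sandbox containment rather than replace it.

```python
import ast

# Illustrative block lists; a real policy would be derived from the tool schema.
BLOCKED_MODULES = {"os", "subprocess", "shutil", "socket", "ctypes"}
BLOCKED_CALLS = {"open", "eval", "exec", "compile", "__import__"}


def policy_violations(source: str) -> list[str]:
    """Return a description of each blocked import or call found in the source.

    Raises SyntaxError if the source is not valid Python.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            # Compare the top-level package, so "os.path" matches "os".
            if name.split(".")[0] in BLOCKED_MODULES:
                found.append(f"line {node.lineno}: import of {name}")
        # Only direct calls by bare name are caught here, e.g. open(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                found.append(f"line {node.lineno}: call to {node.func.id}()")
    return found
```

A caller would typically refuse to execute any block for which the returned list is non-empty and record the violations alongside the rejected source.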

When teams need a standards anchor for identity and permission boundaries around these workflows, NIST Cybersecurity Framework 2.0 helps frame the operational controls even though it does not define the runtime pattern itself.

Why It Matters in NHI Security

Generated code execution is high-risk because it compresses many privileged actions into a single run, which makes misuse harder to spot and quicker to affect systems. If the sandbox is weak, the generated code may read secrets, call internal services, or modify records in ways the agent designer did not intend. That turns a productivity feature into a privilege amplification path. In NHI environments, the danger is compounded because service accounts, API keys, and automation tokens often already carry broad rights. NHI Mgmt Group research shows that 97% of NHIs carry excessive privileges, which makes any execution model that can bundle actions especially sensitive. Teams should therefore align the runtime with least privilege, short-lived credentials, scoped network policy, and explicit logging, and they should review whether code generation is even needed for the task. For governance depth, the Ultimate Guide to Non-Human Identities is especially relevant when evaluating how generated execution intersects with secret handling, rotation, and offboarding.
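The pairing of least privilege with short-lived credentials can be sketched as a token that carries an explicit scope list and an expiry, minted per execution instead of handing the sandbox a standing API key. The functions `mint_token` and `authorize`, the token layout, and the scope names below are hypothetical and use only the Python standard library. A production system would more likely rely on an established token service than on a hand-rolled format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def mint_token(key: bytes, scopes: list, ttl_s: int, now: Optional[float] = None) -> str:
    """Issue a signed token limited to the given scopes and lifetime."""
    now = time.time() if now is None else now
    payload = json.dumps({"scopes": sorted(scopes), "exp": now + ttl_s}).encode()
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def authorize(key: bytes, token: str, needed_scope: str, now: Optional[float] = None) -> bool:
    """Accept only an untampered, unexpired token that includes the scope."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # malformed token or invalid base64
        return False
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return False
    claims = json.loads(payload)
    return now < claims["exp"] and needed_scope in claims["scopes"]
```

Under this sketch, a block given a 60-second `tickets:read` token cannot write records and cannot reuse the credential after the run, even if the generated code misbehaves.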

Organisations typically encounter the operational impact only after a sandbox escape, credential misuse, or unexpected data modification, at which point governing generated code execution can no longer be deferred.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | AI-03 | Generated code execution expands agent authority and attack surface.
OWASP Non-Human Identity Top 10 | NHI-02 | Sandboxed execution still depends on secret handling and credential containment.
NIST CSF 2.0 | PR.AC-4 | This term depends on permission scoping and access control for runtime code.

Prevent generated code from reaching long-lived secrets and enforce least-privilege access paths.