What should organisations do when GenAI is embedded in code and workflows?

Why This Matters for Security Teams

When generative AI is embedded in code generation, approvals, deployment scripts, or workflow automations, it stops being a novelty and becomes part of the software supply chain. That means AI output can introduce vulnerable code paths, hardcoded secrets, unsafe API calls, and prompt-injection artifacts, just like any other untrusted input. NIST AI 600-1, the Generative AI Profile for the NIST AI Risk Management Framework, treats these risks as governance issues, not convenience issues.

The practical mistake is to assume the model is “smart enough” to be trusted without review. In reality, AI-assisted code often looks plausible while still failing secure coding standards, dependency hygiene, and change-control requirements. NHIMG research on the State of Secrets in AppSec shows how often secrets exposure and remediation gaps persist in normal development pipelines, which makes AI-generated output even more sensitive to weak controls. In practice, many security teams encounter the unsafe output only after a build has already passed review and reached a shared repository or production workflow.

How It Works in Practice

Organisations should treat GenAI output as an untrusted dependency that enters the same governance path as third-party code, copied snippets, and external packages. That means the first line of defence is not “trust the model less,” but “apply normal engineering controls more consistently.” Review should cover syntax, logic, dependencies, secrets, data handling, authentication, and any code that invokes privileged services or external APIs.

A workable control pattern usually includes:

Static and dynamic review of AI-generated code before merge or deployment.

Secret scanning for API keys, tokens, certificates, and embedded credentials.

Prompt-injection and instruction-following checks for workflow automations that consume AI text.

Dependency inspection to catch unsafe libraries, unpinned packages, or transitive risk.

Human approval for any change that touches authentication, access control, payments, or production data paths.
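Two of the checks above, secret scanning and dependency pinning, can be sketched in a few lines. This is a minimal illustration, not a replacement for a dedicated scanner: the pattern set, the rule names, and the helper names `find_secrets` and `find_unpinned` are assumptions made for this example, and real tools ship far larger rule sets.

```python
import re

# Illustrative patterns only (assumed for this sketch); real scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|token|password)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}

# A requirement line counts as pinned only if it uses an exact '==' version.
PINNED = re.compile(r"^[A-Za-z0-9._\[\]-]+\s*==\s*[^\s;]+")


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every suspected secret in the text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def find_unpinned(requirements: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Running both helpers over AI-generated output before it reaches a shared branch gives a cheap first filter; anything they flag should still go to a human reviewer rather than being auto-corrected.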

This aligns with NIST guidance on secure AI lifecycle practices and the broader secure development expectations in NHIMG’s secrets research, which highlights the persistent gap between stated confidence and actual remediation performance. For teams using code assistants or agentic workflows, policy should be explicit: AI may propose, but it cannot authorise, merge, or deploy. Controls are strongest when paired with branch protections, signing, CI gates, and logging that preserve accountability across the full change path.
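The "AI may propose, but it cannot authorise, merge, or deploy" rule can be expressed as a merge gate. The sketch below is one possible shape under stated assumptions: the `ChangeRequest` fields, the area labels, and the `merge_blockers` function are hypothetical and would need to be mapped onto whatever your source-control platform actually exposes.

```python
from dataclasses import dataclass, field

# Areas the guidance singles out for mandatory human approval (labels are assumed).
SENSITIVE_AREAS = {"authentication", "access_control", "payments", "production_data"}


@dataclass
class ChangeRequest:
    author: str
    ai_assisted: bool
    areas: set[str] = field(default_factory=set)
    # Maps approver name to True for a human reviewer, False for a bot or agent.
    approvals: dict[str, bool] = field(default_factory=dict)
    ci_passed: bool = False
    commits_signed: bool = False


def merge_blockers(change: ChangeRequest) -> list[str]:
    """Return the reasons a change may not merge; an empty list means it may."""
    blockers = []
    if not change.ci_passed:
        blockers.append("CI gates have not passed")
    if not change.commits_signed:
        blockers.append("commits are not signed")

    # Only humans other than the author count; AI may propose but not authorise.
    human_approvers = {
        name
        for name, is_human in change.approvals.items()
        if is_human and name != change.author
    }
    if not human_approvers:
        if change.ai_assisted:
            blockers.append(
                "AI-assisted change needs approval from a human other than the author"
            )
        if change.areas & SENSITIVE_AREAS:
            blockers.append("sensitive area needs human approval")
    return blockers
```

Returning a list of reasons rather than a boolean keeps the decision auditable, which supports the logging and accountability goals described above.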

These controls tend to break down when AI output is pasted directly into infrastructure-as-code, CI/CD jobs, or customer-facing workflow automations, because the review path becomes too short for meaningful inspection.

Common Variations and Edge Cases

Tighter review of AI-assisted work often increases developer friction and slows delivery, requiring organisations to balance speed against assurance. That tradeoff is real, but current guidance suggests the risk is highest where AI touches privileged operations, secrets, or externally reachable interfaces.

Not every GenAI use case needs the same level of control. A documentation draft is not the same as a code change that creates cloud resources, rotates credentials, or modifies access logic. Best practice is evolving, but a useful rule is to classify AI output by blast radius: low-risk text can move through lighter review, while anything that can change runtime behaviour should follow normal SDLC gates. The DeepSeek breach is a reminder that AI ecosystems can expose far more than content alone; they can reveal credentials, backend context, and operational data when controls are weak.
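Classifying by blast radius can start as a simple path-based rule. The following sketch assumes a repository layout in which directory names and file suffixes hint at risk; the tier names, the directory and suffix sets, and the functions `classify_path` and `classify_change` are illustrative choices, not an established standard.

```python
from enum import IntEnum
from pathlib import PurePosixPath


class ReviewTier(IntEnum):
    LIGHT = 1      # low-risk text such as documentation drafts
    STANDARD = 2   # anything that can change runtime behaviour: normal SDLC gates
    ELEVATED = 3   # privileged paths: SDLC gates plus explicit human approval


# Assumed conventions for this sketch; adjust to the repository in question.
DOC_SUFFIXES = {".md", ".rst", ".txt"}
PRIVILEGED_SUFFIXES = {".tf"}
PRIVILEGED_DIRS = {"terraform", "iam", "auth", "workflows", "deploy"}


def classify_path(path: str) -> ReviewTier:
    """Assign a review tier to one changed file, erring on the side of caution."""
    p = PurePosixPath(path.lower())
    if p.suffix in PRIVILEGED_SUFFIXES or any(part in PRIVILEGED_DIRS for part in p.parts):
        return ReviewTier.ELEVATED
    if p.suffix in DOC_SUFFIXES:
        return ReviewTier.LIGHT
    return ReviewTier.STANDARD


def classify_change(paths: list[str]) -> ReviewTier:
    """A change set inherits the highest tier of any file it touches."""
    return max((classify_path(p) for p in paths), default=ReviewTier.STANDARD)
```

Privileged checks run before the documentation check on purpose, so a Markdown file inside an `auth` directory is still routed to the stricter tier. Path heuristics are only a starting point; they do not see what the content actually does.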

There is no universal standard for this yet, especially for mixed human-AI workflows, but the operational expectation is clear: if the AI output could become executable, privileged, or persistent, it needs the same scrutiny as code from any external contributor. Teams that skip this distinction usually discover the problem after an unsafe commit, a leaked secret, or an over-permissive automation has already been shipped.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A01	AI-generated code and workflows can introduce insecure actions and hidden instructions.
NIST AI RMF		AI RMF governs lifecycle risk, including validation and monitoring of AI-assisted outputs.
NIST CSF 2.0	PR.PS-06	Secure development and change control apply directly to AI-generated code in workflows.

Treat model output as untrusted, verify actions before execution, and block unsafe automation paths.
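For workflow automations that act on model text, "verify actions before execution" can mean parsing the output into a structured proposal and checking it against an allowlist. The sketch below assumes the model is asked to reply with a JSON object containing `action` and `args`; that format, the action names, and the function `parse_proposed_action` are hypothetical.

```python
import json

# Hypothetical allowlist: action name -> argument names it may carry.
ALLOWED_ACTIONS = {
    "create_ticket": {"title", "body"},
    "post_comment": {"ticket_id", "body"},
}


def parse_proposed_action(model_output: str) -> dict:
    """Validate a model-proposed action; raise ValueError rather than run anything unsafe."""
    try:
        proposal = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(proposal, dict):
        raise ValueError("model output must be a JSON object")

    action = proposal.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not on the allowlist: {action!r}")

    args = proposal.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")
    unexpected = set(args) - ALLOWED_ACTIONS[action]
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")

    # Only the validated fields are passed on; anything else in the output is dropped.
    return {"action": action, "args": args}
```

Free-text instructions embedded in the output never reach an executor under this pattern, because only allowlisted actions with expected arguments survive validation. It does not judge whether the argument values themselves are safe, so downstream checks still apply.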
