When should organisations allow AI in development workflows?

Why This Matters for Security Teams

AI belongs in development workflows only when the organisation can bound what the model sees, what it can do, and how outputs are checked before code ships. The issue is not whether AI is useful, but whether the workflow exposes secrets, internal prompts, or privileged context that would be unacceptable if copied, retained, or reproduced. That concern is consistent with the findings in The State of Secrets in AppSec, where 43% of security professionals were already worried about AI systems learning and reproducing sensitive information patterns from codebases.

Security teams also need to treat AI-enabled development as an access-control problem, not just a productivity decision. The NIST Cybersecurity Framework 2.0 places clear emphasis on governance, access management, and risk-based protection, which maps well to gating AI use by data sensitivity and release controls. In practice, many security teams encounter unsafe AI use only after a secret, token, or proprietary fragment has already been exposed through a developer workflow.

How It Works in Practice

The safe pattern is to define AI usage tiers by data class and task type. Low-risk use cases include summarising public code, refactoring non-sensitive snippets, drafting tests from sanitized inputs, or helping with documentation. Higher-risk use cases need stronger controls: secret scanning before any prompt is sent, redaction of internal identifiers, model access restricted to approved tools, and mandatory human review before code merges. If the workflow cannot prevent secrets from entering the prompt, it is not ready for broad deployment.
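The tiering described above can be sketched as a small policy function. This is a minimal illustration under stated assumptions, not a reference implementation: the `DataClass` levels, the task names, the control labels, and the rule that restricted data is denied outright are all hypothetical and would need to be mapped to an organisation's own classification scheme.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    # Hypothetical data classes; substitute your own classification scheme.
    PUBLIC = 1
    SANITIZED = 2
    INTERNAL = 3
    RESTRICTED = 4  # live secrets, regulated data, production credentials


@dataclass(frozen=True)
class Decision:
    allowed: bool
    required_controls: tuple


# Task names are illustrative labels for the low-risk use cases in the text.
LOW_RISK_TASKS = frozenset({"summarise", "refactor", "draft_tests", "document"})

# Controls listed in the text for higher-risk use cases.
HIGHER_RISK_CONTROLS = (
    "secret_scan",
    "redact_identifiers",
    "approved_tool_only",
    "human_review",
)


def decide(data_class: DataClass, task: str) -> Decision:
    """Map a (data class, task type) pair to an allow/deny decision."""
    if data_class is DataClass.RESTRICTED:
        # Assumption: restricted data never reaches the model.
        return Decision(False, ())
    low_risk_data = data_class in (DataClass.PUBLIC, DataClass.SANITIZED)
    if low_risk_data and task in LOW_RISK_TASKS:
        return Decision(True, ())
    # Everything else is allowed only with the stronger controls attached.
    return Decision(True, HIGHER_RISK_CONTROLS)
```

Keeping the decision in one function makes the tier boundaries reviewable in a single place, which is useful when the policy is expanded gradually.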

This is where identity and policy controls matter. A development assistant should not have broad, persistent access to repositories, ticket systems, or infrastructure credentials. Instead, access should be tied to workload identity, short-lived tokens, and explicit approval paths. Current guidance suggests using least privilege, but the implementation detail is what makes or breaks the control: scope the model to a task, expire access quickly, and evaluate policy at request time rather than relying on a one-time role grant. The NIST Cybersecurity Framework 2.0 and the DeepSeek breach both reinforce the same operational lesson: broad context and broad access create broad blast radius.
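One way to picture task-scoped, short-lived access evaluated at request time is the sketch below. It is an in-memory illustration only: `Grant`, `issue_grant`, and `is_allowed` are hypothetical names, and a real deployment would rely on the organisation's identity provider or secrets manager rather than hand-rolled checks.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    workload_id: str      # identity of the assistant's workload, not a person
    scopes: frozenset     # e.g. {"repo:docs:read"}; scope strings are made up
    expires_at: float     # absolute expiry as a Unix timestamp


def issue_grant(workload_id, scopes, ttl_seconds, now=None):
    """Create a grant that covers one task and expires after ttl_seconds."""
    now = time.time() if now is None else now
    return Grant(workload_id, frozenset(scopes), now + ttl_seconds)


def is_allowed(grant, workload_id, scope, now=None):
    """Evaluate the grant on every request instead of trusting a past check."""
    now = time.time() if now is None else now
    return (
        grant.workload_id == workload_id
        and scope in grant.scopes
        and now < grant.expires_at
    )
```

Because `is_allowed` re-checks identity, scope, and expiry on each call, an expired or out-of-scope request fails even if an earlier request with the same grant succeeded.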

Allow AI on public or sanitized code first, then expand only after review gates are working.

Use JIT access and revoke credentials when the task ends, not at the end of the week.

Block prompts that contain secrets, customer data, or production credentials.

Require an explicit human check for release-bound code, security-sensitive logic, and dependency changes.

Log model interactions so you can reconstruct what was shared and why.

These controls tend to break down in fast-moving developer environments, where engineers can copy production context into prompts to save time; convenience usually outruns review discipline.
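The prompt-blocking and logging items in the list above can be combined into a single gate that runs before anything is sent to a model. The sketch below is illustrative: the three regular expressions are simple examples and not a substitute for a dedicated secret scanner, and `gate_prompt` and the audit record fields are hypothetical. It also assumes that blocked prompts are stored only as a digest, so the log itself does not become a second copy of the secret.

```python
import hashlib
import re

# Illustrative patterns only; a production scanner covers far more formats.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(?i)(password|passwd|api[_-]?key|secret|token)\s*[:=]\s*\S+"
    ),
}


def gate_prompt(prompt, user, purpose, audit_log):
    """Return True if the prompt may be sent; always append an audit record."""
    hits = sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(prompt)
    )
    audit_log.append(
        {
            "user": user,
            "purpose": purpose,          # records why the prompt was shared
            "blocked": bool(hits),
            "rules": hits,               # which patterns fired, if any
            "sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            # Assumption: keep full text only for prompts that passed the gate.
            "prompt": None if hits else prompt,
        }
    )
    return not hits
```

A caller would invoke `gate_prompt` immediately before the model request and refuse to send when it returns `False`; the audit list stands in for whatever log store the organisation already uses.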

Common Variations and Edge Cases

Tighter AI controls often increase friction, requiring organisations to balance developer speed against the cost of incident response and remediation. That tradeoff is real, especially when teams want AI help inside CI/CD pipelines, incident response tooling, or pair-programming environments.

The strongest case for allowing AI is when inputs are already sanitized, outputs are non-authoritative, and the workflow has a hard stop before release. The weakest case is anything involving live secrets, privileged cloud credentials, regulated data, or proprietary architecture details. In those environments, best practice is evolving, but there is no universal standard that makes unrestricted AI use acceptable. The State of Secrets in AppSec research is useful here because it highlights how fragmented secrets management and slow remediation can turn a small exposure into a lasting one.

Teams should also be cautious with tool-using agents that can move from code assistance into deployment or infrastructure actions. Once an AI can chain tools, the question shifts from “Can it write code?” to “Can it reach anything sensitive if prompted or misled?” That is why NIST Cybersecurity Framework 2.0-style governance, plus narrow-scoped review, remains the safer operating model until the environment proves it can absorb mistakes without exposing secrets or production systems.
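For tool-using agents, the narrow scoping described above can be expressed as a default-deny allowlist. The following is a minimal sketch with hypothetical tool names; which tools count as read-only, and which require a human sign-off, are assumptions that each team would set for itself.

```python
# Hypothetical tool names for illustration only.
READ_ONLY_TOOLS = frozenset({"read_file", "search_code", "run_unit_tests"})
APPROVAL_REQUIRED_TOOLS = frozenset({"open_pull_request"})


def authorize_tool_call(tool, human_approved=False):
    """Default-deny check applied to every tool call an agent attempts."""
    if tool in READ_ONLY_TOOLS:
        return True
    if tool in APPROVAL_REQUIRED_TOOLS:
        # Allowed only when a reviewer has explicitly approved this call.
        return human_approved
    # Anything not listed (deployment, infrastructure, unknown tools) is denied.
    return False
```

Under this design, adding a deployment or infrastructure tool to the agent's environment does not make it reachable; it stays blocked until someone deliberately adds it to one of the lists.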

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Agentic AI must be gated before it can access sensitive workflows or tools.
CSA MAESTRO	GOV-1	Governance is needed to decide when AI use is acceptable in development.
NIST AI RMF		AI risk management fits the decision to allow or delay AI in dev workflows.

Assess data sensitivity, output risk, and oversight before permitting AI in production-adjacent work.

