Data and AI lifecycle security is becoming an AppSec blind spot

By NHI Mgmt Group Editorial Team | Published 2025-01-23 | Domain: Agentic AI & NHIs | Source: Noma Security

TL;DR: AI applications now depend on a separate Data and AI Lifecycle with distinct data, notebook, pipeline, registry, and runtime stages. Noma Security argues that this creates a security blind spot for AppSec teams. The governance gap is widening because these assets behave like non-human identities, with persistent access and tool reach that conventional SDLC controls were never built to track.

At a glance

What this is: An analysis of the Data and AI Lifecycle and its unique security risks. The central finding is that AI development and runtime create a separate blind spot for AppSec and identity governance.

Why it matters: Data pipelines, notebooks, model registries, and runtime services all behave like non-human identities, with access paths that IAM teams must now govern.


Context

The Data and AI Lifecycle is the set of stages, tools, and runtime systems that support AI applications, and it does not map cleanly to the standard software development lifecycle. That matters for non-human identity governance because notebooks, pipelines, model registries, and model-serving endpoints all hold credentials, data access, and execution authority that can outlive the human workflow that created them.

For IAM, the practical problem is not only model risk. It is the identity sprawl created by AI tooling, where service accounts, API keys, tokens, and embedded model access permissions are distributed across data preparation, training, deployment, and runtime. The article describes a common enterprise condition rather than an edge case: AI work is increasingly central, but the access model around it remains fragmented.

Key questions

Q: How should security teams govern AI applications that span notebooks, pipelines, and runtime services?

A: Security teams should govern AI applications as a chain of non-human identities, not as a single system. Each notebook, pipeline job, registry entry, and serving endpoint needs its own owner, entitlements, and rotation policy. The key is to separate experimentation access from production access so privileges do not drift as models move into service.

Q: What is the difference between SDLC security and Data and AI lifecycle security?

A: SDLC security focuses on code and application release paths, while Data and AI lifecycle security must also cover data preparation, model training, artifact lineage, and runtime behavior. AI systems depend on datasets, notebooks, and orchestration tools that can carry credentials and access rights outside the normal software path. That makes identity governance and secrets control more central than in conventional development.

Q: Why do AI pipelines and model registries create governance risk?

A: AI pipelines and model registries create governance risk because they control what data is used, what model version is deployed, and which jobs can execute automatically. If those systems are overprivileged or poorly owned, they can become durable machine access points. Security teams should treat them as privileged control planes and review them with the same discipline used for administrative access.

Q: Should organisations use just-in-time access for AI development environments?

A: Yes, where the workflow allows it. Just-in-time access reduces the chance that notebooks, pipelines, and serving environments retain standing privileges after a task ends. It is especially useful for shared experimentation environments and production-adjacent tooling, where long-lived access tends to become invisible over time.
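A minimal sketch of the just-in-time idea, assuming a hypothetical `Grant` record and helper names that do not come from the source article or any specific product. It shows access that expires on its own rather than waiting for someone to revoke it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """A time-boxed entitlement for one identity on one resource (hypothetical model)."""
    identity: str
    resource: str
    expires_at: datetime


def issue_grant(identity: str, resource: str, ttl_minutes: int, now: datetime) -> Grant:
    """Issue access that lapses after ttl_minutes instead of relying on manual revocation."""
    return Grant(identity, resource, now + timedelta(minutes=ttl_minutes))


def is_active(grant: Grant, now: datetime) -> bool:
    """A grant is usable only strictly before its expiry time."""
    return now < grant.expires_at


# Illustrative values only: a notebook gets one hour of access to a training dataset.
start = datetime(2025, 1, 23, 9, 0, tzinfo=timezone.utc)
grant = issue_grant("notebook-churn-analysis", "s3://training-data", ttl_minutes=60, now=start)
```

In a real deployment the expiry would be enforced by the credential issuer (for example, short-lived tokens), not by application code checking a timestamp.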

Technical breakdown

Why the Data and AI lifecycle creates a separate identity plane

The Data and AI Lifecycle includes distinct systems for data preparation, model training, deployment, and runtime operations. Each stage uses its own tools and credentials, such as notebooks, data pipelines, registries, container platforms, and inference APIs. That makes the AI stack a separate identity plane, not just another application tier. Unlike a normal SDLC, permissions often shift between humans, automation, and model services as work moves from experimentation to production. The security risk is that identity context gets lost between those handoffs, so access granted for analysis can survive into production paths or shared environments.

Practical implication: Treat every AI workflow stage as a distinct identity boundary and inventory the accounts, secrets, and tokens that move across it.
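One way to start that inventory is a flat list of credential sightings per lifecycle stage, then a check for any credential that appears in more than one stage. The sketch below assumes a hypothetical `Credential` record and made-up stage and account names; it is an illustration, not a prescribed schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """One sighting of a credential in one lifecycle stage (hypothetical model)."""
    credential_id: str
    stage: str
    used_by: str


def credentials_crossing_stages(inventory):
    """Return {credential_id: sorted stages} for credentials seen in more than one stage."""
    stages_by_credential = defaultdict(set)
    for entry in inventory:
        stages_by_credential[entry.credential_id].add(entry.stage)
    return {
        cid: sorted(stages)
        for cid, stages in stages_by_credential.items()
        if len(stages) > 1
    }


# Illustrative inventory: one service account is reused in data preparation and runtime.
inventory = [
    Credential("svc-feature-store", "data-preparation", "etl-nightly"),
    Credential("svc-feature-store", "runtime", "inference-api"),
    Credential("svc-trainer", "training", "train-job"),
]
shared = credentials_crossing_stages(inventory)
```

Here `shared` contains only `svc-feature-store`, the account that crosses an identity boundary and is therefore a candidate for splitting into stage-specific credentials.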

How notebooks, pipelines, and model registries expand NHI exposure

Jupyter notebooks, orchestration jobs, and model registries are operationally useful because they connect data, code, and model artifacts. They are also high-risk from an NHI perspective because they often require direct access to sensitive datasets, training environments, and model metadata. A notebook can hold credentials, a pipeline can execute on schedule without human review, and a registry can preserve lineage and versions long after a project changes hands. The result is durable machine access with weak ownership. When that access is not tied to lifecycle controls, the environment accumulates secrets, overprivileged service accounts, and stale entitlements.

Practical implication: Map every notebook, pipeline, and registry to an owner, a purpose, and a rotation or retirement schedule.
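The mapping can be checked mechanically once it exists. The following sketch assumes a hypothetical `Asset` record with an owner, a purpose, and a rotation interval; the field names and the 90-day interval are illustrative choices, not values from the source.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Asset:
    """A notebook, pipeline, or registry treated as a governed identity (hypothetical model)."""
    name: str
    kind: str
    owner: Optional[str]
    purpose: Optional[str]
    last_rotated: date
    rotation_days: int


def governance_gaps(asset: Asset, today: date) -> list[str]:
    """List the lifecycle controls this asset is missing as of `today`."""
    gaps = []
    if not asset.owner:
        gaps.append("no owner")
    if not asset.purpose:
        gaps.append("no purpose")
    if today - asset.last_rotated > timedelta(days=asset.rotation_days):
        gaps.append("rotation overdue")
    return gaps


# Illustrative asset: an unowned notebook whose secret was last rotated 144 days ago.
today = date(2025, 1, 23)
stale = Asset("churn-exploration.ipynb", "notebook", None, "churn analysis",
              last_rotated=date(2024, 9, 1), rotation_days=90)
gaps = governance_gaps(stale, today)
```

An asset that returns an empty list has an owner, a purpose, and a current rotation; anything else is a candidate for review or retirement.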

Why AI runtime operations change the access control problem

AI runtime is where models act as an inference service or, in agentic systems, as an autonomous actor with tool access. That shifts the control question from simple authentication to constrained execution authority. Prompts, memory, retrieval data, and connected tools can all influence what the runtime can do, and misconfiguration in any of those layers can expose data or trigger unintended actions. For security teams, the important point is that runtime governance must include both identity and behavior. Traditional perimeter checks do not tell you whether a model or agent should still have access to a downstream system.

Practical implication: Apply least privilege and continuous authorization to AI runtime tools, retrieval sources, and service endpoints.
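As a rough illustration of continuous authorization, the sketch below re-checks an agent's policy on every tool call instead of once at session start. `AgentPolicy`, `ToolDenied`, `call_tool`, and `search_docs` are hypothetical names invented for this example; a production system would typically delegate the decision to a policy engine.

```python
from dataclasses import dataclass


@dataclass
class AgentPolicy:
    """Per-agent allowlist of tools plus a revocation flag (hypothetical model)."""
    agent_id: str
    allowed_tools: frozenset
    revoked: bool = False


class ToolDenied(Exception):
    """Raised when an agent attempts a tool call its policy does not permit."""


def call_tool(policy: AgentPolicy, tool_name: str, tool, *args):
    """Evaluate the policy at call time, so revocation takes effect mid-session."""
    if policy.revoked or tool_name not in policy.allowed_tools:
        raise ToolDenied(f"{policy.agent_id} may not call {tool_name}")
    return tool(*args)


def search_docs(query: str) -> str:
    """Stand-in for a real retrieval tool."""
    return f"results for {query}"


policy = AgentPolicy("support-agent", frozenset({"search_docs"}))
answer = call_tool(policy, "search_docs", search_docs, "refunds")
```

Because the check runs per call, setting `policy.revoked = True` blocks the next invocation without waiting for the session or token to end.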

Threat narrative

Attacker objective: The attacker aims to compromise AI workflows so they can steal data, manipulate model behavior, or reuse machine credentials for broader access.

Entry occurs through exposed notebooks, pipeline jobs, or embedded secrets that give an attacker a foothold in the AI workflow.
Escalation follows when overprivileged service accounts or model-serving credentials allow movement from experimentation environments into production data and inference systems.
Impact is achieved when the attacker can alter model behavior, exfiltrate sensitive training data, or abuse AI infrastructure for unauthorized access.

Reviewdog GitHub Action supply chain attack: the reviewdog/action-setup supply chain attack exposed secrets.
CI/CD pipeline exploitation case study: full server takeover via an exposed .git directory and mismanaged CI/CD pipeline secrets.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

The Data and AI Lifecycle is now an identity governance problem, not just an AppSec problem. The article correctly frames AI tooling as a separate operational lifecycle, but the deeper issue is that each stage creates persistent non-human identities with their own trust relationships. That makes ownership, entitlement review, and lifecycle enforcement central controls, not optional hygiene. Practitioners should manage AI systems as identity-bearing workloads from the start.

Notebook-first development creates ephemeral trust debt. Data science workflows often begin in flexible, high-access environments that later feed production models and services. The trust granted during experimentation rarely gets reduced with the same urgency at deployment, which leaves behind access debt in the form of stale secrets and overprivileged automation. Teams should assume every notebook path can become a long-lived attack path if lifecycle controls are missing.
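To make the stale-secret risk concrete, here is a small sketch that scans the code cells of a notebook file for strings that look like embedded credentials. It relies on the standard `.ipynb` JSON layout (a top-level `cells` list with `cell_type` and `source`). The two patterns are illustrative assumptions and far from exhaustive; a dedicated secret scanner would be the realistic choice.

```python
import json
import re

# Illustrative patterns only; real scanners ship far larger, tuned rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "hardcoded_assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|password)\s*=\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_notebook(notebook_json: str):
    """Return (cell_index, pattern_name) pairs for code cells that match a pattern."""
    notebook = json.loads(notebook_json)
    findings = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a list of lines or a single string.
        text = "".join(source) if isinstance(source, list) else source
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                findings.append((index, name))
    return findings


# Illustrative notebook with a fake key assigned in its second cell.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Churn exploration"]},
        {"cell_type": "code", "source": ["api_key = 'not-a-real-key-123'\n", "df = load()"]},
    ]
})
findings = scan_notebook(example)
```

This scans cell sources only. Cell outputs can also leak secrets and would need the same treatment.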

Model registries and pipeline orchestration are governance chokepoints. These systems preserve version history, lineage, and execution flow, which makes them useful for auditability but also attractive targets for privilege abuse. If attackers or insiders can tamper with pipeline jobs or registry metadata, they can influence what code, data, or model version reaches production. Practitioners should treat registries and pipelines as privileged control points.
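One common guard against artifact tampering is to record a cryptographic digest when a model version is approved and to verify it again before deployment. The sketch below uses SHA-256 from the Python standard library. The function names are hypothetical, and it assumes the approved digest is stored somewhere that registry writers cannot modify.

```python
import hashlib
import hmac


def sha256_digest(artifact_bytes: bytes) -> str:
    """Hex-encoded SHA-256 digest of a model artifact."""
    return hashlib.sha256(artifact_bytes).hexdigest()


def verify_artifact(artifact_bytes: bytes, approved_digest: str) -> bool:
    """Compare the artifact's digest to the one recorded at approval time."""
    return hmac.compare_digest(sha256_digest(artifact_bytes), approved_digest)


# Illustrative bytes standing in for a serialized model file.
approved_model = b"model-weights-v3"
approved_digest = sha256_digest(approved_model)  # recorded when the version was approved
```

A digest check detects changes to the artifact itself. It does not protect against an attacker who can also rewrite the stored digest, which is why the record of approved digests needs separate, tighter access control.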

Runtime is where AI identity control becomes measurable. At deployment time, the question is no longer whether the model was built correctly, but whether its access is constrained, observable, and reversible. That is the point at which IAM, secrets management, and policy enforcement must converge. Security teams should use runtime controls to reduce blast radius rather than relying on development-time reviews alone.

From our research:
92% agree governing AI agents is critical to enterprise security, yet only 44% have implemented any policies to do so, according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
For a governance lens that maps machine access across its full life cycle, see the NHI Lifecycle Management Guide.

What this signals

Data and AI lifecycle governance will increasingly converge with NHI management. As AI tooling spreads across notebooks, pipelines, registries, and runtime services, the real control problem becomes identity ownership across machine-driven workflows. Teams should expect security reviews to shift from model-centric questions to entitlement-centric ones, especially where credentials move automatically between environments. For a broader control baseline, align lifecycle reviews with the OWASP Non-Human Identity Top 10.

With 80% of organisations already reporting AI agents acting beyond their intended scope in NHIMG research, the enterprise assumption that machine access is inherently predictable is already broken. That means AI programme risk is not just a model quality issue; it is a governance issue tied to standing privilege, stale secrets, and weak auditability.

Ephemeral credential trust debt: AI teams often inherit short-lived experimentation access that quietly hardens into production privilege. Security leaders should watch for environments where notebooks, pipelines, and service endpoints still share the same trust root, because those are the places where attack paths multiply fastest. Use the NHI Lifecycle Management Guide to reset ownership, rotation, and offboarding expectations.

For practitioners

Inventory AI identities across the lifecycle: Catalogue notebooks, pipelines, registries, model servers, API keys, and service accounts as distinct non-human identities with an owner and purpose. Use the inventory to identify where access crosses from development into production and where credentials are shared across stages.
Separate experimentation from production access: Require distinct credentials and policies for data science experimentation, training, and runtime operations. Do not reuse the same secrets or service accounts across notebooks, scheduled jobs, and serving endpoints.
Rotate and retire AI secrets on a lifecycle schedule: Tie rotation to project milestones, model releases, and offboarding events rather than leaving keys embedded in notebooks or pipeline configs. The NHI Lifecycle Management Guide is the right reference point for provisioning, rotation, and retirement discipline.
Restrict model-serving and pipeline privileges: Limit inference services and orchestration jobs to the minimum data stores, model assets, and downstream APIs they actually need. Use short-lived access where possible and review standing permissions regularly.
Audit runtime access and lineage continuously: Log which datasets, models, prompts, and tools are touched during execution so you can reconstruct misuse after an incident. Pair that telemetry with periodic reviews of model lineage and pipeline execution history.
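For the runtime audit step, a structured log line per execution makes later reconstruction much easier than free-text logs. The sketch below builds one JSON record per run. The field names, identifiers, and the `audit_record` helper are assumptions chosen for illustration, not a schema from the source.

```python
import json
from datetime import datetime, timezone


def audit_record(agent_id, model_version, datasets, tools, prompt_id, timestamp):
    """Build one JSON line describing what a single execution touched."""
    return json.dumps({
        "timestamp": timestamp.isoformat(),
        "agent_id": agent_id,
        "model_version": model_version,
        "datasets": sorted(datasets),  # sorted so records are stable and diffable
        "tools": sorted(tools),
        "prompt_id": prompt_id,        # a reference to the prompt, not its content
    }, sort_keys=True)


# Illustrative execution: one agent run that read two datasets and used one tool.
line = audit_record(
    agent_id="support-agent",
    model_version="churn-model:3",
    datasets={"crm.tickets", "crm.accounts"},
    tools={"search_docs"},
    prompt_id="req-0001",
    timestamp=datetime(2025, 1, 23, 9, 0, tzinfo=timezone.utc),
)
```

Logging a prompt identifier rather than the prompt text is a deliberate choice in this sketch, since prompts can themselves contain sensitive data. Whether to store full prompts depends on the organisation's retention and privacy requirements.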

Key takeaways

The Data and AI Lifecycle introduces a separate identity surface that conventional SDLC controls do not fully cover.
Notebooks, pipelines, registries, and serving endpoints can all behave like high-risk non-human identities when ownership and rotation are unclear.
Security teams should govern AI systems through lifecycle-based access control, short-lived privileges, and continuous auditability.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST AI RMF set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	AI tooling often retains standing access and stale secrets across lifecycle stages.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Least privilege is central when machine workflows span multiple AI lifecycle stages.
NIST AI RMF		AI governance requires ownership, monitoring, and accountability for autonomous workflow behavior.

Review AI notebooks, pipelines, and runtime services for secret rotation and remove standing access.

Key terms

Data and AI Lifecycle: The Data and AI Lifecycle is the end-to-end set of processes used to prepare data, train models, deploy AI systems, and run them in production. It includes tooling and runtime layers that sit alongside, but not inside, the conventional software lifecycle, which is why it creates separate governance and identity risks.
Model Registry: A model registry is a system used to store, version, and track machine learning models and their metadata across development and deployment. It gives teams lineage and version control, but it also becomes a privileged control point because the wrong model version or metadata can reach production if access is not tightly governed.
AI Runtime Operations: AI runtime operations are the live execution processes that let a model respond to prompts, process inputs, and interact with data or tools. This phase matters for security because runtime access determines what the model can reach, what it can expose, and how much damage a compromised workflow can do.
Notebook Environment: A notebook environment is an interactive workspace where data scientists explore data, write code, and test models in a single place. These environments are powerful but sensitive because they often connect directly to datasets and credentials, making them common sources of standing access and hidden secret exposure.

Deepen your knowledge

Data and AI lifecycle security is a core topic in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your team is trying to govern AI workflows that blend code, data, and runtime access, it is worth exploring.

This post draws on content published by Noma Security: Data and AI Lifecycle 101. Read the original.
