What Is a Custom Training Job? Definition & Examples

Expanded Definition

A custom training job is a managed AI workload that executes user-supplied code, scripts, or containers on platform-controlled infrastructure. In non-human identity (NHI) and agentic AI environments, the key issue is not just model training. It is that the job often runs with credentials, network reach, storage access, and compute privileges that can outlast the task itself.

Definitions vary across vendors on whether fine-tuning, batch inference, evaluation pipelines, and reinforcement learning jobs all count as custom training jobs. The practical security boundary is clearer: if an operator can upload executable logic into a managed service, that path deserves the same scrutiny as any privileged workload. Guidance from the NIST Cybersecurity Framework 2.0 is useful here because it anchors the discussion in access control, monitoring, and supply chain discipline rather than product labels.

Custom training jobs are distinct from ordinary data processing because they can introduce model artifacts, callbacks, package imports, and cloud credentials into a single execution chain. The most common misapplication is treating them as low-risk analytics tasks, which occurs when teams allow broad submission rights without sandboxing or review.

Examples and Use Cases

Running custom training jobs with rigorous controls adds approval overhead and environment hardening work, so organisations have to weigh developer speed against the blast radius of privileged execution.

A data science team submits a container to fine-tune a model and mounts object storage containing prompts, labels, and secret-laden notebooks, creating a path for credential exposure if the image is tampered with.

An AI platform allows researchers to run evaluation code with service account tokens. If those tokens are not scoped, the job can access registries or buckets far beyond the training dataset.

A startup uses a managed GPU service for reinforcement learning and permits internet egress. That design makes dependency hijacking and callback exfiltration part of the threat model.

A compromised build pipeline injects a malicious training container. The issue becomes visible only after unusual API activity appears in logs, similar to patterns discussed in the DeepSeek breach research.

Teams following NIST Cybersecurity Framework 2.0 principles often separate training roles from deployment roles so that job submission cannot also grant model release authority.

This term is especially relevant when a platform supports ad hoc notebooks, custom images, or user-defined entry points that blur the line between experimentation and production execution.
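The separation of training roles from deployment roles described above can be checked mechanically. The following is a minimal sketch, assuming a simple role-to-permission mapping; the permission names `training.jobs.create` and `models.release` and the role names are hypothetical and do not correspond to any specific platform.

```python
from typing import Dict, List, Set

# Hypothetical permission names; real platforms use their own identifiers.
SUBMIT_JOB = "training.jobs.create"
RELEASE_MODEL = "models.release"


def roles_violating_separation(role_permissions: Dict[str, Set[str]]) -> List[str]:
    """Return the roles that can both submit training jobs and release models."""
    return sorted(
        role
        for role, perms in role_permissions.items()
        if SUBMIT_JOB in perms and RELEASE_MODEL in perms
    )


# Illustrative role definitions (assumed, not taken from a real deployment).
roles = {
    "researcher": {SUBMIT_JOB, "datasets.read"},
    "release-manager": {RELEASE_MODEL},
    "ml-admin": {SUBMIT_JOB, RELEASE_MODEL},
}

print(roles_violating_separation(roles))  # → ['ml-admin']
```

A check like this could run whenever role definitions change, so that a combined role is flagged before it is granted to anyone.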

Why It Matters in NHI Security

Custom training jobs matter because they are a privileged execution surface that can turn normal experimentation into identity abuse. If a job is allowed to inherit cloud permissions, access secrets, or call internal APIs, an attacker who controls that job can pivot from model work into infrastructure compromise. This is why NHI Management Group treats the job boundary as part of the identity plane, not just the compute plane.

That concern is amplified by real-world secret exposure patterns. In The State of Secrets in AppSec, GitGuardian and CyberArk report that only 44% of developers follow secrets management best practices, while organisations average six distinct secrets manager instances. Fragmentation like that increases the chance that a custom job inherits credentials nobody intended to expose. The same report also notes that 43% of security professionals worry AI systems may learn and reproduce sensitive patterns from codebases.

For threat response, the lesson is simple: limit who can create jobs, isolate runtime identities, and review every mounted secret and outbound path. Organisations typically encounter the impact only after a suspicious container run, credential leak, or unexplained model output, at which point custom training job governance becomes operationally unavoidable.
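The three checks named above (who may create jobs, whether the runtime identity is isolated, and which secrets and outbound paths are attached) can be expressed as a pre-submission review. This is a sketch under stated assumptions: `JobSpec`, `Policy`, and `review_job` are hypothetical names invented for illustration, and a real platform would expose its own job description format.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class JobSpec:
    """Hypothetical description of a submitted training job."""
    submitter: str
    runtime_identity: str
    mounted_secrets: List[str] = field(default_factory=list)
    egress_hosts: List[str] = field(default_factory=list)


@dataclass
class Policy:
    """Hypothetical policy used to review a job before it runs."""
    allowed_submitters: Set[str]
    shared_identities: Set[str]  # identities also used by other workloads
    allowed_secrets: Set[str]
    allowed_egress: Set[str]


def review_job(job: JobSpec, policy: Policy) -> List[str]:
    """Return a list of findings; an empty list means no issue was detected."""
    findings = []
    if job.submitter not in policy.allowed_submitters:
        findings.append(f"submitter not authorised: {job.submitter}")
    if job.runtime_identity in policy.shared_identities:
        findings.append(f"runtime identity is shared: {job.runtime_identity}")
    for secret in job.mounted_secrets:
        if secret not in policy.allowed_secrets:
            findings.append(f"unapproved secret mounted: {secret}")
    for host in job.egress_hosts:
        if host not in policy.allowed_egress:
            findings.append(f"unapproved outbound host: {host}")
    return findings


policy = Policy(
    allowed_submitters={"alice"},
    shared_identities={"sa-platform-default"},
    allowed_secrets={"dataset-read-key"},
    allowed_egress={"pypi.internal.example"},
)
job = JobSpec(
    submitter="alice",
    runtime_identity="sa-platform-default",
    mounted_secrets=["dataset-read-key", "registry-admin-token"],
    egress_hosts=["pypi.internal.example", "callback.example.net"],
)

for finding in review_job(job, policy):
    print(finding)
```

Returning findings rather than a single pass/fail value keeps the review explainable: the submitter sees exactly which identity, secret, or outbound host caused the rejection.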

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A03	Custom jobs can execute untrusted code and tool use, a core agentic attack surface.
OWASP Non-Human Identity Top 10	NHI-02	Job runtimes often inherit secrets and service identities, creating NHI abuse paths.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Least-privilege access is central when users can create privileged training workloads.

Scope job credentials tightly and prevent long-lived identity reuse across training runs.
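One way to picture tightly scoped, non-reusable credentials is a token that is bound to a single run, carries an explicit scope set, and expires shortly after issue. The sketch below is illustrative only: `RunCredential`, `issue_credential`, and `is_usable` are hypothetical names, the 15-minute default lifetime is an assumption, and a production system would rely on the platform's own token service rather than an in-process object.

```python
import secrets
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class RunCredential:
    """Hypothetical credential bound to exactly one training run."""
    run_id: str
    scopes: FrozenSet[str]
    expires_at: float  # seconds since the epoch
    token: str


def issue_credential(
    run_id: str, scopes: Iterable[str], now: float, ttl_seconds: float = 900.0
) -> RunCredential:
    """Issue a fresh credential for one run with a short, assumed lifetime."""
    return RunCredential(
        run_id=run_id,
        scopes=frozenset(scopes),
        expires_at=now + ttl_seconds,
        token=secrets.token_urlsafe(32),
    )


def is_usable(cred: RunCredential, run_id: str, scope: str, now: float) -> bool:
    """Accept the credential only for its own run, a granted scope, and before expiry."""
    return cred.run_id == run_id and scope in cred.scopes and now < cred.expires_at


issued_at = 1_000_000.0  # fixed timestamp so the example is deterministic
cred = issue_credential("run-001", ["bucket:training-data:read"], now=issued_at)

print(is_usable(cred, "run-001", "bucket:training-data:read", now=issued_at + 60))    # → True
print(is_usable(cred, "run-002", "bucket:training-data:read", now=issued_at + 60))    # → False
print(is_usable(cred, "run-001", "registry:push", now=issued_at + 60))                # → False
print(is_usable(cred, "run-001", "bucket:training-data:read", now=issued_at + 3600))  # → False
```

Binding the credential to a run identifier means a token leaked from one job cannot be replayed by a later job, and the expiry limits how long it remains useful after the task ends.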

