Custom training job

By NHI Mgmt Group | Updated June 23, 2026 | Domain: Architecture & Implementation Patterns

An AI workload that runs user-defined code or containers inside a managed environment. That flexibility is what makes it useful, but it also creates a privileged execution path: when job-creation rights are too broad, the job can become an entry point for code execution and identity abuse.

Expanded Definition

A custom training job is a managed AI workload that executes user-supplied code, scripts, or containers on platform-controlled infrastructure. In NHI and agentic AI environments, the key issue is not model training itself but the fact that the job often runs with credentials, network reach, storage access, and compute privileges that can outlast the task.

Definitions vary across vendors on whether fine-tuning, batch inference, evaluation pipelines, and reinforcement learning jobs all count as custom training jobs. The practical security boundary is clearer: if an operator can upload executable logic into a managed service, that path deserves the same scrutiny as any privileged workload. Guidance from the NIST Cybersecurity Framework 2.0 is useful here because it anchors the discussion in access control, monitoring, and supply chain discipline rather than product labels.

Custom training jobs are distinct from ordinary data processing because they can introduce model artifacts, callbacks, package imports, and cloud credentials into a single execution chain. The most common misapplication is treating them as low-risk analytics tasks, typically by granting broad submission rights without sandboxing or review.

Examples and Use Cases

Securing custom training jobs rigorously adds approval overhead and environment-hardening work, so organisations must weigh developer speed against the blast radius of privileged execution.

  • A data science team submits a container to fine-tune a model and mounts object storage containing prompts, labels, and secret-laden notebooks, creating a path for credential exposure if the image is tampered with.
  • An AI platform allows researchers to run evaluation code with service account tokens. If those tokens are not scoped, the job can access registries or buckets far beyond the training dataset.
  • A startup uses a managed GPU service for reinforcement learning and permits internet egress. That design makes dependency hijacking and callback exfiltration part of the threat model.
  • A compromised build pipeline injects a malicious training container. The issue becomes visible only after unusual API activity appears in logs, similar to patterns discussed in the DeepSeek breach research.
  • Teams following NIST Cybersecurity Framework 2.0 principles often separate training roles from deployment roles so that job submission cannot also grant model release authority.

This term is especially relevant when a platform supports ad hoc notebooks, custom images, or user-defined entry points that blur the line between experimentation and production execution.
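The separation of training and deployment roles mentioned in the last example can be checked mechanically. The following is a minimal sketch, assuming roles are available as a mapping from role name to a set of permission strings. The permission names `training.jobs.create` and `models.release` are hypothetical placeholders, not identifiers from any specific platform.

```python
from typing import Dict, List, Set

# Hypothetical permission names; real platforms use their own identifiers.
SUBMIT_JOB = "training.jobs.create"
RELEASE_MODEL = "models.release"


def find_sod_violations(roles: Dict[str, Set[str]]) -> List[str]:
    """Return, in sorted order, roles that can both submit jobs and release models."""
    return sorted(
        name
        for name, permissions in roles.items()
        if SUBMIT_JOB in permissions and RELEASE_MODEL in permissions
    )


# Illustrative role set (assumed, not taken from a real deployment).
example_roles = {
    "researcher": {SUBMIT_JOB, "datasets.read"},
    "release-manager": {RELEASE_MODEL, "models.read"},
    "platform-admin": {SUBMIT_JOB, RELEASE_MODEL},
}

if __name__ == "__main__":
    for role in find_sod_violations(example_roles):
        print(f"role combines job submission and model release: {role}")
```

A check like this could run whenever role definitions change, so that a combined role is flagged before it is assigned to anyone.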

Why It Matters in NHI Security

Custom training jobs matter because they are a privileged execution surface that can turn normal experimentation into identity abuse. If a job is allowed to inherit cloud permissions, access secrets, or call internal APIs, an attacker who controls that job can pivot from model work into infrastructure compromise. This is why NHI Management Group treats the job boundary as part of the identity plane, not just the compute plane.

That concern is amplified by real-world secret exposure patterns. In The State of Secrets in AppSec, GitGuardian and CyberArk report that only 44% of developers follow secrets management best practices, while organisations average six distinct secrets manager instances. Fragmentation like that increases the chance that a custom job inherits credentials nobody intended to expose. The same report also notes that 43% of security professionals worry AI systems may learn and reproduce sensitive patterns from codebases.

For threat response, the lesson is simple: limit who can create jobs, isolate runtime identities, and review every mounted secret and outbound path. Organisations typically see the impact only after a suspicious container run, credential leak, or unexplained model output, at which point governance of custom training jobs becomes operationally unavoidable.
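The three controls named above (who may create jobs, which identity the job runs as, and which secrets and outbound paths it gets) can be expressed as a pre-submission review. This is a sketch under stated assumptions: `JobSpec` and `Policy` are hypothetical, simplified structures invented for illustration, and a real platform would expose this information through its own job API.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class JobSpec:
    """Hypothetical, simplified view of a training job submission."""
    submitter: str
    runtime_identity: str
    mounted_secrets: List[str] = field(default_factory=list)
    egress_hosts: List[str] = field(default_factory=list)


@dataclass
class Policy:
    """Hypothetical allowlists maintained by the platform team."""
    allowed_submitters: Set[str]
    shared_identities: Set[str]  # identities also used by other workloads
    approved_secrets: Set[str]
    approved_egress: Set[str]


def review_job(spec: JobSpec, policy: Policy) -> List[str]:
    """Return a list of findings; an empty list means no rule was violated."""
    findings = []
    if spec.submitter not in policy.allowed_submitters:
        findings.append(f"submitter not allowed: {spec.submitter}")
    if spec.runtime_identity in policy.shared_identities:
        findings.append(f"runtime identity is shared: {spec.runtime_identity}")
    for secret in spec.mounted_secrets:
        if secret not in policy.approved_secrets:
            findings.append(f"unreviewed secret mount: {secret}")
    for host in spec.egress_hosts:
        if host not in policy.approved_egress:
            findings.append(f"unapproved egress: {host}")
    return findings


if __name__ == "__main__":
    policy = Policy(
        allowed_submitters={"alice"},
        shared_identities={"default-compute"},
        approved_secrets={"dataset-read-key"},
        approved_egress={"packages.internal.example"},
    )
    risky = JobSpec("bob", "default-compute", ["prod-db-password"], ["0.0.0.0/0"])
    for finding in review_job(risky, policy):
        print(finding)
```

Returning findings rather than a single boolean keeps the review auditable: each rejected submission records which control it failed.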

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
OWASP Agentic AI Top 10 | A03 | Custom jobs can execute untrusted code and tool use, a core agentic attack surface.
OWASP Non-Human Identity Top 10 | NHI-02 | Job runtimes often inherit secrets and service identities, creating NHI abuse paths.
NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | Least-privilege access is central when users can create privileged training workloads.

Scope job credentials tightly and prevent long-lived identity reuse across training runs.
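One way to picture run-scoped, short-lived credentials is a token that is bound to a single run ID and carries an expiry. The sketch below uses only the Python standard library and is illustrative rather than a production design: the function names, the token layout, and the hard-coded `SIGNING_KEY` are assumptions made for the example, and a real platform would normally rely on its own workload identity or token service.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption for the demo only: a real signing key is held by the platform, never hard-coded.
SIGNING_KEY = b"demo-only-key"


def _sign(payload: str) -> str:
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_run_token(run_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Issue a token of the form '<run_id>:<expiry>:<signature>' for one run."""
    issued = time.time() if now is None else now
    payload = f"{run_id}:{int(issued) + ttl_seconds}"
    return f"{payload}:{_sign(payload)}"


def verify_run_token(token: str, run_id: str, now: Optional[float] = None) -> bool:
    """Accept the token only for the run it was issued to, and only before expiry."""
    current = time.time() if now is None else now
    try:
        token_run, expires_text, signature = token.rsplit(":", 2)
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{token_run}:{expires_text}")
    return (
        hmac.compare_digest(signature.encode(), expected.encode())
        and token_run == run_id
        and current < expires
    )


if __name__ == "__main__":
    token = issue_run_token("run-1", ttl_seconds=60)
    print(verify_run_token(token, "run-1"))  # same run, within the TTL
    print(verify_run_token(token, "run-2"))  # reuse by a different run
```

Binding the token to the run ID means a credential leaked from one training run cannot be replayed by another, and the expiry bounds how long it is useful at all.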

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on June 23, 2026.