Amazon Nova Forge changes who can build custom foundation models

By NHI Mgmt Group Editorial Team
Published: 2025-12-04
Domain: Agentic AI & NHIs
Source: WorkOS

TL;DR: AWS’s Nova Forge lowers the barrier to custom foundation model training by combining customer data with structured training checkpoints, allowing enterprises to build private Nova-based models without full frontier-lab scale, according to WorkOS. The real shift is that proprietary data can now shape model behavior earlier, which raises governance, safety, and lock-in questions for identity and AI teams.

At a glance

What this is: Amazon Nova Forge introduces an open training approach that lets enterprises build private custom foundation models using their own data during training, not just after it.

Why it matters: IAM and security teams need to understand this because custom model training expands who can shape AI behaviour, how proprietary data is handled, and where governance must follow the model lifecycle.

By the numbers:

Nimbus Therapeutics saw a 20-50% improvement over Claude Sonnet 4 on property prediction benchmarks using Nova 2 Lite through Forge.

Context

Amazon Nova Forge changes the economics of custom foundation model training by moving enterprises closer to domain-specific model ownership. The primary issue is not just cost, but control over how proprietary data influences model behaviour during training, which has direct implications for AI governance, access boundaries, and data handling.

For identity and security teams, the important question is where the governance boundary sits when internal data is no longer only consumed by an AI model but helps shape the model itself. That shifts the control problem from prompt-time usage to training-time provenance, dataset approval, and post-training behaviour validation.

Key questions

Q: How should security teams govern custom foundation model training on proprietary data?

A: Security teams should treat custom foundation model training as a governed data and identity workflow, not a one-time ML project. That means approving training inputs, defining who can trigger runs, validating reward signals, and testing post-training behaviour before deployment. The key control is lineage, because the model inherits behaviour from the data used to shape it.

Q: What risks appear when enterprises train models on internal data instead of only fine-tuning them?

A: The main risk is that internal data stops being passive input and becomes part of the model’s learned behaviour. That raises concerns around leakage, policy drift, accidental overfitting to sensitive patterns, and weaker portability. Teams need to govern what enters training, not just what leaves the model at inference time.

Q: When does custom model consolidation become a governance concern?

A: Consolidation becomes a governance concern when multiple specialised models are replaced by one custom model without stronger validation and ownership. A single model can simplify operations, but it also increases blast radius if policy, safety, or data quality is wrong. Governance should scale with concentration, not with convenience.

Q: What should organisations check before relying on a managed training platform for custom AI models?

A: Organisations should check deployment boundaries, export limits, logging coverage, approval controls, and the ability to validate model behaviour independently. If the platform keeps weights and deployment inside a single ecosystem, the trade-off is less portability in exchange for more operational consistency. That trade-off should be explicit before adoption scales.

Technical breakdown

Open training checkpoints and data mixing

Nova Forge’s core mechanism is structured access to checkpoints across pre-training, mid-training, and post-training. Rather than forcing enterprises to rely only on supervised fine-tuning, AWS mixes customer data with its own training distributions so the model can absorb domain knowledge without collapsing its base capabilities. That matters because naive continued pre-training can cause catastrophic forgetting, where new data overwrites useful general competence. The architecture therefore moves model customization earlier in the lifecycle and makes the training corpus part of the control surface.

Practical implication: treat training data approval, lineage, and retention as part of model governance, not just data governance.
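
The data-mixing idea can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not AWS's pipeline: the source does not describe Forge's interface or mixing proportions, so the function name mix_batches, the customer_ratio parameter, and the example ratio are all hypothetical.

```python
from __future__ import annotations

import random
from typing import Iterator, Sequence


def mix_batches(
    customer_docs: Sequence[str],
    baseline_docs: Sequence[str],
    customer_ratio: float,
    batch_size: int,
    num_batches: int,
    seed: int = 0,
) -> Iterator[list[str]]:
    """Yield batches that blend customer and baseline documents.

    Each slot in a batch is drawn from the customer corpus with probability
    `customer_ratio`, otherwise from the baseline corpus. Keeping baseline
    data in every batch is the intuition behind limiting catastrophic
    forgetting; the ratio itself is a hypothetical tuning knob.
    """
    if not 0.0 <= customer_ratio <= 1.0:
        raise ValueError("customer_ratio must be between 0 and 1")
    if not customer_docs or not baseline_docs:
        raise ValueError("both corpora must be non-empty")
    rng = random.Random(seed)  # seeded so a run can be reproduced for audit
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            use_customer = rng.random() < customer_ratio
            source = customer_docs if use_customer else baseline_docs
            batch.append(rng.choice(source))
        yield batch
```

Seeding the sampler is a deliberate choice here: if the training corpus is part of the control surface, being able to reproduce which documents entered which batch supports lineage review.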

Reinforcement fine-tuning and reward shaping

Forge also supports reinforcement fine-tuning, where customers define reward functions or programmatic quality signals and SageMaker uses those signals to shape model behaviour. This differs from ordinary prompt tuning because the model is not only learning from examples; it is learning which outputs the enterprise prefers under its own scoring rules. That creates a stronger coupling between business policy and model behaviour, but it also makes the choice of reward signal a governance decision. Bad reward design can hard-code the wrong incentives at scale.

Practical implication: validate reward functions and evaluation criteria before they become embedded in production model behaviour.
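
One way to make that validation concrete is to test a candidate reward function against reviewer-labelled pairs before it is used in training. The sketch below is illustrative only: the reward policy (require a source tag, never reward a leaked key), the reward and validate_reward names, and the call signature are assumptions, not the SageMaker or Forge interface.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical policy signal: anything shaped like an AWS access key id.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")


def reward(prompt: str, completion: str) -> float:
    """Score a completion in [0, 1] under an illustrative policy."""
    if SECRET_PATTERN.search(completion):
        return 0.0  # hard fail: a leaked credential is never rewarded
    score = 0.5
    if "[source:" in completion:
        score += 0.5  # prefer answers that cite where they came from
    return score


def validate_reward(
    fn: Callable[[str, str], float],
    labelled_cases: list[tuple[str, str, str]],
) -> list[str]:
    """Return a list of failures; an empty list means the reward passed.

    Each case is (prompt, approved_completion, rejected_completion), as
    labelled by human reviewers.
    """
    failures = []
    for prompt, good, bad in labelled_cases:
        good_score, bad_score = fn(prompt, good), fn(prompt, bad)
        if not (0.0 <= good_score <= 1.0 and 0.0 <= bad_score <= 1.0):
            failures.append(f"out-of-range score for prompt {prompt!r}")
        elif good_score <= bad_score:
            failures.append(f"reward does not prefer approved answer for {prompt!r}")
    return failures
```

A gate like this would sit in the approval workflow: a reward function that cannot rank reviewer-approved answers above rejected ones should not reach a training run.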

Deployment boundaries and Bedrock lock-in

Forge models are deployed only on Amazon Bedrock, and AWS does not provide raw model weights. That means the trained model remains within a managed delivery boundary, which simplifies some operational controls while tightening platform dependence. From an identity perspective, the access model extends beyond users and secrets into training jobs, model artefacts, logging, quotas, and the policies that govern who can initiate or modify training runs. The result is a more complete lifecycle, but also a narrower portability path.

Practical implication: map who can start, alter, export, or validate training runs before custom model programmes expand.
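
The mapping exercise can start as a simple table of principals and lifecycle actions, checked for conflicting combinations. The following is a minimal sketch: the four action names come from the sentence above, but the principal names, the CONFLICTS pairs, and both helper functions are hypothetical and do not correspond to real IAM or Bedrock permissions.

```python
from __future__ import annotations

# Lifecycle actions named in the text; not real IAM action identifiers.
ACTIONS = ("start", "alter", "export", "validate")

# Assumed policy: whoever shapes a model should not also sign off on it.
CONFLICTS = [("start", "validate"), ("alter", "validate")]


def find_conflicts(grants: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Return (principal, action_a, action_b) for each conflicting pair held."""
    findings = []
    for principal, actions in sorted(grants.items()):
        unknown = actions - set(ACTIONS)
        if unknown:
            raise ValueError(f"{principal} has unknown actions: {sorted(unknown)}")
        for a, b in CONFLICTS:
            if a in actions and b in actions:
                findings.append((principal, a, b))
    return findings


def unowned_actions(grants: dict[str, set[str]]) -> list[str]:
    """Return lifecycle actions that no principal currently holds."""
    held = set().union(*grants.values())
    return [action for action in ACTIONS if action not in held]
```

Running both checks together surfaces two different gaps: principals with too much authority over a training run, and lifecycle steps that nobody is accountable for.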

NHI Mgmt Group analysis

Custom model training is now an identity and governance problem, not just an ML problem. When proprietary data shapes the model during training, the control boundary moves upstream from inference to lineage, access, and approval of the training corpus. That means model risk is inseparable from who can authorise data inclusion, who can initiate training, and who can validate resulting behaviour. Practitioners should treat training pipelines as governed identity workflows, not engineering experiments.

The most important shift is the collapse of the old assumption that AI only consumes data. Traditional governance assumes the model is a downstream consumer of curated inputs, with least-privilege decisions made around prompts, APIs, and runtime access. Forge breaks that assumption because data now becomes part of the model’s internal representation during training. The implication is that provisioning-time thinking is no longer enough when the thing being governed is partially created from the data itself.

Training data provenance: the assumption that model behaviour can be separated from data approval fails when the enterprise’s own corpus becomes training material. That assumption was designed for systems where sensitive data could be classified and consumed without altering the system itself. It fails when custom training embeds organisational knowledge into model weights, making approval, redaction, and retention decisions part of the model’s identity lifecycle. Practitioners must rethink how data access, model ownership, and behavioural accountability intersect.

Model consolidation changes the operational risk profile of AI estates. Reddit’s move from multiple specialised models toward a single Nova-based model shows the pull toward consolidation, but consolidation also increases blast radius when governance is weak. Fewer models can simplify operations, yet they also concentrate dependency, policy, and validation requirements into one place. The practical conclusion is that simplification only helps if the organisation can actually govern the unified model more rigorously than the tools it replaced.

Platform convenience will compete directly with portability and control. A managed training path inside Bedrock lowers adoption friction for enterprises already invested in AWS, but it also narrows where models can live and how much of the lifecycle can be independently controlled. That trade-off resembles other identity decisions where convenience hides long-term dependency. Practitioners should evaluate whether the governance gains of a managed training boundary outweigh the strategic cost of reduced portability.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Only 44% of developers are reported to follow security best practices for secrets management, exposing a significant developer behaviour gap.
That same governance gap is why teams should review the Ultimate Guide to NHIs: Key Research and Survey Results before expanding custom AI training programmes.

What this signals

Custom training makes data governance part of model governance, which means identity teams need to think about training-time approvals the same way they think about privileged access approvals. The practical risk is not only who can use the model, but who can shape the model using sensitive source material. With the average time to remediate a leaked secret standing at 27 days, per The State of Secrets in AppSec, organisations are already slow at controlling sensitive inputs once they escape their intended boundary.

Model consolidation will pressure programme owners to prove they can govern fewer, more powerful systems rather than many smaller ones. That shift changes the operating model for review, validation, and rollback. Teams that cannot track data provenance, training access, and behavioural drift will find that simplification in the model estate simply moves complexity into governance.

Identity programmes should expect training pipelines to become part of their own control surface, especially where proprietary data and domain expertise are strategic assets. The right question is no longer whether AI can be customised, but whether the organisation can prove who authorised the customisation and what knowledge was embedded. For broader NHI context, see Ultimate Guide to NHIs: Why NHI Security Matters Now.

For practitioners

Define training-data approval gates: Require explicit approval for any dataset that will influence model training, including proprietary documents, moderation logs, and domain corpora. Tie approval to data classification, retention limits, and business owner sign-off before a training run can begin.
Assign ownership for model lifecycle decisions: Create named accountability for who can start training, change reward functions, approve checkpoints, and validate output behaviour. Separate these permissions so no single team can both shape and approve the model unchecked.
Test for behavioural drift after customisation: Run evaluation suites that compare base-model performance against custom-trained performance on safety, refusal quality, and domain accuracy. Review drift not only as a quality issue but as a governance signal that the training corpus changed the model’s operating profile.
Review platform dependence before scaling adoption: Document where models can run, how artefacts are retained, and what happens if the organisation needs to move training elsewhere. Use that review to decide whether the managed boundary is acceptable for regulated or multi-cloud environments.
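
The first and third recommendations above can be sketched as two small checks: an approval gate keyed to a dataset fingerprint, and a drift report comparing base and custom evaluation scores. Everything here is an assumption for illustration: the DatasetApproval record, the allowed classifications, the metric names, and the 0.02 tolerance are hypothetical and would need to reflect an organisation's own policy and evaluation suite.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import date

# Assumed policy: only these classifications may enter training.
ALLOWED_CLASSIFICATIONS = {"public", "internal"}


@dataclass(frozen=True)
class DatasetApproval:
    sha256: str          # fingerprint of the exact bytes that were approved
    classification: str  # data classification assigned at review
    retain_until: date   # retention limit agreed at approval time
    approved_by: str     # named business owner who signed off


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def may_train(data: bytes, approval: DatasetApproval | None, today: date) -> tuple[bool, str]:
    """Return (allowed, reason) for using `data` in a training run."""
    if approval is None:
        return False, "no approval on record"
    if fingerprint(data) != approval.sha256:
        return False, "dataset changed since approval"
    if approval.classification not in ALLOWED_CLASSIFICATIONS:
        return False, "classification not permitted for training"
    if today > approval.retain_until:
        return False, "retention limit passed"
    if not approval.approved_by.strip():
        return False, "missing owner sign-off"
    return True, "approved"


def drift_report(base: dict[str, float], custom: dict[str, float], max_drop: float = 0.02) -> dict[str, float]:
    """Return metrics where the custom model fell more than `max_drop` below base."""
    if base.keys() != custom.keys():
        raise ValueError("base and custom must report the same metrics")
    return {
        metric: round(custom[metric] - base[metric], 4)
        for metric in base
        if base[metric] - custom[metric] > max_drop
    }
```

Binding the approval to a hash rather than a dataset name means any change to the corpus after sign-off invalidates the approval, which keeps lineage tied to the bytes that actually shaped the model.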

Key takeaways

Nova Forge shifts custom model governance upstream by making proprietary training data part of the model itself, not just a source for later fine-tuning.
The most material risk is not model quality alone but the governance burden created when access, lineage, and behaviour all converge in the training lifecycle.
Practitioners should separate approval, validation, and deployment rights before custom model programmes scale beyond a narrow pilot.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST AI RMF		Custom model training needs governance over data, evaluation, and accountability.
NIST CSF 2.0	GV.OC-03	Business context matters because proprietary data and model ownership change risk appetite.
OWASP Agentic AI Top 10		Training-time behaviour shaping and runtime model use create agentic risk-adjacent controls.

Assign model governance owners and require approval for training data, reward functions, and release validation.

Key terms

Custom Foundation Model Training: Training a base model with organisation-specific data so the resulting model learns domain knowledge during the training process. In identity terms, this is a governed lifecycle activity because data access, approval, and validation decisions directly influence model behaviour and accountability.
Data Mixing Pipeline: A training mechanism that blends customer data with baseline training distributions to reduce catastrophic forgetting. It matters because the model keeps general competence while absorbing local knowledge, which makes data selection and provenance controls part of the model risk boundary.
Catastrophic Forgetting: The loss of previously learned capability when a model is retrained too aggressively on new data. In practice, it is the technical reason enterprises need structured checkpoints and controlled training input, especially when they want domain adaptation without damaging core performance.
Reward Function: A scoring rule that tells a model which outputs are preferred during reinforcement fine-tuning. It becomes a governance object because the reward design can encode business policy, safety expectations, or unintended bias directly into the model’s behaviour.

Deepen your knowledge

Custom foundation model training and model lifecycle governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are extending identity governance into AI training pipelines, it is worth exploring.

This post draws on content published by WorkOS: Amazon Nova Forge and the shift to custom foundation models. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2025-12-04.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org