Open models are narrowing the AI gap, but data remains the moat

By NHI Mgmt Group Editorial Team | Published 2026-04-15 | Domain: Agentic AI & NHIs | Source: WorkOS

TL;DR: At HumanX 2026, Fireworks AI’s Rob Ferguson argued that open models are closing the performance gap with frontier systems while enterprise data remains the real competitive moat, according to WorkOS. He also argued that model ownership becomes more valuable as scale and cost pressures rise. The strategic question is no longer whether open models can compete, but whether organisations can govern the data and identity access behind them.

At a glance

What this is: This interview argues that open models are catching up fast, but differentiated enterprise data, not model architecture, is where durable advantage now sits.

Why it matters: IAM teams need to treat model training data, code repositories, and tool access as governed identity surfaces because AI advantage now depends on who and what can reach them.


Context

Open model adoption is increasingly an identity and governance problem, not just a model-selection problem. As enterprises move from using frontier models to owning and fine-tuning their own, the real control point becomes who can access training data, code repositories, and model weights.

The interview’s central claim is that model quality is converging while data access remains the differentiator. That shifts attention from architecture debates to the identity, access, and lifecycle controls around the data that shapes model behaviour.

Key questions

Q: How should security teams govern access to AI training data?

A: Security teams should treat AI training data as a privileged asset and apply least privilege, ownership, and review cycles to every identity that can read, export, or transform it. The focus should be on the pipelines that create model behaviour, not just the model runtime. If data access is broad, the AI programme inherits unnecessary exposure.

Q: Why does enterprise data matter more than model architecture for AI strategy?

A: Enterprise data matters because general model capabilities are converging, while proprietary data remains the durable source of differentiation. A company can buy or run similar models to its peers, but it cannot easily replicate its internal context, workflows, and code. That makes data access governance the real strategic control point.

Q: What breaks when AI model ownership is separated from access governance?

A: Model ownership becomes superficial when the organisation cannot control who can feed, change, or exfiltrate the data behind the model. In that case, the company may own the weights but still lose the advantage embedded in training inputs, evaluation sets, and operational context. The result is capability without durable control.

Q: How can organisations know if their AI data moat is actually protected?

A: They should test whether the most valuable datasets are reachable only by the identities that genuinely need them, and whether those entitlements are reviewed when roles change. If many users, services, or vendors can reach the same data without clear purpose, the moat is already weakened. Governance must be measurable, not assumed.
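One way to make that test measurable is sketched below. It assumes entitlements can be exported as (identity, dataset, purpose) records; the `Entitlement` type, the dataset names, and the identities are hypothetical and do not come from the interview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entitlement:
    identity: str            # user, service account, or vendor principal
    dataset: str             # dataset the identity can reach
    purpose: Optional[str]   # documented business purpose, if any


def unjustified_access(entitlements, high_value_datasets):
    """Group identities that reach a high-value dataset without a documented purpose."""
    findings = {}
    for e in entitlements:
        if e.dataset in high_value_datasets and not e.purpose:
            findings.setdefault(e.dataset, []).append(e.identity)
    return {dataset: sorted(ids) for dataset, ids in findings.items()}


# Hypothetical entitlement export; all names are illustrative.
entitlements = [
    Entitlement("svc-finetune", "internal-code", "fine-tuning job"),
    Entitlement("vendor-analytics", "internal-code", None),
    Entitlement("alice", "internal-code", ""),
    Entitlement("bob", "public-docs", None),
]

report = unjustified_access(entitlements, {"internal-code"})
for dataset, identities in report.items():
    print(f"{dataset}: {len(identities)} identities lack a documented purpose: {identities}")
```

Tracking the size of this report over time gives a simple number to review, rather than an assumption that access is tight.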

Technical breakdown

Why open model performance is converging

Open models narrow the gap when common training data, shared research patterns, and fast iteration compress the distance between releases. Benchmark parity, however, does not guarantee strategic parity. The model itself may be cheaper and good enough, but its real value depends on whether it can be specialised to a company’s own data and workflows. For IAM and security teams, that means the governance problem moves upstream from inference to the sources that feed model development.

Practical implication: Treat model selection as an access-governance decision, not only a procurement decision.

Data as the AI moat

The article frames data as the durable advantage because enterprise information sits behind firewalls, in applications, and in code systems that generic models do not possess. This makes data access the security boundary for AI differentiation. If training or fine-tuning pipelines can reach broad internal content without tight identity controls, the organisation may create capability, but it also broadens exposure. The critical issue is not whether a model is open or closed, but whether access to the data shaping it is controlled, auditable, and limited to the right actors.

Practical implication: Map every training and fine-tuning input to a governed identity source before model ownership expands.
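As an illustration of that mapping, the sketch below blocks model work until every input has an accountable owner and a governing identity source. The `TrainingInput` type, field names, and example values are assumptions made for this example, not part of the WorkOS interview.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingInput:
    name: str
    owner: Optional[str]            # accountable owner
    identity_source: Optional[str]  # system of record governing who can reach it


def ungoverned_inputs(inputs):
    """Return names of inputs missing an owner or a governing identity source."""
    return [i.name for i in inputs if not i.owner or not i.identity_source]


def assert_ready_for_training(inputs):
    """Raise before model work starts if any input is ungoverned."""
    missing = ungoverned_inputs(inputs)
    if missing:
        raise ValueError(f"Ungoverned training inputs: {missing}")


# Hypothetical inventory; names are illustrative only.
inputs = [
    TrainingInput("support-tickets", owner="cx-data-team", identity_source="corporate-idp"),
    TrainingInput("monorepo-snapshot", owner=None, identity_source="corporate-idp"),
]

gaps = ungoverned_inputs(inputs)
print(gaps)
```

A check like `assert_ready_for_training` could run as a gate at the start of a fine-tuning job, so an unmapped input stops the job instead of being discovered afterwards.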

Owning weights does not mean owning risk

Downloading model weights gives a company more control over deployment and portability, but it does not remove the governance burden around the systems that feed, retrain, and operate the model. The article points to a shift from rented capability to owned capability, yet owned capability still depends on privileged access, data lineage, and operational oversight. For security teams, the architecture question is only half the story; the other half is who can change the model’s inputs and outputs over time.

Practical implication: Apply lifecycle, access review, and privileged control discipline to model pipelines, not just to the model runtime.
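A minimal sketch of access review discipline for pipeline grants follows. The 90-day interval is an assumed policy value chosen for illustration, and the grant records and identity names are hypothetical.

```python
from datetime import date, timedelta

# Assumed review policy; the article does not specify an interval.
REVIEW_INTERVAL = timedelta(days=90)


def overdue_reviews(grants, today):
    """Return identities whose grant was last reviewed longer ago than the interval."""
    return sorted(
        g["identity"]
        for g in grants
        if today - g["last_reviewed"] > REVIEW_INTERVAL
    )


# Hypothetical grants on model pipeline resources.
grants = [
    {"identity": "svc-retrain", "resource": "training-pipeline", "last_reviewed": date(2026, 1, 5)},
    {"identity": "ml-admin", "resource": "model-weights", "last_reviewed": date(2026, 4, 1)},
]

overdue = overdue_reviews(grants, today=date(2026, 4, 15))
print(overdue)
```

The same check applies equally to human admins and service identities, which is the point of extending review cycles from the runtime to the pipeline.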

Cisco DevHub NHI breach: IntelBroker exploited exposed Cisco credentials, API tokens, and keys in DevHub.
DeepSeek breach: 1M+ log lines and sensitive secret keys were exposed.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Open-model adoption turns data governance into identity governance. Once an enterprise uses its own data to specialise a model, the access problem is no longer abstract. The value now sits in systems that hold proprietary data, code, and weights, so identity controls determine whether that value stays bounded or becomes broadly reachable. Security leaders should read this as a governance shift, not a tooling trend.

Model architecture is becoming less differentiating than the permissions around it. The article’s core claim is that open models are closing the quality gap quickly enough that the remaining advantage comes from enterprise data richness. That makes access scope, provenance, and lifecycle more important than model brand. Practitioners should treat AI differentiation as a controlled-access problem.

Enterprise AI creates a new privilege boundary around training data. The named concept here is AI data moat governance: the idea that proprietary data only remains a moat if access to it is tightly governed across collection, training, and operational use. If internal datasets are exposed to too many people, services, or pipelines, the moat erodes before the model ever ships. The implication is that data access governance now defines competitive differentiation.

Owning model weights does not collapse the need for lifecycle discipline. Downloadable weights may change deployment economics, but they do not eliminate the need to govern who can retrain, fine-tune, or move the model into production. That responsibility spans human admins, service identities, and automated pipelines. The practical conclusion is that AI programmes should inherit identity lifecycle controls from the rest of the enterprise, not sit outside them.

The industry is shifting from model scarcity to data scarcity. As frontier and open models converge on general capability, the scarce asset becomes high-quality enterprise data behind the right controls. That changes how boards should think about AI investment, because the security model must protect the pipeline that creates advantage. Practitioners should align AI strategy with identity governance maturity, not with model enthusiasm.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
Our research also found that organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control.
For a broader governance lens, see Ultimate Guide to NHIs, Key Research and Survey Results for the research base behind NHI sprawl and control gaps.

What this signals

AI data moat governance: the next phase of AI security is about proving that the data which creates model advantage is still reachable only by the identities that need it. If engineers, service accounts, and vendors all touch the same training inputs, the moat becomes an exposure surface instead of a differentiator.

The governance pattern is already familiar to identity teams. When access reviews are slow and identity ownership is unclear, sensitive material stays reachable long after its business purpose changes, which is why lifecycle discipline needs to extend into AI pipelines. The broader lesson aligns with the NIST Cybersecurity Framework 2.0: identify and protect the data boundary before you assume AI value is controlled.

Teams that are building open-model programmes should expect pressure to prove data provenance, entitlement scope, and removal of stale access across development identities. The operational question is no longer whether the model can be owned, but whether the identities around it can be governed with the same rigour as production systems.

For practitioners

Inventory every AI training data path: Map each source that feeds model training or fine-tuning, including code repositories, document stores, and application data. Assign an accountable owner, classify the data, and verify which identities can read, export, or transform it before model work begins.
Restrict privileged access to model weights and pipelines: Limit who can download weights, modify training jobs, or change evaluation datasets. Use separate identities for experimentation, production deployment, and retraining so one credential cannot reshape the model lifecycle end to end.
Apply lifecycle governance to AI development roles: Review joiner, mover, and leaver controls for engineers, data scientists, and platform identities that touch AI systems. Remove stale access to datasets and model artifacts when people change roles or leave projects.
Treat enterprise data as the differentiator to defend: Prioritise access reviews for the data sources that make your model unique, especially internal code, customer context, and proprietary knowledge bases. If those identities are over-broad, the moat is already leaking.
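The separate-identities recommendation above can be checked mechanically. The sketch below flags any identity that holds grants in more than one lifecycle stage; the stage names follow the text, while the identities and the `stage_grants` structure are hypothetical.

```python
from collections import defaultdict


def stages_per_identity(stage_grants):
    """Invert {stage: [identities]} into {identity: {stages}}."""
    seen = defaultdict(set)
    for stage, identities in stage_grants.items():
        for identity in identities:
            seen[identity].add(stage)
    return seen


def separation_violations(stage_grants):
    """Return identities that can act in more than one lifecycle stage."""
    return {
        identity: sorted(stages)
        for identity, stages in stages_per_identity(stage_grants).items()
        if len(stages) > 1
    }


# Hypothetical grants per lifecycle stage.
stage_grants = {
    "experimentation": ["alice", "svc-notebook"],
    "production_deployment": ["svc-deploy", "svc-ci"],
    "retraining": ["svc-ci", "svc-retrain"],
}

violations = separation_violations(stage_grants)
print(violations)
```

Whether an overlap across two stages is acceptable is a policy decision; the stricter reading of the guidance is that no single credential should span all three.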

Key takeaways

Open models are closing the capability gap, which shifts the real security question to who can reach the data that shapes them.
The strongest AI advantage now depends on governed access to enterprise data, not on architecture alone.
Identity lifecycle controls must extend into AI pipelines if organisations want model ownership without uncontrolled exposure.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-05	AI data and pipeline access depend on non-human identity governance.
NIST CSF 2.0	PR.AA-05	Access management controls apply to the identities that touch model data and weights.
NIST Zero Trust (SP 800-207)	AC-4	Zero trust limits who can reach data and model assets across distributed AI workflows.

Restrict AI pipeline identities to the smallest set of data sources needed for training and deployment.
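A least-privilege check of this kind can be sketched as a set difference between what each pipeline identity is granted and what it is documented to need. The `granted` and `required` mappings and every name in them are assumptions for illustration.

```python
def excess_access(granted, required):
    """Return {identity: sources granted but not required}, omitting clean identities."""
    findings = {}
    for identity, sources in granted.items():
        extra = set(sources) - set(required.get(identity, ()))
        if extra:
            findings[identity] = sorted(extra)
    return findings


# Hypothetical grants and documented needs for pipeline identities.
granted = {
    "svc-finetune": ["internal-code", "hr-records", "support-tickets"],
    "svc-deploy": ["model-registry"],
}
required = {
    "svc-finetune": ["internal-code", "support-tickets"],
    "svc-deploy": ["model-registry"],
}

findings = excess_access(granted, required)
print(findings)
```

An identity with no entry in `required` is treated as needing nothing, so all of its grants are reported as excess.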

Key terms

AI Data Moat Governance: The discipline of controlling who can reach the data that makes an AI system distinctive. It combines data classification, identity governance, and access review so proprietary training inputs, evaluation sets, and operational context do not become broadly exposed inside the organisation.
Model Weight Ownership: The ability to download, store, and deploy a model’s parameters under direct organisational control. Ownership helps with portability and deployment choice, but it does not remove the need to govern the identities that can retrain, fine-tune, or alter the model lifecycle.
Training Data Path: The end-to-end route that data follows from source systems into model preparation, fine-tuning, and evaluation. For AI programmes, this path is an identity boundary because every system and person that can touch it can influence model behaviour or leak sensitive material.

Deepen your knowledge

AI data moat governance is covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are extending identity controls into model training and fine-tuning pipelines, this is a useful place to start.

This post draws on content published by WorkOS: Rob Ferguson on Fireworks AI at HumanX 2026. Read the original.
