Data quality is the real bottleneck in enterprise AI adoption

By NHI Mgmt Group Editorial Team
Published 2025-08-08
Domain: Governance & Risk
Source: Collibra

TL;DR: Poor data quality is already undermining auditability, shared-data trust and AI outputs, with 62% of professionals naming quality their top priority and only 43% assigning stewardship roles, according to Collibra. The governance lesson is that AI performance is now constrained less by model choice than by the reliability and accountability of the underlying data pipeline.

At a glance

What this is: A data governance analysis arguing that poor data quality, not AI capability, is the main bottleneck to trustworthy AI decisions.

Why it matters: For IAM practitioners, governance failures in data quality, stewardship and control enforcement mirror the same accountability gaps that weaken identity, access and lifecycle programmes.

By the numbers:

62% of professionals said improving data quality is their top priority.
Only 43% have assigned stewardship roles to maintain data quality across domains.
29% admit they can’t effectively enforce data quality standards at all.


Context

AI programmes do not fail only because models are weak. They fail when the data they consume is inconsistent, outdated or untrusted, and that makes data quality a governance issue as much as a technical one. For identity, access and lifecycle teams, the parallel is clear: bad source records produce bad decisions downstream, whether the subject is a user, a service account or an AI-driven workflow.

The article argues that organisations are trying to accelerate AI before they have confidence in the records, metadata and stewardship processes underneath it. That is typical of broader governance maturity problems, where visibility exists in pockets but accountability is fragmented across domains.

Key questions

Q: How should organisations govern AI use cases when source data is inconsistent?

A: Start by treating source data quality as a release gate, not a downstream cleanup task. High-impact AI use cases should only proceed when completeness, freshness, structure and ownership meet defined thresholds. If those basics are unstable, model outputs may be persuasive but still unreliable, and the operational risk simply moves from the data team to everyone else.
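
The release-gate idea can be sketched in a few lines of Python. This is a minimal illustration rather than anything described in the Collibra article: the `DatasetMetrics` fields, the threshold values and the `evaluate_release_gate` function are all hypothetical, and real thresholds would be a governance decision for each organisation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetMetrics:
    """Quality measurements for one source dataset (all fields are illustrative)."""
    name: str
    completeness: float         # share of required fields populated, 0.0 to 1.0
    hours_since_refresh: float  # age of the newest record, in hours
    schema_valid: bool          # True if the structure matches the agreed schema
    owner: Optional[str]        # named steward, or None if unassigned


# Hypothetical thresholds; real values are set by the governance body.
MIN_COMPLETENESS = 0.95
MAX_HOURS_SINCE_REFRESH = 24.0


def evaluate_release_gate(m: DatasetMetrics) -> List[str]:
    """Return the reasons the dataset fails the gate (an empty list means pass)."""
    failures = []
    if m.completeness < MIN_COMPLETENESS:
        failures.append(f"completeness {m.completeness:.2f} is below {MIN_COMPLETENESS}")
    if m.hours_since_refresh > MAX_HOURS_SINCE_REFRESH:
        failures.append(f"newest record is {m.hours_since_refresh:.0f}h old")
    if not m.schema_valid:
        failures.append("schema validation failed")
    if not m.owner:
        failures.append("no named owner")
    return failures
```

Returning the list of failed criteria, rather than a bare pass or fail, gives the owning team something concrete to remediate before the AI use case is reconsidered.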

Q: Why do weak data stewardship processes create broader governance risk?

A: Weak stewardship means no one owns exceptions, remediation or standards enforcement across domains. That creates the same risk pattern seen in identity governance when access ownership is unclear: issues linger, exceptions multiply and accountability becomes informal. The result is not just poor data quality, but reduced trust in decisions, audits and shared services.

Q: How do you know if data quality controls are actually working?

A: Look for fewer manual remediation cycles, faster detection of inconsistencies, and higher confidence in shared datasets across teams. Effective controls should reduce debate about whether data can be used and increase the speed at which defects are corrected. If people still hesitate to act on the data, the control environment is not yet working.
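
One way to make those signals measurable is to track them per reporting period. The sketch below assumes a hypothetical defect log in which each entry records when a problem was introduced, detected and fixed, and whether the fix was manual; the `Defect` type and `control_metrics` function are illustrative names, not part of any product.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Dict, List


@dataclass
class Defect:
    """One data quality defect from a hypothetical defect log."""
    introduced: datetime
    detected: datetime
    fixed: datetime
    manual_fix: bool


def _hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def control_metrics(defects: List[Defect]) -> Dict[str, float]:
    """Summarise detection speed, fix speed and manual effort for one period."""
    if not defects:
        return {"mean_hours_to_detect": 0.0, "mean_hours_to_fix": 0.0, "manual_fix_share": 0.0}
    return {
        "mean_hours_to_detect": mean(_hours(d.introduced, d.detected) for d in defects),
        "mean_hours_to_fix": mean(_hours(d.detected, d.fixed) for d in defects),
        "manual_fix_share": sum(d.manual_fix for d in defects) / len(defects),
    }
```

Comparing these figures across periods shows whether detection is getting faster and manual remediation is shrinking, which is the trend the answer above describes.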

Q: What is the difference between data cleansing and data governance?

A: Data cleansing fixes individual records, while governance defines who owns quality, what standards apply and how exceptions are handled. Cleansing is an activity. Governance is the operating model that keeps quality from collapsing again. Organisations need both, but without governance, cleansing becomes a repeating cost instead of a durable control.

Technical breakdown

Why poor data quality breaks AI trust chains

AI systems do not create reliability from nothing. They inherit quality from the data pipeline, including source completeness, schema consistency, timestamp accuracy and metadata integrity. When those elements drift, model outputs can still look confident while being wrong. That is why data quality is not just a cleansing exercise. It is a control surface that determines whether decisions can be trusted, audited and repeated. In practice, the failure starts long before the model call and often remains invisible until an analyst, customer or auditor challenges the result.

Practical implication: treat data quality checks as a prerequisite control for any AI use case that depends on governed source records.
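
As a rough illustration of what such a prerequisite check might look like at record level, the Python sketch below tests completeness, type consistency and timestamp plausibility. The schema, the field names and the functions `check_record` and `clean_ratio` are assumptions made for the example; timestamps without a timezone are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

# Hypothetical schema: required field name -> expected Python type.
EXPECTED_SCHEMA = {"id": str, "email": str, "updated_at": str}


def check_record(record: Dict[str, Any], now: datetime) -> List[str]:
    """Return the quality problems found in a single record."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"missing {field}")
        elif not isinstance(value, expected_type):
            problems.append(f"{field} has type {type(value).__name__}")
    ts = record.get("updated_at")
    if isinstance(ts, str) and ts:
        try:
            parsed = datetime.fromisoformat(ts)
            if parsed.tzinfo is None:
                parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
            if parsed > now:
                problems.append("updated_at is in the future")
        except ValueError:
            problems.append("updated_at is not an ISO 8601 timestamp")
    return problems


def clean_ratio(records: List[Dict[str, Any]], now: datetime) -> float:
    """Return the share of records with no problems (1.0 for an empty batch)."""
    if not records:
        return 1.0
    clean = sum(1 for r in records if not check_record(r, now))
    return clean / len(records)
```

The ratio returned by `clean_ratio` is the kind of figure that could feed a gate or a dashboard, while the per-record problem lists support remediation.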

Stewardship, accountability and governance enforcement

The article’s strongest governance point is that quality degrades when no one owns it across domains. Stewardship roles create a named decision-maker for fixes, exceptions and escalation, while standards enforcement makes those decisions operational instead of advisory. Without that structure, manual workflows spread uncertainty and teams normalise low-confidence data. For IAM and identity governance leaders, this maps directly to recertification and ownership gaps: controls only work when accountability is explicit, current and enforceable.

Practical implication: assign named stewards for critical datasets and make enforcement part of routine governance, not an exception path.
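
A stewardship register can be as simple as a mapping from dataset to named owner, checked routinely for gaps. The sketch below is illustrative only: the registry contents, the `data-governance-board` escalation contact and both function names are hypothetical.

```python
from typing import Dict, List, Optional


def unowned_datasets(registry: Dict[str, Optional[str]]) -> List[str]:
    """Return the critical datasets that have no named steward, sorted by name."""
    return sorted(name for name, steward in registry.items() if not steward)


def route_exception(
    dataset: str,
    registry: Dict[str, Optional[str]],
    escalation_contact: str = "data-governance-board",  # hypothetical fallback owner
) -> str:
    """Return who must handle a quality exception for the given dataset.

    Unknown or unowned datasets fall back to the escalation contact so that
    no exception is left without an accountable party.
    """
    return registry.get(dataset) or escalation_contact
```

For example, with a registry of `{"hr_master": "alice", "crm_contacts": None}`, `unowned_datasets` reports `crm_contacts` as a gap, and an exception raised against it is routed to the escalation contact rather than dropped.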

Visibility into pipelines is the control that changes outcomes

The organisations that outperform on trust do not wait for errors to surface. They build visibility into pipeline health, validate structure, monitor consistency and assess timeliness as part of normal governance. That matters because quality problems are often cumulative rather than singular. A small upstream inconsistency can cascade into failed audits, misleading analytics and poor AI predictions. The same pattern appears in identity programmes when entitlement drift or stale records are not visible early enough to correct.

Practical implication: monitor data health continuously across pipelines and workflows so remediation happens before low-quality records cascade into decisions.
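
Continuous monitoring often comes down to comparing the latest pipeline run against a recent baseline. The following sketch assumes one quality score per run (for example, a completeness ratio) and flags a drop beyond a tolerance; the `detect_drift` function, the seven-run window and the 0.02 tolerance are illustrative assumptions, not recommended values.

```python
from statistics import mean
from typing import List


def detect_drift(
    history: List[float],
    latest: float,
    tolerance: float = 0.02,  # hypothetical allowed drop below the baseline
    window: int = 7,          # hypothetical number of recent runs in the baseline
) -> bool:
    """Return True when `latest` falls more than `tolerance` below the recent average."""
    recent = history[-window:]
    if not recent:
        return False  # no baseline yet, so there is nothing to compare against
    return latest < mean(recent) - tolerance
```

Run after each pipeline execution, a check like this surfaces a gradual decline early, before it accumulates into the cascading failures described above.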

NHI Mgmt Group analysis

Data quality is becoming the trust layer for AI governance. The article shows that model accuracy is now constrained by the quality of upstream records, not just by algorithm choice. That is a governance shift, because decision systems inherit the reliability of the data they consume. Practitioners should treat confidence in source data as a prerequisite for AI adoption, not a post-deployment clean-up task.

Stewardship gaps create the same accountability failure that identity programmes see in orphaned access. Only 43% of organisations assigning stewardship roles suggests that many environments still rely on informal ownership. That is the same pattern that produces stale entitlements and unresolved exceptions in identity governance. The field lesson is simple: if no one owns the record, no one owns the risk.

Data confidence is really control confidence. The article’s point that 29% cannot effectively enforce data quality standards shows that the issue is not awareness but execution. Quality programmes fail when controls are advisory, fragmented or optional. Practitioners should read this as a warning that AI programmes built on weak enforcement will inherit the same inconsistency at scale.

Named concept: trust debt in data pipelines. Trust debt accumulates when organisations tolerate incomplete metadata, inconsistent structures and manual remediation as normal operating conditions. That debt compounds until teams stop trusting shared data, which means AI, audit and operational decisions all slow down together. Practitioners should recognise that trust debt is a governance exposure, not a data-cleaning backlog.

Governance customisation is effective only when it preserves shared control standards. The article notes that tailored frameworks improve adoption, which is true, but customisation cannot replace enforceable core controls. The challenge is to keep local variation from turning into local exception. Practitioners should standardise the control baseline while allowing department-specific implementation detail.

From our research:
The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
43% of security professionals are concerned about AI systems learning and reproducing sensitive information patterns from codebases, according to the same report.
For broader identity governance context, see Ultimate Guide to NHIs, Key Research and Survey Results for how governance gaps scale across machine identities.

What this signals

Trust debt in data pipelines: when organisations tolerate inconsistent records, they also accumulate downstream decision debt. That debt shows up as slower AI adoption, more audit friction and lower confidence in shared services, which is why data governance now behaves like an enabling control rather than a housekeeping function.

The governance pattern is familiar to identity teams. Where ownership is unclear, exceptions persist and standards lose force. The same discipline that improves entitlement reviews and lifecycle accountability also improves data confidence, especially when operational teams need to prove that records are current, complete and enforceable.

For teams aligning data governance with broader control frameworks, the right question is not whether AI can use the data, but whether the organisation can trust the record enough to let AI act on it. That is a practical governance threshold, not an abstract maturity goal.

For practitioners

Define data stewards for critical domains: Assign named stewardship roles to the datasets that directly feed AI, analytics and regulatory reporting. Give each steward authority to approve fixes, resolve exceptions and escalate unresolved quality defects.
Embed quality checks in pipeline operations: Automate structure validation, consistency checks and timeliness monitoring inside the data pipeline rather than relying on periodic review. The goal is to detect drift before it reaches business users or model training.
Create enforceable quality standards: Document minimum quality thresholds for completeness, metadata and freshness, then make those thresholds part of operational review and exception handling. Standards that cannot be enforced become guidance, not governance.
Tie AI use cases to data confidence gates: Block or delay high-impact AI deployments until the source data they depend on meets defined confidence criteria. That keeps model rollout aligned with the quality of the records underneath it.

Key takeaways

AI bottlenecks increasingly arise from untrusted data, not from a lack of model capability.
Stewardship and enforceable standards are the difference between isolated fixes and durable data confidence.
Identity, audit and AI programmes all suffer when governance cannot prove that source records are current and reliable.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	GV.OV-01	Data quality governance depends on defined oversight and accountability.
NIST Zero Trust (SP 800-207)	PR.DS	Trusted data is a prerequisite for secure, policy-driven decisions in zero trust environments.
NIST CSF 2.0	PR.DS-01	The article centers on integrity and reliability of information used in decisions.

Validate data integrity controls and treat inconsistent records as a governance defect, not a cosmetic issue.

Key terms

Data Stewardship: Data stewardship is the assigned accountability for maintaining the quality, meaning and usability of a dataset. In practice, it defines who resolves exceptions, approves fixes and enforces standards so quality does not depend on informal ownership or ad hoc intervention.
Data Confidence: Data confidence is the degree to which an organisation can trust its data to support decisions, automation and reporting. It depends on visibility, quality controls, stewardship and enforcement, not just on whether data exists or can be accessed.
Pipeline Health: Pipeline health describes the operational condition of data flows as records move from source systems to consumers. It includes structure, consistency, timeliness and error handling, all of which determine whether downstream analytics and AI outputs remain reliable.
Trust Debt: Trust debt is the accumulated loss of confidence created when poor-quality data, weak ownership and manual remediation become normal operating conditions. It is a governance problem because it increases hesitation, audit friction and decision latency across the organisation.


This post draws on content published by Collibra: "Your data is lying to you, why quality is the real AI bottleneck".
