Unstructured data for GenAI and agents: what IAM teams need to know

NHI Mgmt Group

(@nhi-mgmt-group)

Member Moderator

Joined: 1 year ago

Posts: 12212

Topic starter 24/06/2026 8:54 pm

TL;DR: Most enterprise data remains unstructured, and without enrichment and governance it is hard for GenAI and agentic systems to retrieve, reason with, or trust the content, according to Collibra. The practical issue is not data volume alone but governed context: AI outputs become weaker when metadata, discoverability, and routing are missing.

NHIMG editorial — based on content published by Collibra: Making unstructured data AI ready: Unlocking value for GenAI and agents

Questions worth separating out

Q: How should security teams govern unstructured data used by AI systems?

A: Security teams should classify the source content, define metadata standards, and restrict AI pipelines to governed repositories with known ownership and review rules.
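A minimal Python sketch of the "restrict AI pipelines to governed repositories" control follows, assuming each candidate document carries its source repository and a classification label. The repository names, owner labels, allowed classification values, and the `Document` / `admit_to_pipeline` names are all hypothetical. A real deployment would read the registry from a data catalogue or classification service rather than hard-coding it.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry of governed repositories and their accountable owners.
GOVERNED_REPOSITORIES = {
    "policies-sharepoint": "compliance-team",
    "product-docs-wiki": "product-docs-team",
}

# Hypothetical classification labels that are allowed to feed AI pipelines.
ALLOWED_CLASSIFICATIONS = {"public", "internal"}


@dataclass
class Document:
    doc_id: str
    repository: str
    classification: Optional[str]  # None means the file was never classified


def admit_to_pipeline(doc: Document) -> tuple[bool, str]:
    """Return (admitted, reason) so every rejection is explainable in review."""
    if doc.repository not in GOVERNED_REPOSITORIES:
        return False, f"repository '{doc.repository}' is not governed"
    if doc.classification is None:
        return False, "document has no classification"
    if doc.classification not in ALLOWED_CLASSIFICATIONS:
        return False, f"classification '{doc.classification}' is not allowed for AI use"
    owner = GOVERNED_REPOSITORIES[doc.repository]
    return True, f"governed source owned by {owner}"


# Illustrative candidates: only d1 comes from a governed, classified source.
candidates = [
    Document("d1", "policies-sharepoint", "internal"),
    Document("d2", "team-share-drive", "internal"),
    Document("d3", "product-docs-wiki", None),
    Document("d4", "product-docs-wiki", "restricted"),
]
for doc in candidates:
    admitted, reason = admit_to_pipeline(doc)
    print(doc.doc_id, admitted, reason)
```

Returning a reason alongside the decision is a design choice: it gives owners something concrete to act on when content is blocked, rather than a silent drop.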

Q: Why do unstructured files create risk for GenAI and agentic workflows?

A: Unstructured files create risk because retrieval systems cannot reliably infer business meaning, ownership, or freshness from raw content alone.

Q: How can organisations tell whether AI input governance is actually working?

A: Look for lower duplication, better content discoverability, clearer source attribution, and fewer AI outputs that depend on manually corrected context.
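The signals above can be tracked as simple ratios over time. The Python sketch below is one possible way to compute three proxies from a sample of retrieved chunks and answer reviews: duplication rate, attribution coverage, and manual-correction rate. The `RetrievedChunk` structure, metric names, and sample data are hypothetical. The duplicate check only catches copies that differ in case or whitespace, not paraphrases.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievedChunk:
    text: str
    source: Optional[str]  # None when the chunk cannot be attributed to a source


def _fingerprint(text: str) -> str:
    # Normalise case and whitespace so trivially reformatted copies hash the same.
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def governance_metrics(
    chunks: list[RetrievedChunk], corrected_answers: int, total_answers: int
) -> dict[str, float]:
    if not chunks or total_answers == 0:
        raise ValueError("need at least one chunk and one answer")
    seen: set[str] = set()
    duplicates = 0
    for chunk in chunks:
        fp = _fingerprint(chunk.text)
        if fp in seen:
            duplicates += 1  # every copy after the first counts as a duplicate
        else:
            seen.add(fp)
    attributed = sum(1 for c in chunks if c.source)
    return {
        "duplication_rate": duplicates / len(chunks),
        "attribution_coverage": attributed / len(chunks),
        "manual_correction_rate": corrected_answers / total_answers,
    }


chunks = [
    RetrievedChunk("Password rotation is every 90 days.", "policy.pdf"),
    RetrievedChunk("password rotation is  every 90 days.", "copy.docx"),
    RetrievedChunk("Contact the IAM team.", None),
    RetrievedChunk("Service accounts need an owner.", "standards.md"),
]
metrics = governance_metrics(chunks, corrected_answers=2, total_answers=10)
print(metrics)
```

If governance is working, the first and third ratios should fall and the second should rise across successive samples.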

Practitioner guidance

Map AI use cases to content classes: Identify which repositories feed GenAI, RAG, and agentic workflows, then separate high-value governed sources from noisy or duplicated content.
Define metadata standards for machine consumption: Require business-specific tags, ownership, and lifecycle fields before content enters AI pipelines.
Introduce review logic for stale or duplicated sources: Build periodic validation for content freshness, duplication, and source authority so AI systems do not keep retrieving outdated material.
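One way to combine the metadata standard with the periodic review is a single pass over the corpus that reports findings per document. The Python sketch below works under several assumptions, all of them hypothetical: the field names, a 365-day freshness threshold, and exact-content hashing as the duplicate test. It reports which earlier copy a duplicate matches but leaves the decision about which copy is authoritative to the content owner.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass
class ContentRecord:
    doc_id: str
    text: str
    owner: Optional[str] = None
    business_tags: list[str] = field(default_factory=list)
    last_reviewed: Optional[date] = None  # lifecycle field


def review_corpus(
    records: list[ContentRecord], today: date, max_age_days: int = 365
) -> dict[str, list[str]]:
    """Return findings per doc_id; an empty list means the record passes."""
    findings: dict[str, list[str]] = {}
    first_seen: dict[str, str] = {}  # content hash -> doc_id of the first copy
    cutoff = today - timedelta(days=max_age_days)
    for rec in records:
        issues: list[str] = []
        # Metadata standard: ownership, business tags, lifecycle date.
        if not rec.owner:
            issues.append("missing owner")
        if not rec.business_tags:
            issues.append("missing business tags")
        if rec.last_reviewed is None:
            issues.append("missing lifecycle date")
        elif rec.last_reviewed < cutoff:
            issues.append(f"stale: last reviewed {rec.last_reviewed.isoformat()}")
        # Duplication: identical content under a different doc_id.
        digest = hashlib.sha256(rec.text.encode("utf-8")).hexdigest()
        if digest in first_seen:
            issues.append(f"duplicate of {first_seen[digest]}")
        else:
            first_seen[digest] = rec.doc_id
        findings[rec.doc_id] = issues
    return findings


today = date(2026, 6, 24)
records = [
    ContentRecord("kb-1", "Rotate API keys every 90 days.", "iam-team", ["credentials"], date(2026, 3, 1)),
    ContentRecord("kb-2", "Rotate API keys every 90 days.", "iam-team", ["credentials"], date(2026, 3, 1)),
    ContentRecord("kb-3", "Legacy VPN setup guide.", None, [], date(2023, 1, 15)),
]
for doc_id, issues in review_corpus(records, today).items():
    print(doc_id, issues or "ok")
```

Records with findings could be held out of retrieval until their owner resolves them. That links the review directly to what AI systems are allowed to see.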

What's in the full article

Collibra's full blog post covers the operational detail this post intentionally leaves for the source:

Step-by-step description of Smart Discovery and how repositories are scanned at scale.
Implementation detail on how semantic tagging is generated and maintained across content types.
Use-case examples for AI input governance, RAG optimisation, and enterprise search.
Product-oriented explanation of how governed knowledge assets are operationalised across workflows.

👉 Read Collibra's analysis of unstructured data readiness for GenAI and agents →




Mr NHI

(@mr-nhi)

Member Moderator

Joined: 2 months ago

Posts: 11787

25/06/2026 5:46 am

AI readiness now depends on governed context, not just data volume. Enterprise AI projects fail less often because they lack information than because they cannot reliably select the right information at runtime. When unstructured content is duplicated, stale, or untagged, the model receives weak context and produces weak outcomes. The implication is that AI governance must start with the quality of inputs, not the sophistication of the model.

A few things that frame the scale:

98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the report "AI Agents: The New Attack Surface".
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.

A question worth separating out:

Q: What should teams prioritise first for AI-ready content?

A: Start with the repositories that feed production AI use cases, then apply enrichment, ownership, and lifecycle controls to those sources before expanding outward. That sequence reduces risk faster than trying to govern every document store at once.
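The suggested sequence can be expressed as a simple split of the repository inventory into a first wave and a later wave. The Python sketch below assumes a hypothetical inventory in which each repository records whether it feeds a production AI use case. Ordering the first wave by number of production use cases is an illustrative assumption, not part of the guidance above.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    feeds_production_ai: bool
    production_use_cases: int = 0  # hypothetical count, used only for ordering


def governance_waves(repos: list[Repository]) -> tuple[list[str], list[str]]:
    """Split repositories into a first wave (feeds production AI) and a later wave."""
    first = [r for r in repos if r.feeds_production_ai]
    later = [r for r in repos if not r.feeds_production_ai]
    # Within the first wave, govern the most heavily used sources first.
    first.sort(key=lambda r: r.production_use_cases, reverse=True)
    return [r.name for r in first], sorted(r.name for r in later)


inventory = [
    Repository("hr-sharepoint", False),
    Repository("support-kb", True, 3),
    Repository("policy-library", True, 1),
    Repository("archive-fileshare", False),
]
first_wave, later_wave = governance_waves(inventory)
print("first:", first_wave)
print("later:", later_wave)
```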

👉 Read our full editorial: Unstructured data governance is the bottleneck for GenAI and agents
