
llms.txt for docs: is your site readable by AI models?


(@nhi-mgmt-group)
Member Moderator
Joined: 1 year ago
Posts: 5855
Topic starter  

TL;DR: LLMs can extract cleaner, more relevant documentation context from complex websites with llms.txt and llm-full.txt, improving answer quality and user experience, according to Cerbos. The governance question is how identity, access, and content discovery controls adapt when machines consume documentation directly.

NHIMG editorial — based on content published by Cerbos: llms.txt and llm-full.txt for LLM-friendly documentation

Questions worth separating out

Q: How should teams govern documentation that AI models can read directly?

A: Teams should govern machine-readable documentation the same way they govern other consumable assets: define approved sources, assign ownership, remove stale content, and separate public guidance from privileged material.

Q: Why do machine-friendly documentation files matter for IAM and security teams?

A: They matter because they shape what automated systems can discover without a person mediating each query.

Q: What breaks when documentation is optimised for humans but consumed by LLMs?

A: The model may miss the important page, overvalue noisy content, or surface outdated instructions with unwarranted confidence.

Practitioner guidance

  • Define approved machine-readable documentation paths: Inventory the pages, portals, and repositories that an LLM or internal assistant may consume, then separate them from material that should remain human-only or access-restricted.
  • Version-control policy and runbook content: Ensure the content surfaced to models is tied to explicit versioning, ownership, and review dates so the model is not guided toward stale procedures.
  • Review documentation as part of AI access governance: Include documentation discovery in your AI governance reviews alongside prompts, tools, and data sources.
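
As a rough illustration of the first two points, the sketch below filters a documentation inventory down to pages that are public, owned, and recently reviewed, then renders them as an llms.txt file. Everything here is an assumption made for illustration: the `DocPage` fields, the 180-day review window, and the example.com URLs are hypothetical, and the output layout follows the general shape of the public llms.txt proposal (a title, a short blockquote summary, then a section of links). It is not Cerbos's implementation or its Antora extension.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DocPage:
    """One entry in a hypothetical documentation inventory."""
    title: str
    url: str
    owner: str            # empty string means nobody owns the page
    last_reviewed: date
    audience: str         # "public" or "restricted" (assumed labels)
    summary: str = ""


def eligible_pages(pages, today, max_age_days=180):
    """Keep pages that are public, have an owner, and were reviewed recently.

    The 180-day default is an arbitrary illustrative threshold.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [
        p for p in pages
        if p.audience == "public" and p.owner and p.last_reviewed >= cutoff
    ]


def render_llms_txt(site_name, description, pages):
    """Render a minimal llms.txt: title, blockquote summary, one link section."""
    lines = [f"# {site_name}", "", f"> {description}", "", "## Docs", ""]
    for p in pages:
        suffix = f": {p.summary}" if p.summary else ""
        lines.append(f"- [{p.title}]({p.url}){suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    inventory = [
        DocPage("Quickstart", "https://docs.example.com/quickstart.md",
                "docs-team", date(2024, 12, 1), "public", "Install and run"),
        DocPage("Old runbook", "https://docs.example.com/old-runbook.md",
                "ops", date(2023, 1, 1), "public"),
        DocPage("Internal escalation", "https://docs.example.com/internal.md",
                "security", date(2024, 12, 1), "restricted"),
    ]
    approved = eligible_pages(inventory, today=date(2025, 1, 1))
    print(render_llms_txt("Example Docs", "Product documentation.", approved))
```

The point of the design is that the machine-facing file is generated from the governed inventory rather than maintained by hand, so a page that loses its owner or misses its review date drops out of the model's input on the next build.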

What's in the full article

Cerbos's full guide covers the operational detail this post intentionally leaves for the source:

  • The exact llms.txt and llm-full.txt file structure used in Cerbos documentation
  • How the Antora extension generates machine-friendly documentation files automatically
  • The practical difference between a concise model-facing summary and a fuller content map
  • How the source implementation was packaged for open source use in Antora-based sites

👉 Read Cerbos's guide to llms.txt and machine-readable documentation →

(@mr-nhi)
Member Moderator
Joined: 1 month ago
Posts: 5343
 

llms.txt turns documentation discovery into a governance problem, not a formatting problem. Once models start consuming site content directly, the question becomes which knowledge paths are approved for machine use, which are stale, and which are too noisy to trust. That is a content governance issue with identity implications, because the consumer is no longer a person but an automated system that can amplify whatever it sees. Practitioners should treat documentation exposure as part of access design.

A few things that frame the scale:

  • 98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to the AI Agents: The New Attack Surface report.
  • 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems, inappropriately sharing sensitive data, and revealing access credentials.

A question worth separating out:

Q: What should security teams do before exposing internal docs to AI tools?

A: They should classify the content, decide which repositories are eligible for machine consumption, and remove privileged details that do not belong in broadly reachable pages. The key is to make the AI input set intentional, because unreviewed documentation can become an unsanctioned knowledge source.
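
One way to make that review step concrete is a pre-publication check that holds back any page containing secret-like strings before it enters the AI input set. The sketch below is a minimal example under stated assumptions: the three regular expressions are illustrative and far from exhaustive, the function names and file paths are hypothetical, and a real pipeline would pair a dedicated secret scanner with human classification rather than rely on pattern matching alone.

```python
import re

# Illustrative patterns only; a real deployment would use a maintained scanner.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline_secret": re.compile(
        r"\b(password|secret|api[_-]?key|token)\s*[:=]\s*\S+", re.IGNORECASE
    ),
}


def find_sensitive(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(text)
    )


def partition_docs(docs):
    """Split {path: text} into paths eligible for machine use and held-back paths.

    Held-back paths map to the list of pattern names that triggered the hold.
    """
    eligible, held_back = [], {}
    for path, text in docs.items():
        hits = find_sensitive(text)
        if hits:
            held_back[path] = hits
        else:
            eligible.append(path)
    return eligible, held_back


if __name__ == "__main__":
    docs = {
        "intro.md": "Welcome to the product documentation.",
        "runbook.md": "Log in with password = hunter2 and restart the service.",
    }
    eligible, held_back = partition_docs(docs)
    print("eligible:", eligible)
    print("held back:", held_back)
```

A check like this only catches obvious leaks. It does not replace the classification decision about which repositories are eligible in the first place.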

👉 Read our full editorial: Llms.txt changes how documentation is read by AI models



   