TL;DR: Web teams are shifting from blocking all bots to selectively welcoming LLMs and AI agents, while still defending against scraping, credential abuse, and fake signups, according to WorkOS. That reversal turns content discovery, authentication, and abuse detection into an identity governance problem, not just a web UX problem.
At a glance
What this is: The article argues that the web is moving from bot exclusion to selective machine openness, with LLMs and AI agents becoming new discovery and interaction layers.
Why it matters: IAM teams now have to govern which non-human actors can read, reach, and act on digital services without weakening abuse controls for humans or machines.
Context
The core issue is not whether machines should be allowed in. It is how organisations distinguish useful non-human access from abuse when the same surface can serve humans, crawlers, LLMs, and agents at once. That creates an identity and access problem around machine-readable content, machine-facing endpoints, and the trust signals used to separate legitimate automation from hostile automation.
For IAM and NHI programmes, this is a governance shift as much as a technical one. Content discoverability, authentication flows, and abuse detection are now linked, because a site that is legible to AI systems is also more exposed to automated misuse if policy, telemetry, and entitlement boundaries are weak.
Key questions
Q: How should security teams govern LLM access to public content and APIs?
A: Treat LLM access as a separate machine identity problem, not a simple extension of web publishing. Allow discovery on public content where appropriate, but keep transactional APIs, sensitive docs, and auth flows behind explicit controls, telemetry, and scope reviews. The key is to separate readable content from actionable access so machine consumption does not become unintended privilege.
Q: Why do AI-friendly websites still need bot and fraud controls?
A: Because helpful automation and hostile automation use similar patterns at the edge. A site that is easy for LLMs to read can also be easy for scrapers, fake signups, and credential attackers to probe. Security teams still need rate limits, behavioural detection, and login abuse monitoring so invitation does not become exposure.
Q: What do security teams get wrong about robots.txt and allowlists?
A: They often treat allowlisting as proof of trust. In reality, robots.txt and similar signals are only part of a broader access posture, and they do not authenticate intent. Teams should assume that any public machine-readable route can be abused unless it is backed by policy, logging, and abuse response.
Q: How can organisations balance AI discovery with least privilege?
A: By limiting machine-readable exposure to the minimum content needed for discovery, while keeping operational systems, secrets, and sensitive workflows out of reach. Least privilege for machines means the model can understand what is public, but cannot infer or execute what is not intended for automated use.
Technical breakdown
Selective openness for LLMs and crawlers
The article describes a move from blanket bot blocking to allowlisting trusted AI consumers through robots.txt, dedicated paths, and metadata designed for machine parsing. This is not the same as granting application access. It is a discoverability pattern that lets models read and interpret public content while leaving transactional systems behind stronger controls. The technical challenge is that the same infrastructure can be used by benign retrieval agents and by scrapers that mimic them, which makes classification and intent detection part of the access decision.
Practical implication: separate machine-readable discovery surfaces from sensitive runtime APIs, and treat them with distinct policy and telemetry.
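To make the limits of robots.txt allowlisting concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the paths are hypothetical, not taken from the WorkOS article. It shows that the answer depends only on the user-agent name a client claims, which is why allowlisting is a discovery signal rather than an access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: invite one named AI crawler onto public docs and
# ask every other client to stay out. Paths and policy are illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /docs/
Disallow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, path: str) -> bool:
    """Return what robots.txt asks of a client that claims this user agent.

    Python's parser applies the first matching rule in file order, so the
    Allow line is listed before the broader Disallow line.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

Calling `is_allowed("GPTBot", "/docs/quickstart")` returns `True` for any client that sends that name, including a scraper that copies it. Nothing in the file verifies who is asking, so enforcement has to happen elsewhere.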
Machine legibility in documentation and APIs
The piece highlights structured markup, OpenAPI, GraphQL introspection, and consistent documentation patterns as ways to make products easier for models to understand. In practice, this means organisations are optimising for semantic clarity, not only human readability. When product information is fragmented or ambiguous, AI systems infer incorrectly, which can affect trust, referrals, and how users are routed. But the same machine-legible structure can also expose more of the service model than intended if scope and disclosure are not reviewed carefully.
Practical implication: review documentation and API metadata as an exposure surface, not just a developer convenience layer.
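One way to treat API metadata as an exposure surface is to audit the published specification itself. The sketch below assumes an OpenAPI 3 document already loaded into a Python dict. The `SENSITIVE_HINTS` keywords, the sample `SPEC`, and the function name `audit_openapi` are all hypothetical, and a real review would need service-specific rules.

```python
from typing import Any

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}

# Hypothetical keywords a reviewer might treat as sensitive; tune per service.
SENSITIVE_HINTS = ("admin", "internal", "token", "billing")


def audit_openapi(spec: dict[str, Any]) -> list[str]:
    """List documented operations that declare no auth or look sensitive."""
    findings = []
    global_security = spec.get("security", [])
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Operation-level "security" overrides the global value;
            # an empty list means the operation requires no credentials.
            security = operation.get("security", global_security)
            label = f"{method.upper()} {path}"
            if not security:
                findings.append(f"{label}: no security requirement declared")
            if any(hint in path.lower() for hint in SENSITIVE_HINTS):
                findings.append(f"{label}: path looks sensitive, confirm it should be documented")
    return findings


# Illustrative spec fragment, not a real product's API.
SPEC = {
    "openapi": "3.0.3",
    "security": [{"bearerAuth": []}],
    "paths": {
        "/products": {"get": {"security": []}},
        "/orders": {"post": {}},
        "/internal/admin/users": {"get": {}},
    },
}
```

Running `audit_openapi(SPEC)` flags the unauthenticated `GET /products` for confirmation that it is meant to be public, and flags `GET /internal/admin/users` as a path that may not belong in published documentation at all.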
Abuse detection still has to sit beside invitation
The article makes clear that welcoming AI systems does not remove the need to block credential stuffing, fake signups, and scraping. Instead, the control plane becomes dual purpose: one part invites selected automation, while another part continuously scores and blocks hostile traffic. That means edge signals, login telemetry, and behavioural fingerprints become more important, not less. A programme that optimises for machine friendliness without equal investment in abuse detection will widen its attack surface.
Practical implication: keep fraud, bot, and login-abuse detection in the same operational conversation as LLM enablement.
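As an illustration of the login telemetry described above, the following sketch keeps a sliding window of failed logins per source and flags sources that fail too often or across too many usernames, a common credential-stuffing signal. The class name, thresholds, and window size are assumptions chosen for the example, not values from the article.

```python
from collections import defaultdict, deque


class LoginAbuseMonitor:
    """Flag sources with many failed logins, or many usernames, in a window."""

    def __init__(self, window_seconds: float = 60.0,
                 max_failures: int = 5, max_usernames: int = 3):
        self.window = window_seconds
        self.max_failures = max_failures
        self.max_usernames = max_usernames
        # source identifier -> deque of (timestamp, username)
        self._events = defaultdict(deque)

    def record_failure(self, source: str, username: str, now: float) -> bool:
        """Record a failed login; return True if the source should be challenged."""
        events = self._events[source]
        events.append((now, username))
        # Drop events that have aged out of the sliding window.
        while events and now - events[0][0] > self.window:
            events.popleft()
        distinct_users = {user for _, user in events}
        return (len(events) > self.max_failures
                or len(distinct_users) > self.max_usernames)
```

Passing `now` explicitly keeps the sketch deterministic and testable. In production the timestamp would come from the request pipeline, and the source key would likely combine IP address with other edge signals rather than rely on IP alone.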
Threat narrative
Attacker objective: The attacker wants to exploit selective openness to gain broad automated access without triggering the controls meant to separate helpful agents from hostile bots.
- Entry: the attacker reaches public content, login, or API surfaces that were made machine-readable for LLMs and crawlers.
- Escalation: the attacker blends in with trusted automation, reusing allowlisted patterns, structured endpoints, or weakly differentiated bot signals.
- Impact: the attacker increases scraping, credential abuse, or fake signup activity while the organisation believes it is only exposing content to helpful agents.
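The escalation step above relies on a hostile client borrowing a trusted crawler's name. A minimal sketch of one countermeasure is to check the claimed name against address ranges the crawler operator publishes. The ranges below are documentation networks used as placeholders, and the table, function name, and user-agent strings are hypothetical; a real deployment would fetch and refresh each operator's published list.

```python
import ipaddress

# Hypothetical published ranges for a crawler. These are reserved
# documentation networks, used here only as placeholders.
PUBLISHED_RANGES = {
    "gptbot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify_client(user_agent: str, remote_ip: str) -> str:
    """Return 'verified', 'spoofed', or 'unknown' for a request."""
    claimed = user_agent.lower()
    address = ipaddress.ip_address(remote_ip)
    for bot_name, networks in PUBLISHED_RANGES.items():
        if bot_name in claimed:
            if any(address in network for network in networks):
                return "verified"
            # Claims a trusted name but arrives from an unlisted address.
            return "spoofed"
    return "unknown"
```

A "spoofed" result is a stronger abuse signal than "unknown", because the client has actively claimed an identity it cannot back up. An "unknown" client still needs the ordinary rate limits and behavioural checks.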
Breaches seen in the wild
- Moltbook AI agent keys breach: the Moltbook breach exposed 1.5M AI agent keys.
- AI LLM hijack breach: attackers used stolen AWS access keys to hijack Anthropic models on Bedrock.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
LLM readiness is now an identity governance problem, not a web-only concern. Once content, docs, and APIs are tuned for machine consumption, the question becomes which non-human actors are authorised to interpret, traverse, or act on them. That is a governance boundary, not a UX tweak. Practitioners should treat machine legibility as part of the access model.
Selective openness creates a new trust boundary between helpful agents and hostile automation. The article describes an environment where the same organisation may allow GPTBot while blocking scrapers, which means classification and verification become central control points. This is where identity policy meets bot policy, and the failure mode is not total openness or total denial but weak differentiation between authorised and unauthorised machine actors. Practitioners should design for differentiated machine trust.
Machine-readable metadata can become exposure if entitlement review does not extend to content surfaces. Structured markup, self-describing APIs, and AI-oriented summaries help models understand services, but they also widen what is inferable about products, endpoints, and workflows. This is machine-readable exposure drift: the more a service is tuned for AI parsing, the more governance has to decide what information should remain discoverable. Practitioners should review these surfaces as part of information access governance.
Abuse controls and AI enablement now have to be managed together. The article correctly avoids the false choice between blocking all bots and inviting all of them. That is the right framing for the market: identity teams need a policy model for machine access plus real-time abuse detection, because allowing one without the other simply moves risk around. Practitioners should align bot policy with NHI and fraud controls.
What looks like content distribution is becoming a control problem for the full identity stack. When a user says they found a product through ChatGPT, the influence path has already shifted from human search to machine mediation. That means human IAM, NHI governance, and emerging agentic access patterns are converging around the same discovery and trust layer. Practitioners should plan for that convergence now.
From our research:
- Only 5.7% of organisations have full visibility into their service accounts, according to the Ultimate Guide to NHIs.
- 96% of organisations store secrets outside of secrets managers in vulnerable locations including code, config files, and CI/CD tools.
- The governance answer is not more machine exposure by default. It is tighter visibility into non-human identities and better control over what non-human actors can actually reach.
What this signals
Machine-readable content will force NHI programmes to expand beyond secrets and service accounts. The real change is that discoverability itself is becoming a governed surface, with public docs, metadata, and AI-specific paths needing the same kind of boundary thinking that teams already apply to workloads and service identities.
The market is moving toward dual-control architectures for machine access. One control plane invites selected automation, while another detects abuse, and organisations that do not build both will either block too much or expose too much. That is why selective openness needs NHI governance, fraud telemetry, and policy review in the same operating model.
Only 5.7% of organisations have full visibility into their service accounts, and that visibility gap will now matter at the content layer too. As more services become legible to LLMs and agents, the absence of identity visibility stops being a back-office problem and becomes a front-door risk. Teams should expect machine discovery, API exposure, and access policy to converge in the next planning cycle.
For practitioners
- Define separate policy for machine-readable surfaces: Classify public docs, AI-specific paths, and transactional APIs separately so allowlisting for crawlers does not imply access to operational systems.
- Review structured metadata as an exposure surface: Audit schema.org, OpenAPI, GraphQL introspection, and LLM summary fields for data that helps discovery but reveals more than intended.
- Pair allowlists with abuse telemetry: Use login, signup, and request-behaviour signals to distinguish trusted machines from hostile automation, especially at the edge and auth layer.
- Align AI discovery with NHI governance: Bring machine discovery, service identity, and fraud monitoring into one review cycle so the team can see how non-human access is changing the control surface.
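The first recommendation above, separate policy per surface, can be sketched as a small default-deny classifier. The path prefixes, the `Surface` categories, and the function names are hypothetical and would be replaced by a site's own inventory. The point of the design is that being allowlisted as a crawler grants read access to discovery surfaces only.

```python
from enum import Enum
from typing import Optional


class Surface(Enum):
    PUBLIC_DOCS = "public_docs"
    AI_PATH = "ai_path"
    TRANSACTIONAL_API = "transactional_api"
    AUTH = "auth"


# Hypothetical path prefixes; a real site would define its own inventory.
SURFACE_PREFIXES = [
    ("/docs/", Surface.PUBLIC_DOCS),
    ("/ai/", Surface.AI_PATH),
    ("/api/", Surface.TRANSACTIONAL_API),
    ("/login", Surface.AUTH),
    ("/signup", Surface.AUTH),
]

# Surfaces a verified, allowlisted crawler may read without credentials.
CRAWLER_READABLE = {Surface.PUBLIC_DOCS, Surface.AI_PATH}


def classify_path(path: str) -> Optional[Surface]:
    """Map a request path to a surface class, or None if it is unmapped."""
    for prefix, surface in SURFACE_PREFIXES:
        if path.startswith(prefix):
            return surface
    return None


def crawler_may_read(path: str, crawler_verified: bool) -> bool:
    """Default deny: only verified crawlers, and only on discovery surfaces."""
    return crawler_verified and classify_path(path) in CRAWLER_READABLE
```

Unmapped paths fall through to a deny, so a newly shipped endpoint is not exposed to crawlers until someone classifies it. That keeps the surface inventory part of the review cycle rather than an afterthought.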
Key takeaways
- The article shows that the web is no longer built only to exclude bots, because LLMs and AI agents are becoming part of normal discovery and interaction flows.
- The security risk is selective openness without equal abuse detection, which can make machine-friendly surfaces easier for hostile automation to exploit.
- Practitioners should govern machine-readable content, APIs, and bot policy as one identity problem, not as separate web, fraud, and IAM issues.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Non-Human Identity Top 10 | NHI-01 | Machine-readable access and service identity visibility are central to this article. |
| NIST CSF 2.0 | PR.AA-05 (formerly PR.AC-4 in CSF 1.1) | Selective openness depends on enforcing least privilege for machine access. |
| NIST Zero Trust (SP 800-207) | Section 2.1 (Tenets of Zero Trust) | The article's trust boundary logic maps to Zero Trust access control and policy enforcement. |
Limit machine-readable exposure to the minimum necessary and review entitlement scope regularly.
Key terms
- Machine-readable exposure: Machine-readable exposure is the amount of a service, document, or API that can be parsed by automated systems without human mediation. In identity terms, it is a discoverability surface that must be governed separately from interactive user access because it can reveal intent, structure, and operational detail.
- Selective openness: Selective openness is the practice of allowing trusted automation to access specific public or semi-public surfaces while continuing to block hostile bots. It requires policy, telemetry, and classification, because the control objective is not blanket access or blanket denial, but differentiated machine trust.
- Abuse telemetry: Abuse telemetry is the behavioural and request data used to distinguish legitimate automation from hostile activity. For identity teams, it includes login patterns, request velocity, credential reuse, and edge signals that support real-time blocking and post-event investigation across non-human access paths.
Deepen your knowledge
LLM-ready web design and machine access governance are covered in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are redesigning content or APIs for AI discovery, it is a useful fit for your programme.
This post draws on content published by WorkOS: From blocking bots to optimizing for LLMs: How the web flipped its script. Read the original.
Published by the NHIMG editorial team on 2025-07-10.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org