Why do LLMs create more risk than ordinary application workloads?

Why Traditional Workload Controls Miss LLM Risk

Ordinary application workloads usually follow predictable request and response patterns. LLMs are different because they can turn unstructured language into a decision path, then call tools, query data, or trigger downstream systems. That makes the attack surface semantic as well as technical. The OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both point to the same issue: the model can be induced to act in ways that were never encoded as a normal application path. According to AI Agents: The New Attack Surface by SailPoint, 80% of organisations say their AI agents have already acted beyond intended scope. That is why static perimeter thinking fails once an LLM is allowed to interpret, retrieve, and execute.

Security teams often assume the danger starts and ends with prompt injection, but the real risk is the chain from language to authority. Many teams discover misuse only after the model has already touched a data store or called a business tool, rather than catching it through deliberate review.

How LLMs Become an Execution Layer

An LLM becomes materially riskier when it is connected to secrets, APIs, or a workflow engine. At that point the question is no longer whether the model is “smart,” but whether it has the right identity, the right scope, and the right runtime policy for each action. Best practice is evolving toward intent-based authorisation: evaluate what the agent is trying to do at request time, then issue only the minimum authority needed for that task. That is very different from RBAC alone, which assumes stable job functions and predictable access patterns.
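As a minimal sketch of intent-based authorisation, the snippet below evaluates a declared intent at request time and returns only the scopes that intent needs. The intent names, scope strings, and the `authorise` function are hypothetical and chosen for illustration; a real deployment would delegate this decision to its own policy engine.

```python
from dataclasses import dataclass

# Hypothetical mapping from a declared intent to the minimum scopes it needs.
INTENT_SCOPES = {
    "summarise_ticket": frozenset({"tickets:read"}),
    "close_ticket": frozenset({"tickets:read", "tickets:write"}),
}


@dataclass(frozen=True)
class Grant:
    agent_id: str
    intent: str
    scopes: frozenset


def authorise(agent_id: str, intent: str, allowed_intents: set) -> Grant:
    """Decide at request time and return only the scopes this intent requires."""
    if intent not in INTENT_SCOPES:
        raise PermissionError(f"unknown intent: {intent}")
    if intent not in allowed_intents:
        raise PermissionError(f"{agent_id} may not perform {intent}")
    return Grant(agent_id, intent, INTENT_SCOPES[intent])
```

An agent allowed only `summarise_ticket` receives read scope and nothing else. A request to `close_ticket` is rejected outright rather than being served from a broader standing role.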

For agentic workloads, identity should be workload identity first, not a shared service account. Standards such as the SPIFFE workload identity specification support cryptographic proof of what the workload is, while policy engines can decide whether to allow a tool call, data retrieval, or side effect. Current guidance also supports JIT credential provisioning, where short-lived secrets are issued per task and revoked when the task ends. That reduces the blast radius if a prompt is manipulated or an agent chains tools unexpectedly.
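The JIT pattern can be sketched as a small in-memory broker that issues a short-lived token per task and revokes it when the task ends. The `CredentialBroker` class, its TTL default, and the example SPIFFE ID are illustrative assumptions. The sketch does not verify the workload identity itself, which in practice would rely on a cryptographic identity document issued by the identity provider.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class TaskCredential:
    token: str
    workload_id: str  # e.g. a SPIFFE ID such as "spiffe://example.org/agent/support"
    task_id: str
    expires_at: float
    revoked: bool = False


class CredentialBroker:
    """Hypothetical in-memory broker for per-task, short-lived credentials."""

    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._issued = {}

    def issue(self, workload_id: str, task_id: str) -> TaskCredential:
        cred = TaskCredential(
            token=secrets.token_urlsafe(32),
            workload_id=workload_id,
            task_id=task_id,
            expires_at=self._clock() + self._ttl,
        )
        self._issued[cred.token] = cred
        return cred

    def is_valid(self, token: str) -> bool:
        cred = self._issued.get(token)
        return cred is not None and not cred.revoked and self._clock() < cred.expires_at

    def revoke_task(self, task_id: str) -> None:
        # Called when the task ends, so the credential cannot outlive its purpose.
        for cred in self._issued.values():
            if cred.task_id == task_id:
                cred.revoked = True
```

Because every token is bound to one task and expires on its own, a manipulated prompt can at worst misuse the authority of the task in flight, not a standing secret.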

This is why NHIMG research on the OWASP NHI Top 10 is so relevant, and why implementation work often starts with identity boundaries rather than model tuning. Where teams need a broader threat-modeling lens, the CSA MAESTRO agentic AI threat modeling framework and the NIST AI 600-1 Generative AI Profile both reinforce the need for governance, traceability, and runtime controls. These controls tend to break down when agents are allowed to retain long-lived credentials because the model can be repurposed faster than the access can be reviewed.

Use short-lived, task-scoped secrets instead of standing access.

Bind each agent to a verifiable workload identity.

Evaluate tool use with policy-as-code at runtime, not only at onboarding.

Log every retrieval, action, and credential issuance for auditability.
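The runtime-evaluation and logging practices above can be sketched together as a check that runs on every tool call and records its decision. The `POLICY` table, the `evaluate_tool_call` function, and the tool names are hypothetical. A production system would typically use a dedicated policy engine and an append-only log store rather than in-process structures.

```python
import json
import time

# Hypothetical policy: the tools each workload identity may call.
POLICY = {
    "spiffe://example.org/agent/support": {"search_kb", "read_ticket"},
}

# In-memory stand-in for an audit sink; each entry is one JSON line.
AUDIT_LOG = []


def evaluate_tool_call(workload_id: str, tool: str, args: dict) -> bool:
    """Check the call against policy at runtime and record the decision."""
    allowed = tool in POLICY.get(workload_id, set())
    AUDIT_LOG.append(json.dumps({
        "ts": time.time(),
        "workload_id": workload_id,
        "tool": tool,
        "args": args,
        "decision": "allow" if allowed else "deny",
    }))
    return allowed
```

Denied calls are logged as well as allowed ones, so reviewers can see what an agent attempted, not only what it completed.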

Where the Risk Profile Changes in Real Deployments

Tighter controls often increase operational overhead, requiring organisations to balance speed of delivery against containment. That tradeoff becomes most visible in multi-tool agents, background automation, and customer-facing assistants that can browse, write, or transact. In those environments, the model is not just answering a query; it is often selecting tools, chaining steps, and carrying context forward. There is no universal standard for this yet, but current guidance suggests separating read, propose, and execute permissions so that an LLM cannot silently cross from advice into action.
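One way to picture the read, propose, and execute split is a gate that lets a caller queue an action with one permission but carry it out only with another. The `Permission` enum and `ActionGate` class below are illustrative assumptions rather than an established standard.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    PROPOSE = "propose"
    EXECUTE = "execute"


class ActionGate:
    """Hypothetical gate that keeps proposing and executing as separate rights."""

    def __init__(self):
        self._proposals = []

    def propose(self, granted: set, action: str) -> int:
        if Permission.PROPOSE not in granted:
            raise PermissionError("caller may not propose actions")
        self._proposals.append(action)
        return len(self._proposals) - 1  # proposal id

    def execute(self, granted: set, proposal_id: int) -> str:
        if Permission.EXECUTE not in granted:
            raise PermissionError("caller may not execute actions")
        return f"executed: {self._proposals[proposal_id]}"
```

An agent holding only read and propose rights can recommend closing a ticket, but the side effect happens only when a caller holding the execute right acts on that proposal.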

Nuance matters. A retrieval-only chatbot has a smaller risk profile than an agent that can email users, modify tickets, or access production data. Likewise, a system with strong prompt hygiene but shared credentials is still exposed because the credential, not the prompt, becomes the durable attack path. NHIMG coverage such as the Moltbook AI agent keys breach and the AI LLM hijack breach shows how quickly exposed keys and over-broad agent permissions can turn a language interface into a control plane. The practical lesson is simple: treat LLMs as autonomous, goal-driven workloads, not ordinary apps with a chat front end.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A01	Agentic apps fail when prompts can drive tool use beyond intended scope.
CSA MAESTRO		MAESTRO models runtime, identity, and orchestration risks for agents.
NIST AI RMF		The AI RMF provides governance for unpredictable AI behaviour and accountability.

Assign owners, monitor outcomes, and document controls across the AI lifecycle.

