Why do LLMs create more risk than ordinary automation workflows?

Why This Matters for Security Teams

LLMs are riskier than ordinary automation because they do not follow a fixed decision path. The same prompt, retrieved document, or tool output can produce a different result at runtime, which makes access, data handling, and downstream actions harder to predict and harder to govern. That is why the question belongs in agent and workload identity discussions, not just application security.

Ordinary workflows usually fail in known ways. LLM-driven workflows can fail in novel ways, including prompt injection, tool misuse, and context poisoning. The OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both emphasize that runtime behavior must be evaluated as part of the control surface rather than assumed safe because the workflow is “automated.” NHIMG research shows how quickly this becomes operationally visible: its LLMjacking report documents attackers targeting exposed AI credentials and abusing non-human identities.

In practice, many security teams discover LLM risk not through deliberate testing of the workflow but only after an agent has already issued an unexpected tool call, accessed a sensitive source, or exposed data through a downstream integration.

How It Works in Practice

Conventional automation is usually deterministic: if input A arrives, action B follows. LLM workflows are probabilistic and context-sensitive, so the identity question changes from “who signed in?” to “what is this workload trying to do right now, with which context, and should it be allowed to do it?” That is why current guidance increasingly favors runtime policy checks, short-lived secrets, and workload identity over static role assignment.
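A minimal Python sketch makes the contrast concrete. The event names, the `ALLOWED_TOOLS` allowlist, and the `gate_llm_action` helper are hypothetical and exist only for this illustration. The deterministic path maps each input to one fixed action, so every outcome can be reviewed before deployment. The LLM path receives whatever tool the model proposes at runtime, so the decision to execute it can only be made at request time.

```python
# Deterministic automation: every input maps to one known action,
# so the full set of outcomes can be reviewed before deployment.
DISPATCH = {
    "invoice_received": "archive_invoice",
    "ticket_opened": "notify_oncall",
}


def run_deterministic(event: str) -> str:
    # Unknown events fail in a known way (KeyError), not a novel one.
    return DISPATCH[event]


# LLM-driven automation: the action is whatever the model proposes for this
# prompt and context, so it has to be checked at runtime before execution.
# ALLOWED_TOOLS is a hypothetical allowlist for this example.
ALLOWED_TOOLS = {"archive_invoice", "notify_oncall", "search_docs"}


def gate_llm_action(proposed_tool: str) -> bool:
    """Return True only if the model's proposed tool is on the allowlist."""
    return proposed_tool in ALLOWED_TOOLS


if __name__ == "__main__":
    print(run_deterministic("ticket_opened"))   # notify_oncall
    print(gate_llm_action("search_docs"))       # True
    print(gate_llm_action("delete_s3_bucket"))  # False
```

An allowlist is the simplest possible runtime check; it says nothing yet about context such as data classification or approval state.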

A practical control stack usually includes three layers. First, the agent needs cryptographic workload identity, such as SPIFFE/SPIRE or OIDC-bound tokens, so the system can prove what the workload is. Second, the agent should receive just-in-time credentials scoped to a single task or a short TTL, which are revoked automatically after use. Third, policy must be evaluated at request time, using context such as tool name, data classification, user approval, and session state. This is the direction reflected in the CSA MAESTRO agentic AI threat modeling framework and the OWASP NHI Top 10, which both highlight identity, tool access, and runtime control as core attack surfaces.
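The second layer, just-in-time credentials with a short TTL and automatic revocation, can be sketched as follows. This is an illustrative in-memory model under stated assumptions, not a production credential service: `CredentialBroker`, `TaskCredential`, and `task_credential` are hypothetical names, the workload identity is assumed to have been verified upstream (for example through SPIFFE/SPIRE or an OIDC-bound token), and a real deployment would rely on a secrets manager or token service rather than a Python dictionary.

```python
from __future__ import annotations

import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class TaskCredential:
    """A short-lived secret bound to one workload and one task (hypothetical model)."""
    workload_id: str  # assumed already proven upstream, e.g. via SPIFFE/SPIRE or OIDC
    task_id: str
    scopes: frozenset
    expires_at: float
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    revoked: bool = False


class CredentialBroker:
    """In-memory sketch: issue per task, check on every use, revoke when done."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._issued: dict[str, TaskCredential] = {}

    def issue(self, workload_id: str, task_id: str, scopes: set[str]) -> TaskCredential:
        cred = TaskCredential(
            workload_id=workload_id,
            task_id=task_id,
            scopes=frozenset(scopes),
            expires_at=time.time() + self.ttl_seconds,
        )
        self._issued[cred.token] = cred
        return cred

    def is_valid(self, token: str, scope: str, now: float | None = None) -> bool:
        """Valid only if known, not revoked, not expired, and scoped for this action."""
        cred = self._issued.get(token)
        now = time.time() if now is None else now
        return (
            cred is not None
            and not cred.revoked
            and now < cred.expires_at
            and scope in cred.scopes
        )

    def revoke(self, token: str) -> None:
        cred = self._issued.get(token)
        if cred is not None:
            cred.revoked = True


@contextmanager
def task_credential(broker: CredentialBroker, workload_id: str, task_id: str, scopes: set[str]):
    """Issue a credential for one task and revoke it on exit, even if the task fails."""
    cred = broker.issue(workload_id, task_id, scopes)
    try:
        yield cred
    finally:
        broker.revoke(cred.token)
```

Wrapping each task in `with task_credential(...) as cred:` means the token passes `is_valid` only inside the block and only for the scopes requested. Once the block exits, normally or through an exception, the token is revoked, and it would also fail after its TTL even if revocation were missed.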

Use workload identity for the agent, not a shared service account.

Issue ephemeral secrets per task, not long-lived API keys in code or config.

Evaluate policy at runtime, not only during deployment or onboarding.

Require explicit guardrails before any tool that can move data, spend money, or modify infrastructure.
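The third layer and the last guardrail above can be combined into a single request-time policy check. The sketch below is a hedged illustration: the tool names, the classification labels, and the `evaluate` function are assumptions, and in practice such rules often live in a dedicated policy engine rather than in agent code. It denies unknown tools outright and requires explicit user approval before any high-impact tool runs or any restricted data is touched.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tool catalogue. Anything not listed is denied outright.
LOW_IMPACT_TOOLS = {"search_docs", "summarize_ticket"}
# Tools that can move data, spend money, or modify infrastructure.
HIGH_IMPACT_TOOLS = {"export_dataset", "issue_payment", "apply_infra_change"}


@dataclass(frozen=True)
class ToolRequest:
    """Context available at the moment the agent asks to call a tool."""
    tool_name: str
    data_classification: str  # assumed labels: "public", "internal", "restricted"
    user_approved: bool
    session_active: bool


def evaluate(request: ToolRequest) -> tuple[bool, str]:
    """Evaluate one tool call at request time and return (allowed, reason)."""
    if request.tool_name not in LOW_IMPACT_TOOLS | HIGH_IMPACT_TOOLS:
        return False, "unknown tool"
    if not request.session_active:
        return False, "session is no longer active"
    if request.tool_name in HIGH_IMPACT_TOOLS and not request.user_approved:
        return False, "high-impact tool requires explicit approval"
    if request.data_classification == "restricted" and not request.user_approved:
        return False, "restricted data requires explicit approval"
    return True, "allowed"
```

Because the decision is made per call, the same agent can be allowed to search documents one moment and refused a payment the next, depending on the context attached to each request.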

These controls tend to break down when agents are chained across multiple vendors or when tool access depends on human-in-the-loop approval that is inconsistent, delayed, or bypassed by retries.
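One way to keep human-in-the-loop approval from being bypassed by retries is to bind each approval to the exact request and consume it on first use. In this hypothetical sketch, `ApprovalLedger` hashes the tool name and arguments, serialized as canonical JSON, with SHA-256. A retry of the same call, or a call with altered arguments, finds no matching approval and is refused. It does not solve approval latency or inconsistency across vendors, which remain operational problems.

```python
import hashlib
import json


class ApprovalLedger:
    """Hypothetical sketch: approvals are bound to the exact request and used once."""

    def __init__(self) -> None:
        self._pending: set[str] = set()

    @staticmethod
    def digest(tool_name: str, args: dict) -> str:
        # sort_keys gives a canonical form, so argument order does not change the hash.
        payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def approve(self, tool_name: str, args: dict) -> None:
        """Record a human approval for this exact tool call."""
        self._pending.add(self.digest(tool_name, args))

    def consume(self, tool_name: str, args: dict) -> bool:
        """Return True once per approval; retries or altered arguments return False."""
        key = self.digest(tool_name, args)
        if key in self._pending:
            self._pending.remove(key)
            return True
        return False
```

The agent runtime would call `consume` immediately before executing the tool, so each approval covers exactly one execution of exactly one request.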

Common Variations and Edge Cases

Tighter runtime control often increases latency and operational overhead, so organisations have to balance safer agent behavior against throughput, developer friction, and support complexity. That tradeoff is real, and best practice is still evolving for highly autonomous systems.

One common edge case is read-only LLM usage. Even then, the model can still leak sensitive context, so “no write access” does not mean “low risk.” Another is multi-agent orchestration, where one agent can inherit trust from another and expand the blast radius of a single bad decision. In those environments, it is often better to separate identities and contexts per subtask rather than treat the pipeline as one trusted workflow. This is especially relevant when vendor tools, browser automation, and internal APIs are all connected in the same run. The NIST AI Risk Management Framework and NHIMG’s 2024 ESG Report on Managing Non-Human Identities both support a shift toward stronger governance, but there is no universal standard yet for how granular agent permissions should be in every environment.
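Separating identities per subtask can be sketched as delegation that only ever narrows permissions. In this illustration, `AgentIdentity` and `delegate` are hypothetical, and the identifiers and scope strings are placeholders. Each sub-agent gets its own identity, and its scopes are the intersection of what its parent holds and what the subtask requests, so a misdirected sub-agent cannot gain permissions the orchestrator never had. This addresses permission inheritance only; it does nothing about a read-only agent leaking the context it was given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    """A distinct identity per agent or subtask (hypothetical model)."""
    agent_id: str
    scopes: frozenset


def delegate(parent: AgentIdentity, subtask_id: str, requested: set) -> AgentIdentity:
    """Create a separate identity for a subtask whose scopes can only narrow.

    The child receives the intersection of what the parent holds and what the
    subtask asks for, so delegation never expands the blast radius.
    """
    granted = parent.scopes & frozenset(requested)
    return AgentIdentity(agent_id=f"{parent.agent_id}/{subtask_id}", scopes=granted)
```

Because every hop produces a new identity, audit logs can attribute each tool call to the specific subtask that made it rather than to the pipeline as a whole.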

Another edge case is model drift after release. If the model, prompts, or retrieval corpus change frequently, the access model must be revisited just as often. Static approvals age badly in systems that learn, adapt, or re-plan at runtime.
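A simple way to make drift visible is to record a fingerprint of the components that shape runtime behavior at the time access is approved, and to treat that approval as stale once the fingerprint changes. The function names and version labels below are hypothetical. A changed fingerprint is meant to trigger a review, not to decide on its own whether the new configuration still deserves the same access.

```python
import hashlib


def config_fingerprint(model_version: str, system_prompt: str, corpus_version: str) -> str:
    """Hash the components that shape the agent's runtime behavior."""
    h = hashlib.sha256()
    for part in (model_version, system_prompt, corpus_version):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


def approval_is_current(approved: str, model_version: str, system_prompt: str, corpus_version: str) -> bool:
    """True only if nothing that shapes behavior has changed since approval."""
    return approved == config_fingerprint(model_version, system_prompt, corpus_version)
```

Storing the fingerprint alongside each access approval lets a scheduled job flag every approval whose underlying model, prompt, or retrieval corpus has since changed.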

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while the NIST AI RMF sets out the governance and control expectations practitioners should address.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A3	Agentic workflows introduce runtime tool misuse and prompt-injection risk.
CSA MAESTRO	MT-2	MAESTRO covers agent identity, orchestration, and tool trust boundaries.
NIST AI RMF		AI RMF addresses governance for dynamic AI behavior and emerging operational risk.

Assign per-agent identities and enforce least privilege across each orchestration step.
