Why do agent harnesses create a larger attack surface than the model itself?

Why This Matters for Security Teams

Agent harnesses expand the attack surface because they sit between the model and the real environment: they hold the tokens, session state, file access, network paths, and execution logic that turn predictions into actions. A model compromise is serious, but a harness compromise is often worse because the attacker inherits the harness's broad operational authority. That is why current guidance on agentic applications, including the OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework, treats the wrapper, not just the model, as part of the security boundary.

This is also consistent with NHIMG research on real-world NHI exposure. In LLMjacking: How Attackers Hijack AI Using Compromised NHIs, Entro Security reports that when AWS credentials are exposed publicly, attackers attempt access within an average of 17 minutes. That speed matters because an agent harness often concentrates the exact assets attackers want: reusable secrets, tool permissions, and durable state. In practice, many security teams discover harness abuse not through intentional testing but only after the agent has already used legitimate pathways to move data, call tools, or expose credentials.

How It Works in Practice

The model produces output, but the harness decides what is allowed, what is executed, and what gets recorded. That makes the harness an enforcement point, a broker of trust, and a concentration of privilege. If the harness is weak, prompt injection or tool abuse can redirect benign model output into harmful actions. If the harness stores long-lived credentials, the impact extends beyond a single task and can survive model resets or prompt cleanup.
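The enforcement role described above can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from any specific framework: the `Harness` class, its `dispatch` method, and the tool names are assumptions chosen for illustration. The point is only that the model's requested tool call passes through a check, and is recorded, before anything runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass
class Harness:
    """Hypothetical harness: mediates every tool call the model requests."""

    # Registered tools, keyed by name (illustrative callables only).
    tools: Dict[str, Callable[[str], str]]
    # Tools this particular task is allowed to use.
    allowed: FrozenSet[str]
    # Each entry is (tool, argument, decision).
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def dispatch(self, tool: str, argument: str) -> str:
        """Decide, record, and only then execute a requested tool call."""
        if tool not in self.allowed or tool not in self.tools:
            self.audit_log.append((tool, argument, "denied"))
            raise PermissionError(f"tool {tool!r} is not permitted for this task")
        self.audit_log.append((tool, argument, "allowed"))
        return self.tools[tool](argument)
```

In this sketch a prompt-injected request for a tool outside `allowed` is refused and logged instead of executed. A weak harness is one where that check is missing or trivially bypassed.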

Security teams should treat the harness as a workload boundary, not a convenience layer. Current practice is shifting toward runtime controls that evaluate intent, context, and blast radius before the agent touches tools or data. That includes:

Short-lived credentials issued just in time for a specific task

Workload identity for the harness and agent runtime, such as cryptographic identity rather than shared secrets

Policy-as-code evaluated at request time, instead of static allowlists that assume predictable behaviour

Separation of the model, orchestration layer, and secret store so a single compromise does not expose everything
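As a rough sketch of the first and third controls in the list above, the following shows a credential bound to one task and one scope with a short lifetime, checked at request time. The names `TaskCredential`, `issue_credential`, and `authorize`, the scope strings, and the 300-second default are all assumptions for illustration. A real deployment would obtain credentials from a secret store or identity provider, not generate them in-process.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskCredential:
    """Hypothetical short-lived credential bound to a single task and scope."""

    token: str
    task_id: str
    scope: str
    expires_at: float


def issue_credential(task_id: str, scope: str, ttl_seconds: float = 300.0,
                     now: Optional[float] = None) -> TaskCredential:
    """Issue a credential just in time; `now` is injectable for testing."""
    issued = time.time() if now is None else now
    return TaskCredential(secrets.token_urlsafe(32), task_id, scope,
                          issued + ttl_seconds)


def authorize(cred: TaskCredential, task_id: str, requested_scope: str,
              now: Optional[float] = None) -> bool:
    """Evaluate the request at call time: not expired, same task, same scope."""
    current = time.time() if now is None else now
    return (current < cred.expires_at
            and cred.task_id == task_id
            and cred.scope == requested_scope)
```

Because the decision is made per request, a stolen token is useless after expiry and cannot be replayed against a different task or scope.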

NHIMG research on agent exposure reinforces this shift. The AI Agents: The New Attack Surface report notes that 80% of organisations report agents performing actions beyond intended scope, including accessing unauthorised systems and revealing credentials. That aligns with the MITRE ATLAS adversarial AI threat matrix and the CSA MAESTRO agentic AI threat modeling framework, both of which emphasise adversarial manipulation of the system around the model. These controls tend to break down when the harness shares credentials across tenants or jobs, because a single compromise then inherits persistent access across all of them.

Common Variations and Edge Cases

Tighter harness controls often increase operational overhead, requiring organisations to balance agent autonomy against governance, latency, and supportability. That tradeoff becomes especially visible when the harness must chain multiple tools, call external APIs, or hand off work across different identities.

There is no universal standard for this yet, but current guidance suggests a few patterns are safer than static privilege. A retrieval agent that only reads documents should not carry the same secrets as an execution agent that can write tickets, trigger deployments, or open cloud consoles. Similarly, a multi-agent workflow may need per-agent credentials and separate audit trails because shared state can blur accountability after an incident.
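The retrieval-versus-execution split can be sketched as follows. `AgentProfile` and the scope strings are hypothetical names chosen for illustration. The sketch assumes each agent carries only its own scopes and writes to its own audit trail.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class AgentProfile:
    """Hypothetical per-agent identity with its own scopes and audit trail."""

    name: str
    scopes: FrozenSet[str]
    audit_trail: List[str] = field(default_factory=list)

    def request(self, scope: str) -> bool:
        """Grant only scopes this agent holds and record every decision."""
        granted = scope in self.scopes
        self.audit_trail.append(f"{scope}:{'granted' if granted else 'denied'}")
        return granted


# A read-only retrieval agent and an execution agent hold disjoint scopes.
retrieval = AgentProfile("retrieval", frozenset({"docs:read"}))
executor = AgentProfile("executor", frozenset({"tickets:write", "deploy:trigger"}))
```

Keeping the trails separate means that, after an incident, each decision can be attributed to one agent without reconstructing it from shared state.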

Edge cases matter. Some environments, such as CI/CD pipelines or SOC automation, already rely on delegated machine access, so the harness may look like traditional automation at first glance. The difference is that agent behaviour is less deterministic, which means a static RBAC model can approve a task the agent later expands in unexpected ways. That is why the best practice is evolving toward context-aware authorization and ephemeral access rather than broad standing privileges. NHIMG’s broader NHI guidance in the Ultimate Guide to NHIs — Key Challenges and Risks and the 52 NHI Breaches Analysis shows the same pattern: the most damaging failures usually come from over-privileged non-human workloads, not from the model alone.
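To make the contrast with static RBAC concrete, here is a small sketch under stated assumptions. The role table, the `ApprovedTask` type, and both check functions are hypothetical. The static check looks only at the role, while the context-aware check also compares the request against what was approved for the task.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical static role table: the role may both read and write.
ROLE_PERMISSIONS = {"soc-automation": frozenset({"read", "write"})}


@dataclass(frozen=True)
class ApprovedTask:
    """What was actually approved: one action on a fixed set of resources."""

    role: str
    action: str
    resources: FrozenSet[str]


def static_rbac_allows(role: str, action: str) -> bool:
    """Static check: considers only the role's standing permissions."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())


def context_aware_allows(task: ApprovedTask, action: str, resource: str) -> bool:
    """Context-aware check: the request must also match the approved task."""
    return (static_rbac_allows(task.role, action)
            and action == task.action
            and resource in task.resources)
```

If an agent approved to read one alert later attempts a write, or a read on another resource, the static check still passes on role alone, whereas the context-aware check rejects the expansion.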

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A2	Agent harness risk maps to runtime abuse of tools and permissions.
CSA MAESTRO	T2	MAESTRO models the harness as the control plane for agent behaviour.
NIST AI RMF	GOVERN	AI RMF governance covers accountability for agent-mediated actions.

Threat model the orchestration layer and isolate secrets from model execution paths.
