What Is AI Threat Modeling? Definition & Examples

Expanded Definition

AI threat modeling is a structured way to enumerate how an AI system can be attacked, misused, or induced to disclose sensitive information across its full execution path. That means examining prompts, model behaviour, training data, retrieval layers, plugins, tool calls, orchestration logic, and output handling, not just the model checkpoint itself.
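One way to make "full execution path" concrete is to keep an inventory of the components listed above and record threats against each one. The following is a minimal sketch, not a prescribed method: the `ThreatModel` class, the component identifiers, and the example system name are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Components named in the definition above; identifiers are illustrative.
EXECUTION_PATH = [
    "prompts", "model_behaviour", "training_data", "retrieval_layers",
    "plugins", "tool_calls", "orchestration_logic", "output_handling",
]


@dataclass
class ThreatModel:
    """Hypothetical container that records threats per component."""
    system: str
    threats: dict = field(default_factory=dict)

    def add_threat(self, component: str, description: str) -> None:
        # Reject components outside the agreed execution path.
        if component not in EXECUTION_PATH:
            raise ValueError(f"unknown component: {component}")
        self.threats.setdefault(component, []).append(description)

    def uncovered(self) -> list:
        # Components with no recorded threat have not been reviewed yet.
        return [c for c in EXECUTION_PATH if not self.threats.get(c)]


model = ThreatModel("support-assistant")
model.add_threat("prompts", "prompt injection reveals internal tickets")
model.add_threat("retrieval_layers", "poisoned document shapes outputs")
print(model.uncovered())
```

Listing the uncovered components makes it harder for a review to stop at the model checkpoint and skip the surrounding layers.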

In NHI security, the term is broader than classic application threat modeling because the attack surface often includes non-human identities, secrets, and delegated access that an agent can use on behalf of a user or workflow. Guidance is still evolving across vendors, but frameworks such as the MITRE ATLAS adversarial AI threat matrix help teams translate AI-specific abuse paths into concrete controls. For agentic systems, the CSA MAESTRO agentic AI threat modeling framework is also useful for thinking about planning, delegation, and tool execution risks.

The most common misapplication is treating AI threat modeling as a one-time review of the model API. That leaves prompts, retrieval sources, and connected credentials out of scope, even though they are part of the attack surface.

Examples and Use Cases

Rigorous AI threat modeling often slows release cycles and adds review overhead, so organisations must weigh shipping velocity against the cost of exposing sensitive data or enabling unsafe tool use. Typical use cases include the following.

A customer support assistant is tested for prompt injection that could coerce it into revealing internal tickets or policy data.

An agent connected to cloud APIs is mapped for privilege escalation paths through overbroad non-human identities and reused secrets, a pattern highlighted in LLMjacking: How Attackers Hijack AI Using Compromised NHIs.

A RAG system is assessed for data poisoning, where malicious or low-trust documents can shape outputs or leak sensitive context.

A coding assistant is reviewed for memorisation risk, especially when it may reproduce secrets or internal patterns from repositories, which aligns with concerns described in The State of Secrets in AppSec.

A multimodal agent is evaluated for unsafe tool chaining, such as sending data to external services without approval or guardrails.
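The unsafe tool chaining case can be sketched as a simple check over a planned chain of tool calls. This is an illustrative example under stated assumptions: the `ToolCall` type, the `INTERNAL_HOSTS` allowlist, and every hostname and tool name below are hypothetical, and a real guardrail would sit inside the orchestration layer rather than in a standalone script.

```python
from dataclasses import dataclass

# Hypothetical allowlist of internal hosts; replace with your own inventory.
INTERNAL_HOSTS = {"tickets.internal.example", "kb.internal.example"}


@dataclass(frozen=True)
class ToolCall:
    tool: str
    destination: str        # host the tool will send data to
    approved: bool = False  # set when a human or policy approves the call


def unapproved_external_calls(chain):
    """Return calls that leave the internal boundary without approval."""
    return [
        call for call in chain
        if call.destination not in INTERNAL_HOSTS and not call.approved
    ]


chain = [
    ToolCall("search_tickets", "tickets.internal.example"),
    ToolCall("translate_text", "api.translator.example"),
    ToolCall("send_email", "smtp.partner.example", approved=True),
]
flagged = unapproved_external_calls(chain)
print([call.tool for call in flagged])
```

In this sketch only `translate_text` is flagged, because it targets an external host and carries no approval.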

Practitioners often pair this work with the OWASP NHI Top 10 and the Anthropic report on AI-orchestrated cyber espionage to test realistic attacker workflows rather than abstract AI failure modes.

Why It Matters in NHI Security

AI threat modeling matters because AI systems often inherit access, secrets, and permissions from the surrounding environment, which means a model flaw can become an identity and access failure. When the model can call tools, read retrieval content, or operate under a service principal, the risk is no longer limited to bad outputs. It becomes a governance issue spanning authentication, authorisation, secrets hygiene, and blast-radius control.
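Blast-radius control can be approximated by comparing what a non-human identity is granted with what the agent's tools actually need. The sketch below assumes permissions can be represented as plain strings; the function name, tool names, and the permission sets are hypothetical examples, not output from any real policy engine.

```python
def excess_permissions(granted, required_by_tool):
    """Return permissions granted to the identity that no tool requires."""
    needed = set()
    for perms in required_by_tool.values():
        needed |= perms
    return granted - needed


# Illustrative permission strings for a hypothetical service principal.
granted = {
    "s3:GetObject",
    "s3:PutObject",
    "iam:PassRole",
    "secretsmanager:GetSecretValue",
}
required_by_tool = {
    "fetch_report": {"s3:GetObject"},
    "read_api_key": {"secretsmanager:GetSecretValue"},
}
print(sorted(excess_permissions(granted, required_by_tool)))
```

Anything in the result is access the agent holds but does not need, which is the portion of the blast radius a threat model would aim to remove.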

NHIMG research shows why speed matters: in the LLMjacking research, attackers attempted access to exposed AWS credentials within an average of 17 minutes, and in as little as 9 minutes in some cases. Threat models must assume that tempo whenever AI workflows depend on live credentials or shared tokens. For broader NHI context, The 52 NHI breaches Report and CISA cyber threat advisories reinforce that identity abuse is a common entry point, not an edge case.
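The 17-minute and 9-minute figures can be turned into a rough estimate of how long an exposed credential stays usable to an attacker. This is a simplified worked example: the 60-minute time-to-revoke is an assumed value, `attacker_window` is a hypothetical helper, and the calculation treats the first access attempt as the start of the attacker's usable window.

```python
from datetime import timedelta

# Figures cited above from the LLMjacking research.
AVERAGE_TIME_TO_FIRST_ATTEMPT = timedelta(minutes=17)
FASTEST_TIME_TO_FIRST_ATTEMPT = timedelta(minutes=9)


def attacker_window(time_to_revoke, time_to_first_attempt):
    """Time between the first access attempt and revocation (never negative)."""
    return max(time_to_revoke - time_to_first_attempt, timedelta(0))


# Assumption: the organisation detects and revokes a leaked key in 60 minutes.
time_to_revoke = timedelta(minutes=60)

average_case = attacker_window(time_to_revoke, AVERAGE_TIME_TO_FIRST_ATTEMPT)
fastest_case = attacker_window(time_to_revoke, FASTEST_TIME_TO_FIRST_ATTEMPT)
print(average_case, fastest_case)
```

Under that assumption the attacker has 43 minutes in the average case and 51 minutes in the fastest case. The window only closes when revocation takes less time than the first access attempt.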

The practical value lies in exposing how a harmless-looking prompt or plugin can become a path to data loss, privilege abuse, or unauthorised action. Organisations typically encounter the impact only after an agent leaks data, calls the wrong tool, or uses exposed credentials, at which point AI threat modeling can no longer be deferred.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A1	Agentic AI risks are surfaced through prompt, tool, and orchestration abuse paths.
CSA MAESTRO		MAESTRO is a threat modeling framework for agentic AI systems and their control flows.
NIST AI RMF	GOVERN	AI risk management requires identifying, assessing, and governing model harms and misuse.

In practice, treat prompts, tools, and delegated actions as attack surfaces, and test them for injection, misuse, and unsafe execution.

