TL;DR: TrojAI’s interview traces how adversarial ML moved from a niche model-testing problem to a runtime security issue as organisations push AI, including agentic AI systems, into production. The practical shift is that security teams now need to govern build-time and runtime exposure together, not as separate concerns.
At a glance
What this is: A TrojAI interview about how AI security evolved from adversarial model testing into build-time and runtime defense for enterprise AI systems.
Why it matters: IAM, NHI, and AI security teams are now being asked to govern AI systems that behave more like active runtime actors than static software components.
👉 Read TROJ.AI’s interview on AI security, adversarial ML, and runtime defense
Context
AI security now sits between model integrity, runtime behavior, and enterprise governance. As AI systems move into production, the control problem changes from checking a model once to managing how it behaves under active attack, which is why runtime defense and red teaming have become part of the same conversation.
This interview frames that shift through TrojAI’s own origin story and its view of enterprise deployment pressure. For practitioners, the important question is not whether AI can be tested, but how to govern AI systems that are continuously exposed to new inputs, new threats, and new operational demands.
Key questions
Q: How should security teams govern AI systems that behave unpredictably in production?
A: Security teams should govern production AI as a live control problem, not a one-time validation problem. That means combining pre-deployment testing, runtime monitoring, logging, and scoped authority so the system cannot freely turn a bad input into a broad operational failure.
Q: Why do build-time AI tests fail to fully reduce production risk?
A: Build-time tests only prove how a model behaved in a controlled environment. Production risk remains because real users, adversaries, and workflow integrations create inputs and consequences that test data does not capture, so runtime control is still required.
Q: When should organisations treat AI security as part of identity governance?
A: Organisations should treat AI security as identity governance whenever an AI system can access tools, call APIs, or trigger downstream actions. At that point, the model is not just making predictions. It is exercising authority that needs scope, logging, and review.
Q: What should teams do if an AI system can chain actions across workflows?
A: Teams should restrict the system’s execution scope, require explicit approval for high-impact actions, and monitor for chained behavior that turns one model decision into multiple downstream events. The goal is to prevent a single bad interaction from compounding.
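As a rough illustration of the last answer, the sketch below combines three of the controls mentioned: a restricted execution scope, explicit approval for high-impact actions, and a cap on how many actions one session can chain. Every name here (`ExecutionScope`, `try_action`, the action names, the limit of three) is hypothetical and chosen for the example, not taken from the interview or any product.

```python
from dataclasses import dataclass, field
from typing import List, Set


class ActionDenied(Exception):
    """Raised when a requested action falls outside the agent's authority."""


@dataclass
class ExecutionScope:
    allowed_actions: Set[str]          # everything the agent may ever do
    high_impact: Set[str]              # subset that needs a named approver
    max_chain_length: int = 3          # hypothetical per-session chain budget
    executed: List[str] = field(default_factory=list)

    def authorize(self, action: str, approved_by: str = "") -> None:
        if action not in self.allowed_actions:
            raise ActionDenied(f"{action} is outside the agent's scope")
        if len(self.executed) >= self.max_chain_length:
            raise ActionDenied("chain budget exhausted for this session")
        if action in self.high_impact and not approved_by:
            raise ActionDenied(f"{action} needs explicit approval")
        # Record the action so the chain is reviewable afterwards.
        self.executed.append(action)


def try_action(scope: ExecutionScope, action: str, approved_by: str = "") -> bool:
    """Return True if the action was authorized, False if it was denied."""
    try:
        scope.authorize(action, approved_by)
        return True
    except ActionDenied:
        return False


def new_scope() -> ExecutionScope:
    # Illustrative scope for a support agent; action names are made up.
    return ExecutionScope(
        allowed_actions={"read_ticket", "draft_reply", "issue_refund"},
        high_impact={"issue_refund"},
    )
```

With this scope, `try_action(scope, "issue_refund")` is denied until an approver is supplied, and a fourth action in the same session is denied regardless of what it is. The point of the design is that a single bad interaction has a bounded number of steps in which to compound.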
Technical breakdown
Adversarial machine learning changes the security model
Adversarial machine learning is the practice of manipulating inputs or model behavior so an AI system produces the wrong outcome. In this article’s arc, the progression moves from image misclassification to more general AI risk, which is important because the threat is not limited to one model type. A system that behaves correctly in a lab can still fail when inputs are shaped to exploit its decision boundaries. That is why AI security cannot stop at model evaluation alone. It has to account for how the model responds once it is embedded in real workflows and exposed to active adversaries.
Practical implication: security teams should treat model validation as necessary but insufficient, and add runtime controls for active misuse.
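To make the idea of inputs "shaped to exploit decision boundaries" concrete, here is a minimal sketch using a toy linear classifier. It applies a sign-of-gradient perturbation (the same idea behind the well-known fast gradient sign method, simplified here because the gradient of a linear score with respect to the input is just the weight vector). The weights, bias, input, and epsilon are made-up values for illustration, not figures from the interview.

```python
from typing import List


def score(w: List[float], b: float, x: List[float]) -> float:
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Class 1 when the score is positive, otherwise class 0."""
    return 1 if score(w, b, x) > 0 else 0


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def perturb(w: List[float], x: List[float], epsilon: float, target: int) -> List[float]:
    """Move each feature by at most epsilon towards the target class."""
    direction = 1 if target == 1 else -1
    return [xi + direction * epsilon * sign(wi) for wi, xi in zip(w, x)]


# Hypothetical model and input.
w = [2.0, -1.0, 0.5]
b = -0.5
x = [0.5, 0.2, 0.2]

x_adv = perturb(w, x, epsilon=0.2, target=0)

print(predict(w, b, x))      # class for the clean input
print(predict(w, b, x_adv))  # class after a small, bounded change
```

No feature moves by more than 0.2, yet the predicted class flips from 1 to 0. A validation set that only contains clean inputs like `x` would not reveal this, which is why model evaluation on its own is insufficient.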
Build-time red teaming and runtime defense solve different problems
Build-time red teaming tests an AI system before deployment, while runtime defense monitors and constrains it after it is live. Those are not interchangeable controls. The first finds weaknesses before release, but the second is needed because production environments introduce prompt abuse, input drift, and operational edge cases that never appear in test data. The interview’s split between TrojAI Detect and TrojAI Defend reflects that divide. For practitioners, the architecture question is whether AI security is being handled as a one-time gate or as a lifecycle control that stays in place after go-live.
Practical implication: design AI governance as a lifecycle, with separate controls for pre-deployment assurance and live-session protection.
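The sketch below shows one way the two layers can relate, under heavy simplification: a single input screen is exercised by a build-time red-team run and then reused as a live guard that logs every decision. The marker strings, `screen_input`, `red_team`, `guarded_call`, and `echo_model` are all hypothetical stand-ins; this is not how TrojAI Detect or TrojAI Defend work, and real detection is far richer than substring matching.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List

logger = logging.getLogger("ai_runtime_guard")

# Hypothetical deny markers, for illustration only.
BLOCKED_MARKERS = ("ignore previous instructions", "reveal system prompt")


@dataclass
class GuardResult:
    allowed: bool
    reason: str
    output: str = ""


def screen_input(prompt: str) -> str:
    """Return a block reason, or an empty string if the prompt passes."""
    lowered = prompt.lower()
    for marker in BLOCKED_MARKERS:
        if marker in lowered:
            return f"blocked marker: {marker}"
    return ""


def red_team(attack_prompts: List[str]) -> List[str]:
    """Build time: return the attack prompts the screen fails to block."""
    return [p for p in attack_prompts if not screen_input(p)]


def guarded_call(model: Callable[[str], str], prompt: str) -> GuardResult:
    """Runtime: screen the prompt, log the decision, then call the model."""
    reason = screen_input(prompt)
    if reason:
        logger.warning("input rejected: %s", reason)
        return GuardResult(False, reason)
    output = model(prompt)
    logger.info("input accepted, %d chars returned", len(output))
    return GuardResult(True, "ok", output)


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"echo: {prompt}"
```

Running `red_team` over a list of known attacks surfaces gaps before release, but a paraphrase such as "disregard earlier guidance" passes this screen untouched. The runtime log is what lets a team notice that kind of miss after go-live, which is the reason to keep both layers.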
Agentic AI raises the blast radius of security failures
Agentic AI systems do more than classify or predict. They can chain actions, interact with tools, and influence downstream systems, which increases the consequence of a bad decision or adversarial input. That changes the security profile from a single model error to a potential sequence of compounding actions. In practical terms, the governance burden grows because the system is no longer just producing an output; it is participating in execution. That is why agentic deployments demand closer scrutiny of tool access, decision scope, and runtime guardrails than traditional model-only systems.
Practical implication: map what tools and downstream actions an AI system can trigger before giving it production reach.
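One possible way to do that mapping is to record tool and system connections as a directed graph and compute everything an agent can reach transitively. The sketch below does this with a breadth-first traversal. The inventory in `REACH` and the names `downstream_reach` and `excess_reach` are invented for the example; a real inventory would come from an organisation's own integration records.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical inventory: each key can trigger the systems listed against it.
REACH: Dict[str, List[str]] = {
    "support_agent": ["ticket_api", "email_tool"],
    "ticket_api": ["crm_db"],
    "email_tool": ["smtp_relay"],
    "crm_db": [],
    "smtp_relay": [],
    "payments_api": ["ledger_db"],
}


def downstream_reach(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Every system reachable from `start`, directly or through other tools."""
    seen: Set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also protects against cycles
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


def excess_reach(graph: Dict[str, List[str]], start: str, needed: Set[str]) -> Set[str]:
    """Reachable systems that the task does not actually require."""
    return downstream_reach(graph, start) - needed


print(sorted(downstream_reach(REACH, "support_agent")))
print(sorted(excess_reach(REACH, "support_agent", {"ticket_api", "crm_db"})))
```

The second call lists reach that could be removed before go-live. Indirect reach matters here: the agent never calls `crm_db` itself, but it can still affect it through `ticket_api`.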
Threat narrative
Attacker objective: Make the AI system behave incorrectly in a way that produces real-world operational impact.
- entry: An attacker influences an AI system through crafted inputs, adversarial examples, or prompt-driven misuse that reaches the model at inference time.
- escalation: The model produces a misclassification, unsafe action, or erroneous recommendation that alters downstream behavior or decision-making.
- impact: The compromised AI output propagates into enterprise workflows, creating operational, security, or trust failures at scale.
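The impact step depends on model output flowing into workflows unchecked. A minimal sketch of one mitigation, assuming a model that returns a structured decision, is to validate that output against explicit bounds before anything downstream consumes it. The field names, the allowed decisions, and the 500.0 threshold are hypothetical values picked for the example.

```python
from typing import Any, Dict, List

ALLOWED_DECISIONS = {"approve", "reject", "escalate"}  # hypothetical
MAX_AUTO_AMOUNT = 500.0                                # hypothetical limit


def validate_output(output: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the output is in bounds."""
    problems: List[str] = []
    decision = output.get("decision")
    if decision not in ALLOWED_DECISIONS:
        problems.append(f"unexpected decision: {decision!r}")
    amount = output.get("amount")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        problems.append("amount is missing or not numeric")
    elif amount < 0 or amount > MAX_AUTO_AMOUNT:
        problems.append(f"amount {amount} is outside 0..{MAX_AUTO_AMOUNT}")
    return problems


def route(output: Dict[str, Any]) -> str:
    """Send out-of-bounds output to a person instead of the workflow."""
    return "human_review" if validate_output(output) else "downstream_workflow"
```

This does not stop the entry or escalation steps, but it narrows what a manipulated output can do once it leaves the model.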
Breaches seen in the wild
- McKinsey AI platform breach — McKinsey AI platform hack exposed 46M chats and sensitive data.
- DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.
NHI Mgmt Group analysis
AI security is no longer a model-quality problem alone. The article’s core lesson is that adversarial pressure moves the control point from training-time confidence to runtime governance. That matters because enterprises do not deploy AI in static conditions, and the threat surface grows once models are embedded in live workflows. Practitioners should treat AI security as an operating condition, not a pre-launch test result.
Build-time assurance and runtime defense are different governance layers. A system can pass red teaming and still fail when exposed to live inputs, changing context, or chained operational use. That split is especially important for enterprise programs that assume a single validation event is enough. Security teams should interpret AI assurance as continuous control coverage across the full lifecycle.
Agentic AI increases the consequences of security failure because it acts, not just predicts. When an AI system can select tools or trigger downstream actions, a bad decision can cascade beyond the model boundary. The governance question becomes how much execution authority the system is allowed to hold, and under what controls. Practitioners should review AI systems as potential action executors, not just content generators.
Runtime AI security creates a new identity and access problem around the model itself. The model may not be a person or a workload in the traditional sense, but once it can trigger actions, it needs explicit guardrails around tool scope, session context, and downstream authority. That makes AI governance converge with identity governance. Practitioners should align AI security oversight with access control, not leave it isolated in the data science stack.
Enterprise AI programmes will keep collapsing the old boundary between development and production. The interview shows a market in which testing, monitoring, and operational control are becoming one problem. That shifts security ownership toward teams that can govern behavior after deployment, not just evaluate artifacts before release. Practitioners should expect AI assurance to be measured by live containment, not static certification.
From our research:
- 85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
- In the same study, only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, which shows how weak the governance baseline remains.
- For teams comparing AI runtime control with broader identity control, the next step is Ultimate Guide to NHIs, Why NHI Security Matters Now.
What this signals
Runtime control is becoming the real boundary for AI security. As AI systems move from offline evaluation into production execution, the question shifts from whether a model can be tested to whether its live behavior can be contained. That is a governance change, not just a tooling change, and it pushes security teams to align AI oversight with access control, logging, and incident response.
Identity teams should expect AI systems to inherit the same governance pressure as other non-human actors. Once an AI system can call tools or trigger downstream workflows, it starts to look less like a model and more like an identity with authority. The practical response is to decide which parts of identity governance apply to the system before the first production workflow is exposed.
AI programmes will increasingly fail at the handoff between data science and security operations. The organisations that do best will be the ones that treat model assurance, runtime monitoring, and access governance as one continuous programme. That is where the control model is heading, and it is where practitioner ownership needs to settle.
For practitioners
- Separate pre-deployment testing from runtime control: Use red teaming to find weaknesses before launch, but keep runtime monitoring in place for live input abuse, misuse, and unexpected model behavior.
- Map tool reach for any AI system that can act: Document every downstream system, API, or workflow an AI system can influence, then restrict that reach to the minimum necessary for the task.
- Treat agentic AI as an access governance issue: Apply approval, logging, and scope boundaries to AI systems that can chain actions so execution authority is explicit and reviewable.
- Review live AI controls as part of security governance: Bring AI runtime oversight into the same governance cadence you use for access control, risk review, and incident response.
Key takeaways
- The article shows that AI security has moved beyond model testing into runtime defense and enterprise governance.
- The central risk is not only model error, but the ability of adversarial inputs to create real operational failure after deployment.
- Practitioners should govern AI systems as live actors with scoped authority, continuous monitoring, and clear downstream limits.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | AG-02 | Adversarial inputs and tool-use risk are central to the article’s runtime AI security theme. |
| NIST AI RMF | | The article is about governing AI risk across the lifecycle, not just model testing. |
| NIST CSF 2.0 | PR.AA-05 (PR.AC-4 in CSF 1.1) | Runtime AI control depends on access scope, logging, and reviewability. |
Apply AI RMF governance to align model assurance, monitoring, and accountability.
Key terms
- Adversarial Machine Learning: Adversarial machine learning is the practice of manipulating an AI system so it produces an incorrect or unsafe result. In operational settings, the concern is not just model accuracy, but whether crafted inputs can change behavior after deployment and create downstream business or security impact.
- Runtime Defense: Runtime defense is the set of controls that observe and constrain an AI system while it is live. It sits after build-time testing and is meant to catch harmful inputs, unsafe outputs, and unexpected behavior once the system is interacting with real users or workflows.
- Agentic AI: Agentic AI is AI that can select actions, tools, and timing with a degree of runtime independence. In governance terms, it matters because the system is no longer only predicting or classifying. It is making execution decisions that can affect other systems and identities.
- Model Red Teaming: Model red teaming is the process of trying to break an AI system before deployment by probing for unsafe, biased, or exploitable behavior. It is a pre-release assurance activity, but it does not replace live monitoring because real production inputs and workflows are harder to simulate.
What's in the full article
TROJ.AI's full article covers the interview detail this post intentionally leaves for the source:
- James Stewart’s origin story from computer vision to AI security, including the specific turning points that changed his view of adversarial ML.
- The company’s view of how NIST, OWASP, MITRE ATLAS, and CSA shaped the AI security landscape.
- The split between TrojAI Detect and TrojAI Defend, including how the vendor positions build-time red teaming versus runtime defense.
- Examples of enterprise deployment scale, including AI environments with broad application coverage and audit scrutiny.
Deepen your knowledge
NHI governance, agentic AI identity, and machine identity security are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are responsible for identity security strategy or programme maturity, it is worth exploring.
Published by the NHIMG editorial team on 2025-10-15.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org