What do organisations get wrong about trusted AI platforms?

Why This Matters for Security Teams

Trusted AI platforms are often marketed as if trust can be inherited from the vendor, the model, or a dashboard score. That framing misses the operational reality: if an AI platform can execute actions, call tools, and move data, it becomes part of the organisation’s control plane and must be governed like one. Current guidance increasingly points to runtime proof, not branding, as the basis for trust, which aligns with the NIST Cybersecurity Framework 2.0 emphasis on governance and control outcomes.

The biggest mistake is treating “trusted” as a procurement label instead of an evidence problem. Security teams need to know who or what acted, which secrets or identities were used, what context was present, and whether policy actually constrained the action at execution time. That is especially important when platforms process sensitive prompts, route data to external tools, or let agents act on behalf of users. Organisations that skip this distinction tend to discover the exposure not during the trust review, but only after an AI workflow has touched production data or crossed a boundary that was never explicitly approved.

How It Works in Practice

A trustworthy AI platform is one that can enforce and prove control at runtime. That means the platform should bind actions to a workload identity, issue short-lived credentials only when a task is authorised, and log the exact policy decision that allowed the request. This is not the same as simply authenticating a user once and then letting the platform operate freely. For agentic systems, the right question is not “was the platform approved?” but “was this specific action authorised, by which identity, under which conditions?”
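To make the runtime pattern concrete, the following is a minimal sketch of a credential broker that ties each action to a workload identity, issues a short-lived token only when the action is authorised, and records every decision. The names `Broker`, `Credential`, `allowed`, and `ttl_seconds` are hypothetical and used only for illustration; a real platform would delegate identity and policy to dedicated systems rather than an in-memory set.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """A short-lived token scoped to one workload and one action."""
    token: str
    workload_id: str
    action: str
    expires_at: float

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


@dataclass
class Broker:
    # Hypothetical allow-list of (workload identity, action) pairs.
    allowed: set
    ttl_seconds: float = 60.0
    decision_log: list = field(default_factory=list)

    def request(self, workload_id: str, action: str,
                now: Optional[float] = None) -> Optional[Credential]:
        now = time.time() if now is None else now
        permitted = (workload_id, action) in self.allowed
        # Log the decision whether it is an allow or a deny.
        self.decision_log.append({
            "time": now,
            "workload_id": workload_id,
            "action": action,
            "decision": "allow" if permitted else "deny",
        })
        if not permitted:
            return None
        return Credential(
            token=secrets.token_urlsafe(16),
            workload_id=workload_id,
            action=action,
            expires_at=now + self.ttl_seconds,
        )


broker = Broker(allowed={("agent-a", "read:tickets")})
cred = broker.request("agent-a", "read:tickets")
denied = broker.request("agent-a", "delete:tickets")
```

In this sketch the denied request still leaves a log entry, so the question "was this specific action authorised, by which identity?" can be answered from the record rather than inferred.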

Practitioners should expect strong separation between identity, policy, and execution. In mature designs, the platform uses:

Workload identity for the agent or service, rather than shared API keys.

Context-aware authorisation that evaluates intent, data sensitivity, and destination at request time.

JIT secrets and ephemeral tokens that expire after the task finishes.

Policy-as-code so approvals are repeatable, testable, and auditable.
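As an illustration of how context-aware authorisation and policy-as-code fit together, the sketch below expresses rules as plain data and evaluates each request against intent, data sensitivity, and destination, denying by default. The rule names, sensitivity labels, and destinations are assumptions made up for the example. Engines such as OPA or Cedar use their own policy languages; this is not their API.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ordering, lowest to highest.
SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2}


@dataclass(frozen=True)
class Request:
    workload_id: str
    intent: str
    data_sensitivity: str
    destination: str


@dataclass(frozen=True)
class Rule:
    rule_id: str
    intent: str
    max_sensitivity: str
    destinations: frozenset

    def matches(self, req: Request) -> bool:
        return (
            req.intent == self.intent
            and SENSITIVITY[req.data_sensitivity] <= SENSITIVITY[self.max_sensitivity]
            and req.destination in self.destinations
        )


# The policy is data, so it can be versioned, reviewed, and unit-tested.
POLICY = (
    Rule("summarise-internal", "summarise", "internal",
         frozenset({"internal-wiki"})),
    Rule("export-public", "export", "public",
         frozenset({"internal-wiki", "partner-api"})),
)


def evaluate(req: Request, policy=POLICY):
    """Return (decision, rule_id); default-deny if nothing matches."""
    if req.data_sensitivity not in SENSITIVITY:
        return ("deny", None)  # unknown labels are never allowed
    for rule in policy:
        if rule.matches(req):
            return ("allow", rule.rule_id)
    return ("deny", None)
```

Returning the matching rule identifier alongside the decision is a deliberate choice here: it gives the audit log something specific to cite when asked which policy allowed a request.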

This is where NHI governance and AI platform governance converge. The NHI view of exposed credentials remains highly relevant, and NHIMG’s “LLMjacking: How Attackers Hijack AI Using Compromised NHIs” highlights how quickly exposed credentials can be abused. The same principle appears in platform design: if a “trusted” AI system can keep long-lived secrets, it can become a high-speed abuse path rather than a controlled service. For identity primitives, teams should evaluate patterns such as SPIFFE or OIDC-backed workload tokens, alongside runtime policy engines like OPA or Cedar. These controls tend to break down when the platform is allowed to chain tools across environments without per-step authorisation, because the trust boundary between steps disappears.
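The per-step authorisation point can be sketched as a tool chain that re-checks policy before every step instead of once at the start. The function `run_chain`, the `authorise` callback, and the tool and environment names are hypothetical placeholders; in practice the callback would call out to a policy engine.

```python
from typing import Callable, List, Tuple

# A step is (tool name, target environment, callable that performs the work).
Step = Tuple[str, str, Callable[[], object]]


def run_chain(
    workload_id: str,
    steps: List[Step],
    authorise: Callable[[str, str, str], bool],
) -> list:
    """Run steps in order, authorising each one individually.

    Raises PermissionError at the first denied step, so later steps
    never run on the strength of an earlier approval.
    """
    results = []
    for tool, environment, action in steps:
        if not authorise(workload_id, tool, environment):
            raise PermissionError(
                f"{workload_id} is not authorised to use {tool} in {environment}"
            )
        results.append(action())
    return results


# Hypothetical allow-list: the agent may search in staging only.
ALLOWED = {("agent-a", "search", "staging")}


def authorise(workload_id: str, tool: str, environment: str) -> bool:
    return (workload_id, tool, environment) in ALLOWED
```

With this shape, a chain that starts in staging and then reaches into production fails at the production step, which keeps the trust boundary visible between steps.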

Common Variations and Edge Cases

Tighter runtime control often increases integration overhead, requiring organisations to balance execution safety against developer friction and operational latency. That tradeoff becomes more visible in multi-agent systems, where one agent may delegate to another, or in platforms that mediate access to many internal tools. Best practice is still evolving and there is no universal standard yet, so security teams should avoid assuming that any single “trust” feature solves the problem.

One common edge case is internal deployment. Teams assume a platform is trustworthy because it runs inside a private cloud or on-premises environment. That assumption is weak if the platform can still access broad datasets, long-lived service accounts, or external plugins. Another edge case is vendor-managed telemetry: dashboards may show safety scores or policy summaries, yet not expose the underlying decision trail needed for audit or incident response. NHIMG’s coverage of the McKinsey AI platform breach and the DeepSeek breach shows why trust claims collapse when data handling, credential scope, and execution controls are not independently verifiable. The practical standard is simple: if the platform cannot prove who acted, what it touched, and which policy applied, then it is not yet trustworthy in an operational sense.
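One way to apply that standard to a vendor's exported decision trail is a completeness check that flags any record missing the actor, the resource touched, or the policy applied. The field names `actor`, `resource`, `policy_id`, and `decision` are assumptions for this sketch; real export formats vary by platform.

```python
# Hypothetical field names; adjust to the platform's actual export schema.
REQUIRED_FIELDS = ("actor", "resource", "policy_id", "decision")


def missing_evidence(record: dict) -> list:
    """Return the required fields that are absent or empty in one record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def audit(trail: list) -> dict:
    """Map record index -> missing fields, for incomplete records only."""
    gaps_by_index = {}
    for index, record in enumerate(trail):
        gaps = missing_evidence(record)
        if gaps:
            gaps_by_index[index] = gaps
    return gaps_by_index


trail = [
    {"actor": "agent-a", "resource": "crm/contacts",
     "policy_id": "p-1", "decision": "allow"},
    # This record shows an outcome but not the policy behind it.
    {"actor": "agent-b", "resource": "crm/contacts", "decision": "allow"},
]
gaps = audit(trail)
```

An empty result means every record carries the minimum evidence; any non-empty result points at specific actions the platform cannot fully account for.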

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10	A03	Covers insecure agent actions and tool use, central to trusted AI platform failures.
CSA MAESTRO	MAESTRO-3	Addresses runtime governance and control of autonomous AI platform behaviour.
NIST AI RMF		Govern and map AI risks to measurable controls instead of trust labels.

Require per-action authorisation and log every tool call with the identity of the acting agent.
