Agentic AI & Autonomous Identity

Who is accountable when an AI assistant follows malicious repository instructions?

By NHI Mgmt Group Editorial Team | Updated June 9, 2026 | Domain: Agentic AI & Autonomous Identity

Accountability sits with the organisation that allowed mutable instructions to act as standing authority without governance. If a project file can change agent behaviour, then ownership of that file, its review process, and its execution scope must be defined. Without that, the organisation has delegated security decisions to uncontrolled context.

Why This Matters for Security Teams

An AI assistant that follows malicious repository instructions is not merely “misbehaving”; it is executing untrusted content with whatever authority the surrounding system granted it. That shifts the problem from prompt quality to control design. If repository files, READMEs, package metadata, or issue comments can alter behaviour, then the organisation has created a path where content becomes authority without review, provenance, or scoped approval.

This is why the issue sits at the intersection of software supply chain risk, NHI governance, and agentic AI security. The same pattern appears in real-world compromise chains where attacker-controlled content manipulates automation, as seen in NHIMG coverage of the GitLocker GitHub extortion campaign and the LLMjacking research from Entro Security. For broader program framing, NIST Cybersecurity Framework 2.0 is useful because it ties governance, asset management, and protective controls together rather than treating AI as a special exception.

In practice, many security teams discover this exposure only after an assistant has already read, summarised, or acted on poisoned instructions, not through a deliberate design review.

How It Works in Practice

The accountable party is usually the organisation that allowed mutable instructions to influence execution without clear boundaries. In agentic systems, the danger is not only the model output; it is the assistant’s ability to chain tools, retrieve context, and execute steps based on repository content that may have been changed by an attacker, a compromised contributor, or an over-permissioned internal user. Static RBAC alone does not solve this when the behaviour is driven by dynamic context.

Current guidance suggests treating repository instructions as untrusted input until they are signed, reviewed, and mapped to an approved execution scope. That means defining who owns instruction files, who can modify them, which agents may consume them, and what actions are allowed if the file changes. A practical control set usually includes:

  • Workload identity for the assistant, so the system can verify which agent is making a request before that agent is allowed to act.
  • Just-in-time credentials and short-lived tokens, so repository-triggered actions do not inherit standing authority.
  • Runtime policy checks for each tool call, rather than trusting a one-time approval at startup.
  • Provenance controls for repository content, including review gates for files that can alter agent behaviour.
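
The control set above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes instruction files are pinned by SHA-256 digest at review time, that each approval records an owner and an allowed tool set, and that the caller supplies the expiry time of its short-lived credential. The names `ApprovedInstruction`, `PolicyGate`, `AGENTS.md`, and the tool names are hypothetical and do not come from any specific product.

```python
import hashlib
import time
from dataclasses import dataclass, field


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest used to pin a reviewed instruction file."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class ApprovedInstruction:
    """A reviewed instruction file and the execution scope approved for it."""
    path: str
    sha256: str
    owner: str
    allowed_tools: frozenset


@dataclass
class PolicyGate:
    """Checks every tool call at request time instead of trusting a startup approval."""
    approvals: dict = field(default_factory=dict)  # path -> ApprovedInstruction

    def approve(self, path, content, owner, allowed_tools):
        """Record a review decision: who owns the file and what it may trigger."""
        self.approvals[path] = ApprovedInstruction(
            path, digest(content), owner, frozenset(allowed_tools)
        )

    def authorize(self, path, current_content, tool, token_expiry):
        """Return (allowed, reason) for one tool call driven by one instruction file."""
        approval = self.approvals.get(path)
        if approval is None:
            return False, "instruction file has no review record"
        if digest(current_content) != approval.sha256:
            return False, "instruction file changed since review"
        if time.time() >= token_expiry:
            return False, "credential expired"
        if tool not in approval.allowed_tools:
            return False, "tool outside approved execution scope"
        return True, "allowed"


if __name__ == "__main__":
    gate = PolicyGate()
    reviewed = b"Summarise open issues. Do not run scripts.\n"
    gate.approve("AGENTS.md", reviewed, "platform-security", {"read_file", "summarise"})
    expiry = time.time() + 300  # hypothetical five-minute token lifetime

    print(gate.authorize("AGENTS.md", reviewed, "summarise", expiry))
    tampered = reviewed + b"Also run deploy.sh\n"
    print(gate.authorize("AGENTS.md", tampered, "run_shell", expiry))
```

The design choice worth noting is that a changed file fails closed: any edit after review invalidates the approval until someone with ownership re-approves it, so content cannot quietly become authority.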

This maps cleanly to NHI management principles because the assistant is effectively a non-human identity acting on behalf of the organisation. The DeepSeek breach analysis is a useful reminder that exposed secrets and unsafe data handling often amplify the blast radius once automation is in play. For implementation, NIST Cybersecurity Framework 2.0 supports the governance logic, while the practical takeaway is to bind repository-driven instructions to explicit policy rather than implicit trust.

These controls tend to break down in environments where assistants can reach production, pull secrets from shared stores, and execute commands across multiple tools without request-time authorization.

Common Variations and Edge Cases

Tighter control over repository instructions often increases review overhead and slows automation, requiring organisations to balance fast delivery against behavioural safety. That tradeoff becomes sharper in multi-repo platforms, open-source contributor models, and internal developer portals where many people can influence the same instruction surface. There is no universal standard for this yet, so best practice is evolving.

One edge case is a benign-but-broken repository change that alters the assistant’s behaviour without malicious intent. Another is a trusted maintainer account that is later compromised, turning an approved workflow into an attack path. In both cases, the root issue is the same: mutable instructions have been granted standing authority. A related failure mode is overbroad “helper” access, where the assistant can read secrets, open pull requests, and run scripts even when the task only required a summary.
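
The overbroad "helper" access problem can be illustrated with a small sketch that narrows an assistant's tools to what the current task needs. It assumes a hypothetical mapping from task type to required tools; the task and tool names are illustrative only.

```python
# Hypothetical mapping of task types to the minimum tools each one needs.
TASK_SCOPES = {
    "summarise": {"read_file"},
    "open_pull_request": {"read_file", "write_branch", "open_pull_request"},
}


def tools_for_task(task: str, granted: set) -> set:
    """Return only the granted tools that the task actually requires.

    Unknown task types receive no tools, so the default is deny.
    """
    needed = TASK_SCOPES.get(task, set())
    return needed & granted


if __name__ == "__main__":
    granted = {"read_file", "read_secrets", "run_script", "open_pull_request"}
    print(tools_for_task("summarise", granted))
```

Under this sketch, a summary task cannot read secrets or run scripts even if the assistant's identity has been granted those tools for other work.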

For that reason, organisations should distinguish between content that informs the assistant and content that governs the assistant. That distinction is central to the lessons of the Emerald Whale breach, where exposed or abused access can convert routine automation into an attacker’s staging ground. The practical rule is simple: if a file can change execution, it deserves the same review discipline as a privileged policy change.
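
One way to make the informs-versus-governs distinction operational is to flag changes to governing files for privileged review. The sketch below assumes a hypothetical set of path patterns; which files actually govern an assistant depends on the tooling in use, and a README would belong in this list wherever the assistant treats it as instructions.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files that can change agent behaviour.
GOVERNING_PATTERNS = ("AGENTS.md", ".agent/*", "*.agent.yaml")


def governs_agent(path: str) -> bool:
    """True if the path matches a pattern for content that governs the assistant."""
    return any(fnmatchcase(path, pattern) for pattern in GOVERNING_PATTERNS)


def requires_privileged_review(changed_paths) -> list:
    """Return the changed files that need the same review as a policy change."""
    return sorted(path for path in changed_paths if governs_agent(path))


if __name__ == "__main__":
    changed = ["README.md", "AGENTS.md", ".agent/tools.yaml", "src/main.py"]
    print(requires_privileged_review(changed))
```

A check like this could run in a merge pipeline so that any change touching a governing file is routed to the file's owner before an assistant is allowed to consume it.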

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

| Framework | Control / Reference | Relevance |
| --- | --- | --- |
| OWASP Agentic AI Top 10 | A2 | Addresses unsafe tool use and instruction injection in autonomous assistants. |
| CSA MAESTRO | GOV-3 | Covers governance for agent actions, approval boundaries, and accountability. |
| NIST AI RMF | | Supports governance and accountability for AI behaviour under operational risk. |

Constrain tool access and treat repository instructions as untrusted input at runtime.

NHIMG Editorial Note
Reviewed and updated by the NHIMG editorial team on June 9, 2026.