RTK project filters exposed a new AI review trust gap

By NHI Mgmt Group Editorial TeamPublished 2026-05-20Domain: Breaches & IncidentsSource: Pillar Security

TL;DR: RTK’s project-local filters let repository content control what Claude Code could see, creating a medium-severity path to hide backdoors and suppress scanner output before review, according to Pillar Security. The deeper issue is trust laundering: if the observation layer is attacker-shaped, AI-assisted code review cannot be treated as a reliable control.

At a glance

What this is: A medium-severity RTK flaw let repository-supplied filters hide code and security findings from AI coding assistants before review.

Why it matters: It matters because teams using AI-assisted development need to govern the tooling path into the model, not just the model’s outputs, across NHI, autonomous, and human review workflows.

By the numbers:

RTK’s maintainers responded within 24 hours and shipped a fix.

👉 Read Pillar Security's analysis of untrusted project-local filters in RTK

Context

RTK is a preprocessing layer for AI coding assistants, so it sits between repository content and what the model can actually observe. The security problem is not execution control, but perceptual control, because attacker-supplied project files could change what a reviewer saw without any approval step.

That matters for AI-assisted development because review quality depends on the integrity of the full path from source code to model context. When a repository can influence the filters that shape AI visibility, the trust boundary has moved inside the development workflow itself, which is why this is an NHI and agentic tool governance problem as much as an application security problem.

Key questions

Q: What breaks when project-local AI filters load automatically from a repository?

A: Automatic loading breaks the assumption that the reviewer sees the same evidence the repository contains. An attacker can hide malicious lines from diffs, security scanner output, or source files before the model sees them. The result is false confidence in a clean review and a path for compromised code to advance toward production.

Q: Why do repository-supplied filters create trust problems for AI coding assistants?

A: They let untrusted content inherit trusted authority simply because it is stored in the expected project path. That matters when the tool changes what the assistant can observe, because the model cannot flag evidence it never receives. The control failure is origin blindness, not regex syntax.

Q: How do security teams know whether AI review outputs are actually trustworthy?

A: Teams need to validate the integrity of the entire observation chain, from repository files to the model’s context window. If any preprocessing layer can remove evidence without review or provenance checks, a clean answer may only mean the input was filtered. Compare AI output against raw artefacts where possible.

Q: Who is accountable when untrusted project configuration changes what an AI assistant sees?

A: Accountability sits with the team that allowed repository-local configuration to auto-load without explicit trust controls. Governance frameworks such as OWASP NHI and zero trust both point to the same principle: provenance, review, and revocation must be part of the control design, not an afterthought.

Technical breakdown

Project-local filters and trust laundering

RTK loaded filters from the repository before output reached the assistant, and it did so without distinguishing between user-owned configuration and project-owned configuration. That created trust laundering: untrusted content inherited trusted authority simply because it arrived in a file with the expected name and shape. The danger is not the syntax of TOML or regex itself, but the fact that the filter engine operated below the model’s observation layer. Once that happened, the assistant could only reason about what survived filtering, not what had been removed.

Practical implication: treat repository-supplied preprocessing files as security-relevant inputs and verify their provenance before they influence model visibility.

Observation-layer manipulation in AI-assisted review

The core failure is that the tool changed the evidence before the reviewer, human or AI, ever saw it. A backdoor could be hidden in a code diff, a scanner warning could be stripped from output, and the model would still produce a clean assessment because its input stream was already altered. This is a distinct class of risk from prompt injection or execution abuse. The attacker is not steering the model’s action, but controlling the truth set the model uses to decide.

Practical implication: security review pipelines need integrity checks on preprocessing and transformation layers, not just on prompts, policies, or execution sandboxes.

Hash-based trust and explicit opt-in for untrusted config

The patch shifted RTK from implicit trust to explicit trust by requiring repository filters to be reviewed, hashed, and enabled on purpose. Hash verification also closes the time-of-check to time-of-use gap, because trust is revoked if the file changes after approval. That design matters because it ties trust to content identity and review state rather than to repository location. For AI-assisted workflows, that is the difference between assumed visibility and verified visibility.

Practical implication: require explicit trust enrollment and integrity binding for any project-local configuration that can shape AI output.

Threat narrative

Attacker objective: The attacker aimed to hide malicious code and suppress security findings so compromised software could pass AI-assisted review and ship unchanged.

Entry occurred when an attacker committed a malicious .rtk/filters.toml file into a repository that developers would later clone and use with Claude Code.
Credential access was not the issue here; the attacker gained control of the AI’s observation path by planting regex filters that removed selected lines from command output and diffs.
Impact followed when the assistant reviewed hidden backdoors or scanner warnings as if they were absent, allowing compromised code to move toward production with attacker-planted flaws intact.

ASP.NET machine keys RCE attack — 3,000+ exposed ASP.NET machine keys enabled remote code execution.
DeepSeek breach — DeepSeek breach exposed 1M+ log lines and sensitive secret keys.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Project-local AI filters are an observation control, not a convenience feature. RTK showed that anything sitting between repository output and model context becomes part of the security boundary. When that layer can be controlled by repository content, the assistant is no longer reviewing the codebase directly. Practitioners should treat preprocessing, filtering, and transformation rules as governance objects, not developer preferences.

Trust laundering is the named failure mode this incident exposes. Content from a git remote was granted the same authority as configuration a user installed and reviewed locally. That assumption was designed for stable, user-owned settings, not attacker-shaped repository state. The implication is that AI review pipelines need to stop assuming origin-neutral trust for project-local files.

AI-assisted code review inherits the trustworthiness of every layer that shapes model visibility. The assistant can only flag what survives the pipeline into its context window, which means filtered evidence produces filtered judgement. This is not a model reasoning failure. It is a governance failure in the observation chain, and teams should rethink how they certify evidence before certifying code.

Perceptual authority is now an NHI control surface. The same governance logic that protects secrets, tokens, and workload identities now applies to the files that decide what an AI agent can see. OWASP NHI and zero trust thinking both point in the same direction: verify provenance, bound trust, and assume repository-local inputs may be hostile until proven otherwise.

Runtime trust decisions need to be explicit, revocable, and content-bound. RTK’s hash-based opt-in model shows the right direction for any AI tool that consumes project-local configuration. Practitioners should stop assuming that a file is safe because it looks routine, and start asking whether its authority has been intentionally granted and continuously preserved.

From our research:
98% of companies plan to deploy even more AI agents within the next 12 months, despite documented rogue behaviour in 80% of current deployments, according to AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
That visibility gap is why the OWASP Agentic AI Top 10 matters: governance has to account for what the agent can actually see and do.

What this signals

Trust laundering is the pattern practitioners should now watch for in AI-assisted development. Any file or hook that changes model visibility can become a control point, which means code review, scanner output, and assistant prompts all need provenance checks before they are treated as evidence.

With 33% of organisations reporting AI agents have accessed inappropriate or sensitive data beyond their intended scope, the governance problem is already operational rather than theoretical, according to AI Agents: The New Attack Surface report. Teams that rely on clean model output without validating the path into that output are building on a false assurance layer.

The next programme shift is to treat AI review tooling as part of the identity stack. That means binding trust to source, reviewing the configuration that shapes model perception, and using the Top 10 NHI Issues to pressure-test where untrusted inputs can masquerade as authoritative state.

For practitioners

Inventory AI preprocessing layers Map every filter, hook, plugin, and configuration file that can change what an AI assistant sees before it reviews code or scanner output. Classify those components as security-relevant inputs, not developer convenience settings.
Require explicit trust for repository-supplied config Block automatic loading of project-local filter files unless a user has reviewed the exact contents and opted in for that repository. Tie trust to the file hash so changes after approval revoke access automatically.
Verify the review path, not just the model answer When a clean AI review conflicts with code smell, scanner output, or diff context, inspect the transformation layer that fed the model. A clean output is not trustworthy if the input stream was pre-shaped by untrusted content.
Apply OWASP NHI controls to AI tooling state Treat model-adjacent configuration as part of the non-human identity control surface, alongside secrets, tokens, and service accounts. Use the OWASP NHI Top 10 to frame provenance, trust, and lifecycle questions around these files.

Key takeaways

RTK exposed a new class of AI-assisted development risk where repository content can alter what the assistant is able to see.
The measurable impact was practical and immediate: attacker-planted filters could hide backdoors and suppress scanner output in a workflow used by tools with 50,000+ GitHub stars.
The control that matters most is explicit, revocable trust for project-local configuration that shapes model visibility.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST Zero Trust (SP 800-207) and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	Repository filters changed AI-visible output through untrusted configuration.
NIST Zero Trust (SP 800-207)	PR.AC-1	The incident shows why origin and authority must be verified before access to evidence is granted.
NIST CSF 2.0	PR.DS-6	Protected integrity of evidence is central when tooling can modify code and scanner outputs before review.

Apply zero trust to preprocessing layers and verify provenance before any tool can alter what the assistant sees.

Key terms

Trust Laundering: Trust laundering is when untrusted content gains trusted authority simply by passing through a tool that assumes the source is safe. In AI-assisted development, that can happen when repository files or hooks silently shape what the model sees, turning evidence selection into a security control.
Observation Layer: The observation layer is the part of an AI workflow that determines what information reaches the model for reasoning and review. It includes preprocessors, filters, hooks, and connectors. If this layer is compromised, the model may appear accurate while operating on a deliberately incomplete view.
Project-Local Configuration: Project-local configuration is settings stored in a repository and loaded automatically by tooling during work on that codebase. It is useful for shared behaviour, but it becomes risky when loaded without provenance checks, because attacker-controlled settings can inherit authority they were never meant to have.
Perceptual Authority: Perceptual authority is the power to decide what an AI or reviewer is allowed to see before making a judgement. Unlike execution authority, it does not run code, but it can still shape decisions by suppressing warnings, diffs, or malicious lines from the evidence stream.

Deepen your knowledge

AI-assisted code review and non-human identity governance are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If your team is dealing with repository-supplied AI tooling state, this is a practical place to start.

This post draws on content published by Pillar Security: Untrusted Project-Local Filters in RTK, a review of how repository filters can hide evidence from AI coding assistants. Read the original.

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-20.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org