TL;DR: ShadowLogic backdoors can be embedded in trusted model formats such as ONNX and TensorRT, survive conversion, and remain effective even after downstream fine-tuning, according to HiddenLayer, which makes model supply chain trust harder to justify. Persistent logic inside the graph turns model provenance into an access-control problem, not just a model-quality issue.
NHIMG editorial — based on content published by HiddenLayer: Persistent Backdoors
Questions worth separating out
Q: How should security teams govern machine learning models that may contain hidden backdoors?
A: Security teams should govern machine learning models as controlled artifacts, not as passive files.
Q: Why is model conversion risky when the source artifact may be tampered with?
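A: Conversion is risky because it preserves structure, not trust. When a tampered graph is converted between formats such as ONNX and TensorRT, the embedded logic travels with it, so the converted artifact is no more trustworthy than its source.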
Q: What do security teams get wrong about fine-tuning compromised models?
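A: They often assume fine-tuning will wash out prior compromise, but that only holds when the compromise lives in the learned weights. According to HiddenLayer, ShadowLogic lives in the graph logic, which fine-tuning leaves intact.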
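A toy illustration of that last point, and not HiddenLayer's actual technique: the sketch below assumes a one-parameter "model" whose inference path contains a hard-coded branch. The names `TRIGGER`, `predict`, and `fine_tune` are hypothetical. Gradient updates only touch the weight `w`, so the branch is unchanged after training on clean data.

```python
TRIGGER = 9.0  # hypothetical trigger input, chosen only for this demo


def predict(w, x):
    """Inference path: a graph-level branch wraps the learned computation."""
    if x == TRIGGER:
        return -1.0  # attacker-chosen output, independent of w
    return w * x


def fine_tune(w, data, lr=0.01, epochs=200):
    """Plain SGD on squared error for y = w * x; only w is updated."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w


if __name__ == "__main__":
    clean_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # target is y = 3x
    w = fine_tune(1.0, clean_data)
    print(round(w, 6))          # weight has moved to fit the clean data
    print(predict(w, 2.0))      # clean input follows the learned weight
    print(predict(w, TRIGGER))  # trigger input still takes the hidden branch
```

The branch has no trainable parameters, so no amount of fine-tuning on this setup can change it. Remediation has to happen at the graph level.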
Practitioner guidance
- Inspect exported model graphs before approval: Review ONNX, TensorRT, and other deployed artifacts for unexpected conditional branches, output substitution nodes, and trigger-detection logic before they are promoted to production.
- Require signed provenance across the model pipeline: Enforce artifact signing and custody tracking from training output through conversion and deployment so a tampered model cannot enter production unnoticed.
- Treat fine-tuning as a performance step, not a cleanse step: Assume downstream tuning may improve accuracy while leaving hidden graph logic intact, so separate remediation from retraining in your workflow.
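One way to start on the graph-inspection step is to compare the operator mix of a candidate ONNX graph against a trusted baseline export and flag any extra branching or selection operators. The sketch below is a minimal example under stated assumptions: the `WATCHED_OPS` list is illustrative and is not a HiddenLayer detection signature, the helper names are hypothetical, and the approach covers ONNX graphs only, not serialized TensorRT engines. `collect_op_types` needs the third-party `onnx` package, while the diff logic runs on plain lists.

```python
from collections import Counter

# Operator types that can implement branching or output selection in ONNX.
# Illustrative watch list only (an assumption), not a complete signature set.
WATCHED_OPS = {"If", "Loop", "Where", "Equal", "Greater", "Less", "And", "Or"}


def diff_op_counts(baseline_ops, candidate_ops, watched=WATCHED_OPS):
    """Return watched op types that occur more often in the candidate graph."""
    base = Counter(baseline_ops)
    cand = Counter(candidate_ops)
    return {
        op: cand[op] - base[op]
        for op in sorted(watched)
        if cand[op] > base[op]
    }


def collect_op_types(graph):
    """List op types in an onnx GraphProto, including nested subgraphs."""
    import onnx  # third-party; imported lazily so the diff runs without it

    ops = []
    for node in graph.node:
        ops.append(node.op_type)
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                ops.extend(collect_op_types(attr.g))
            elif attr.type == onnx.AttributeProto.GRAPHS:
                for sub in attr.graphs:
                    ops.extend(collect_op_types(sub))
    return ops


if __name__ == "__main__":
    baseline = ["Conv", "Relu", "Conv", "Relu", "Gemm", "Softmax"]
    candidate = baseline + ["Equal", "Where"]
    print(diff_op_counts(baseline, candidate))  # {'Equal': 1, 'Where': 1}
```

With `onnx` installed, the op lists would come from something like `collect_op_types(onnx.load(path).graph)` for each artifact. A non-empty diff is a prompt for manual review, not proof of a backdoor, and an empty diff does not prove the graph is clean.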
What's in the full report
HiddenLayer's full research covers the operational detail this post intentionally leaves for the source:
- Side-by-side model graph examples showing how the backdoor logic is embedded in ONNX and TensorRT artifacts.
- The full efficacy table comparing base, ShadowLogic, and fine-tuned backdoor performance across conversion steps.
- Additional visual evidence of the trigger path and the model branches that preserve malicious output selection.
- The conversion and retraining sequence used to test whether the backdoor survives downstream lifecycle changes.
👉 Read HiddenLayer's research on persistent ShadowLogic backdoors in AI models →
ShadowLogic backdoors and model supply chain risk: what changes now?
Explore further
Model supply chain integrity has become an identity control problem. Once a model artifact can carry hidden execution logic, approval is no longer just about provenance labels or checksum validation. The real question is who can alter the graph, who can attest to what is inside it, and whether downstream systems can detect hidden decision paths. Practitioners should treat model artifacts as governed assets with access boundaries, not inert files.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- 43% of security professionals are concerned about AI systems learning and reproducing sensitive information patterns from codebases, according to The State of Secrets in AppSec.
A question worth separating out:
Q: How can organisations reduce the risk of malicious model supply chain attacks?
A: Organisations should combine provenance checks, artifact signing, graph inspection, and adversarial testing before models reach production. If a model is sourced externally or converted between formats, every transition should be treated as a new trust boundary. The aim is to prove what the model is, not just whether it performs acceptably on clean data.
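Treating every transition as a new trust boundary can be sketched as a signed digest record written at each pipeline stage and verified before the next stage consumes the artifact. This is a minimal sketch under assumptions: the stage names, record layout, and helper functions are hypothetical, and HMAC with a shared key stands in for the asymmetric signing a real pipeline would more likely use.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Content digest of the artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def sign_record(stage: str, digest: str, key: bytes) -> str:
    """HMAC over 'stage:digest'; a stand-in for a real signature scheme."""
    msg = f"{stage}:{digest}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def make_record(stage: str, artifact: bytes, key: bytes) -> dict:
    """Record written by the stage that produced the artifact."""
    digest = sha256_hex(artifact)
    return {"stage": stage, "digest": digest,
            "signature": sign_record(stage, digest, key)}


def verify_artifact(stage: str, artifact: bytes, record: dict, key: bytes) -> bool:
    """Check that the bytes match the record and the record is authentic."""
    digest = sha256_hex(artifact)
    if not hmac.compare_digest(digest, record["digest"]):
        return False
    expected = sign_record(stage, digest, key)
    return hmac.compare_digest(expected, record["signature"])


if __name__ == "__main__":
    key = b"demo-key"              # hypothetical; never hard-code real keys
    exported = b"fake-onnx-bytes"  # placeholder for the exported model file
    record = make_record("export", exported, key)
    print(verify_artifact("export", exported, record, key))         # True
    print(verify_artifact("export", exported + b"x", record, key))  # False
```

A check like this only shows that the artifact is the one a given stage produced. It says nothing about whether the graph was already compromised when it was signed, which is why it belongs alongside graph inspection and adversarial testing and does not replace them.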
👉 Read our full editorial: ShadowLogic backdoors expose persistent AI model supply chain risk