Training data poisoning is an attack that corrupts the data an AI model learns from so it produces attacker-influenced results later. The corruption may be inserted during training or at runtime, and the model can appear normal while embedding a hidden failure path in its outputs.
Expanded Definition
Training data poisoning is a machine learning integrity attack that alters the data a model learns from so the model internalises attacker-chosen patterns, labels, or correlations. In NHI and agentic AI environments, the risk is not limited to classic public datasets; it can also involve internal telemetry, prompt logs, fine-tuning corpora, or feedback loops that feed autonomous systems. Guidance varies across vendors on whether runtime data poisoning, feedback poisoning, and label contamination should be treated as separate classes, but the practical concern is the same: the model behaves normally during testing while preserving a hidden trigger or bias for later exploitation. NIST’s NIST Cybersecurity Framework 2.0 treats this as an integrity problem, so controls should focus on data provenance, validation, and change monitoring rather than model output alone.
The most common misapplication is assuming a clean model review proves training integrity, which occurs when teams validate only inference results and never inspect the upstream data pipeline.
Examples and Use Cases
Implementing poisoning defenses rigorously often introduces slower data onboarding and heavier review, requiring organisations to weigh model agility against stronger provenance controls.
- A malicious contributor inserts mislabeled examples into a fine-tuning set so an agent learns to treat a specific phrase as a safe command, even though it should be blocked.
- A vendor-supplied dataset contains subtle backdoor samples that cause the model to misclassify records once a trigger token appears in user input.
- Human feedback loops are manipulated so reinforcement data gradually rewards unsafe or biased outputs, creating model drift that is hard to trace after deployment.
- Telemetry used for continual learning is polluted with crafted events, causing the system to normalise anomalous behaviour and weaken downstream detection logic.
NHIMG’s analysis of DeepSeek breach shows how poisoned or exposed training material can coexist with broad secret leakage, turning a model supply chain issue into a wider security incident. That pattern is especially relevant when organisations use identity data, code snippets, or operational logs as training inputs. For broader context on the NHI attack surface, see the Ultimate Guide to NHIs — Key Research and Survey Results.
Why It Matters in NHI Security
Training data poisoning matters because agentic systems may turn a subtle training flaw into an operational decision flaw, especially when agents have tool access, delegated permissions, or access to NIST Cybersecurity Framework 2.0-aligned workflows. In NHI security, the impact is broader than model accuracy: poisoned learning data can cause an agent to mishandle secrets, weaken policy enforcement, or trust the wrong identity signals. NHIMG research shows that ultimate guide findings on NHIs consistently point to identity sprawl and weak governance as force multipliers for AI risk, while 43% of security professionals are already concerned that AI systems may learn and reproduce sensitive information patterns from codebases. The governance lesson is straightforward: once training inputs are shared across pipelines, the integrity of the data becomes part of identity security. Organisations typically encounter the consequence only after a model starts making unsafe decisions in production, at which point training data poisoning becomes operationally unavoidable to address.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
MITRE ATLAS and OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| MITRE ATLAS | Covers adversarial ML tactics including data poisoning and backdoors. | |
| NIST AI RMF | MAP | Addresses AI risk identification, including training data integrity threats. |
| OWASP Agentic AI Top 10 | LLM-06 | Agentic AI guidance flags training and prompt data corruption as a core abuse path. |
Document poisoning scenarios in AI risk assessments and assign owners for dataset provenance checks.