AI data poisoning is an attack in which an adversary corrupts the data a model learns from so the model produces biased, unstable, or malicious outputs. The attack targets training-time integrity, not just inference-time behaviour, which makes provenance and dataset governance central controls.
Expanded Definition
AI data poisoning is best understood as a training-time integrity attack: an adversary manipulates datasets, labels, prompts, feedback, or fine-tuning corpora so the model internalises harmful patterns before deployment. That makes it different from prompt injection, which targets runtime behaviour, and from simple data quality errors, which are accidental rather than adversarial. In practice, the poisoned content may be subtle enough to survive routine validation, especially when it blends into large-scale collection pipelines, synthetic data generation, or human feedback loops. Vendors differ on whether poisoning must be intentional or whether any malicious data contamination qualifies, but the security concern is the same: the model learns the attacker’s influence as if it were legitimate signal. The NIST Cybersecurity Framework 2.0 reinforces the need for integrity controls that cover the full data lifecycle. The most common mistake is to treat poisoned training data as a mere model-performance issue, which happens when teams investigate accuracy drift without tracing dataset provenance or ingestion paths.
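One way to make dataset provenance checkable is to record a cryptographic digest of every file when a dataset is approved, then compare against that record before each training run. The following is a minimal sketch using only the Python standard library. The function names (`build_manifest`, `verify_manifest`) and the idea of a per-directory manifest are illustrative assumptions, not a method prescribed by the source or by any framework.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path) -> dict[str, str]:
    """Map each file's relative path to its digest (hypothetical helper)."""
    return {
        p.relative_to(dataset_dir).as_posix(): sha256_file(p)
        for p in sorted(dataset_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(dataset_dir: Path, approved: dict[str, str]) -> dict[str, list[str]]:
    """Report files that changed, vanished, or appeared since approval."""
    current = build_manifest(dataset_dir)
    return {
        "modified": sorted(
            name for name in approved
            if name in current and current[name] != approved[name]
        ),
        "missing": sorted(name for name in approved if name not in current),
        "unexpected": sorted(name for name in current if name not in approved),
    }
```

A check like this only detects changes made after approval. It does not detect content that was already poisoned when the manifest was built, and it is only as trustworthy as the place the approved manifest is stored, which should sit outside the reach of the identities that can write to the dataset.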
Examples and Use Cases
Rigorous poisoning defences add governance overhead, so organisations must weigh faster data onboarding against stronger provenance checks and review gates. Typical scenarios include:
- A customer-support model is fine-tuned on ticket transcripts that include adversary-inserted phrases, causing it to recommend unsafe actions when similar wording appears later.
- An image classifier is trained on a public dataset where a small set of samples has been relabelled, creating a backdoor that triggers misclassification for specific patterns.
- A reinforcement learning workflow accepts human feedback from weakly vetted annotators, allowing repeated biased ratings to skew the reward signal over time.
- A code assistant learns from a poisoned repository mirror, which later causes it to reproduce insecure snippets as if they were trusted examples.
- An enterprise LLM ingests internal documents without dataset provenance controls, making it difficult to distinguish legitimate policy content from tampered inputs.
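The human-feedback scenario above can be partly screened by checking how often each annotator disagrees with the majority label on items that several people rated. The sketch below is one simple heuristic under stated assumptions: ratings arrive as `(annotator_id, item_id, label)` tuples, and the function name and thresholds (`min_items`, `max_disagreement`) are hypothetical values chosen for illustration.

```python
from collections import Counter, defaultdict


def flag_outlier_annotators(ratings, min_items=3, max_disagreement=0.5):
    """Return annotators who often disagree with a strict majority label.

    ratings: iterable of (annotator_id, item_id, label) tuples.
    """
    votes_by_item = defaultdict(list)
    for annotator, item, label in ratings:
        votes_by_item[item].append((annotator, label))

    seen = Counter()       # items with a clear majority, per annotator
    disagreed = Counter()  # of those, how many the annotator voted against

    for votes in votes_by_item.values():
        if len(votes) < 3:
            continue  # too few votes for a meaningful majority
        counts = Counter(label for _, label in votes)
        top_label, top_count = counts.most_common(1)[0]
        if top_count * 2 <= len(votes):
            continue  # no strict majority, so skip this item
        for annotator, label in votes:
            seen[annotator] += 1
            if label != top_label:
                disagreed[annotator] += 1

    return sorted(
        annotator for annotator in seen
        if seen[annotator] >= min_items
        and disagreed[annotator] / seen[annotator] > max_disagreement
    )
```

This heuristic surfaces a lone biased rater but not a coordinated group that forms the majority, so it complements annotator vetting rather than replacing it. Flagged annotators are candidates for review, not proof of malicious intent.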
The DeepSeek breach is a useful reminder that model ecosystems often fail at the boundaries between data, secrets, and access control. NHIMG research also shows how quickly attackers move once they find exposed AI-adjacent credentials, and that speed compounds poisoning risk when ingestion pipelines or annotation platforms are reachable through compromised identities. For a broader NHI context, see the Ultimate Guide to NHIs — Key Research and Survey Results and the operational reality it documents around machine identity exposure.
Why It Matters in NHI Security
AI data poisoning matters to NHI security because the model supply chain is often governed by the same service accounts, API keys, and automation identities that control data collection and training jobs. If those NHIs are over-privileged, poorly rotated, or reused across systems, an attacker may not need to breach the model itself to corrupt its learning inputs. The result can be subtle but severe: biased decisions, unsafe recommendations, hidden backdoors, or persistent erosion of trust in AI outputs. NHIMG research on secrets management shows that only 44% of developers follow security best practices for secrets management, while the average time to remediate a leaked secret is 27 days. Those conditions give adversaries a wide window to tamper with data pipelines and training assets. The issue is not just data hygiene; it is governance over which identities can publish, label, sync, or approve training material. Practitioners should treat training data as a protected asset with provenance, access review, and tamper-evidence controls. Organisations typically encounter the full impact only after a model behaves unpredictably in production, by which point remediation can no longer be deferred.
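The governance point above, controlling which identities can publish, label, or approve training material, can be sketched as a least-privilege check combined with a signed change record. This is an illustrative sketch only: the identity names, the `ALLOWED_ACTIONS` policy table, and the record fields are hypothetical, and a real deployment would hold the signing key in a secrets manager rather than in code.

```python
import hashlib
import hmac
import json

# Hypothetical policy: each non-human identity gets only the actions it needs.
ALLOWED_ACTIONS = {
    "svc-ingest": {"publish"},
    "svc-labeling": {"label"},
    "svc-release": {"approve"},
}


def is_authorised(identity: str, action: str) -> bool:
    """Check the identity against the allow-list; unknown identities get nothing."""
    return action in ALLOWED_ACTIONS.get(identity, set())


def sign_record(record: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def record_change(identity: str, action: str, dataset_sha256: str, key: bytes) -> dict:
    """Refuse unauthorised actions; otherwise return a signed provenance entry."""
    if not is_authorised(identity, action):
        raise PermissionError(f"{identity} may not {action} training data")
    record = {"identity": identity, "action": action, "dataset_sha256": dataset_sha256}
    return {**record, "signature": sign_record(record, key)}


def verify_record(entry: dict, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    record = {k: v for k, v in entry.items() if k != "signature"}
    return hmac.compare_digest(entry.get("signature", ""), sign_record(record, key))
```

Separating the allow-list from the signing step means a leaked labelling credential cannot approve a dataset, and any later edit to a stored entry fails verification. The sketch uses a shared HMAC key for brevity; asymmetric signatures would let verifiers check entries without being able to forge them.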
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
The OWASP Agentic AI Top 10 addresses the attack and risk surface, while NIST AI RMF and NIST CSF 2.0 set the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | A03 | Covers training-data and tool-chain integrity risks that enable poisoned model behaviour. |
| NIST AI RMF | | Addresses AI system validity, robustness, and traceability across the lifecycle. |
| NIST CSF 2.0 | PR.DS-6 | Data integrity controls are directly relevant to detecting and preventing poisoning. |
Protect model inputs, feedback loops, and training pipelines from untrusted or tampered data sources.
Reviewed and updated by the NHIMG editorial team on June 24, 2026.
NHI Mgmt Group — the #1 independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org