Model Poisoning

Model poisoning is the deliberate corruption of training or ingestion data so an AI system learns or behaves incorrectly. The attack targets the trust chain before or during model training, which means the resulting failure can persist into production outputs, decisions, and automated actions.

Expanded Definition

Model poisoning is a data integrity attack on the learning pipeline. It can target training corpora, fine-tuning sets, retrieval stores, labeling workflows, or continuous ingestion paths so that the model internalizes malicious patterns, biased decisions, or hidden triggers. In practice, the term is used most often in machine learning and agentic systems where autonomous behavior depends on upstream data quality. Standards language is still evolving, so definitions vary across vendors, but the common thread is the same: an attacker compromises the model by corrupting what it is trained on, tuned on, or allowed to learn from, usually before the affected version reaches production. That makes model poisoning distinct from prompt injection, which targets runtime instructions, and from model extraction, which reconstructs a model by querying its outputs. The NIST Cybersecurity Framework 2.0 is useful here because it frames the problem as a governance and integrity issue across its Govern, Identify, Protect, Detect, Respond, and Recover functions. The most common mistake is treating model poisoning as a purely data science problem, which happens when security teams leave ingestion controls and provenance checks unprotected.
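A provenance check of the kind mentioned above can be sketched as a hash comparison against a manifest recorded when the data was reviewed. This is only an illustration: the function names, file names, and the idea of an in-memory manifest are assumptions made for the example, not part of any framework or product named here.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_against_manifest(files: dict, manifest: dict) -> list:
    """Return the names of files that are absent from the approved
    manifest or whose content hash no longer matches it."""
    rejected = []
    for name, content in files.items():
        expected = manifest.get(name)
        if expected is None or sha256_hex(content) != expected:
            rejected.append(name)
    return sorted(rejected)


# Hypothetical manifest captured when the dataset was reviewed.
approved = {"train_a.csv": sha256_hex(b"id,label\n1,0\n2,1\n")}

# Hypothetical batch arriving at the ingestion step.
incoming = {
    "train_a.csv": b"id,label\n1,0\n2,0\n",  # a label was changed after review
    "train_b.csv": b"id,label\n3,1\n",       # never reviewed at all
}

print(verify_against_manifest(incoming, approved))
# -> ['train_a.csv', 'train_b.csv']
```

A check like this only shows that data changed after approval. It says nothing about whether the approved data was clean in the first place, so it complements review rather than replacing it.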

Examples and Use Cases

Implementing poisoning defenses rigorously often introduces workflow friction, requiring organisations to weigh faster model iteration against stronger review, lineage, and access control.

  • A vendor submits fine-tuning data with subtly altered labels so a fraud model learns to miss transactions from a targeted region.
  • A malicious actor inserts poisoned records into a feedback loop, causing an AI agent to repeat unsafe tool choices after deployment.
  • A compromised data pipeline feeds corrupted documents into a retrieval system, degrading the answers that downstream models produce.
  • A security team reviews a new LLM training set against the guidance in Ultimate Guide to NHIs because service accounts, API keys, and automated ingestion jobs often own the data path that attackers abuse.
  • An enterprise maps model supply-chain controls to NIST Cybersecurity Framework 2.0 to ensure that data provenance, change control, and monitoring are part of the model lifecycle, not an afterthought.

These use cases are especially important where models are retrained continuously, or where a governance model of the kind described in the Ultimate Guide to NHIs is needed to control the non-human identities that can write into datasets, feature stores, or vector databases. Industry usage is still evolving, so some teams include label corruption and backdoor triggers under poisoning while others reserve the term for training-time contamination only.
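The altered-label scenario in the first example can be illustrated with a simple per-segment comparison between a trusted baseline and a new batch. This is a rough sketch under stated assumptions: the region names, the row layout, and the 0.2 threshold are all hypothetical, and a heuristic this coarse would miss subtle or slow poisoning.

```python
from collections import defaultdict


def positive_rate_by_group(rows):
    """rows: iterable of (group, label) pairs with label in {0, 1}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in rows:
        totals[group] += 1
        positives[group] += label
    return {g: positives[g] / totals[g] for g in totals}


def flag_label_shift(baseline, candidate, max_abs_shift=0.2):
    """Flag groups whose positive-label rate in the candidate batch
    differs from the baseline by more than max_abs_shift, or that
    do not appear in the baseline at all."""
    base = positive_rate_by_group(baseline)
    cand = positive_rate_by_group(candidate)
    flagged = []
    for group, rate in cand.items():
        if group not in base or abs(rate - base[group]) > max_abs_shift:
            flagged.append(group)
    return sorted(flagged)


# Hypothetical fraud labels: 1 = fraud, 0 = legitimate.
baseline = [("north", 1), ("north", 0), ("south", 1), ("south", 0)]
candidate = [("north", 1), ("north", 0), ("south", 0), ("south", 0)]

print(flag_label_shift(baseline, candidate))
# -> ['south']
```

A flagged segment is a prompt for human review of the batch and its source, not proof of an attack, since legitimate shifts in the underlying data produce the same signal.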

Why It Matters in NHI Security

Model poisoning matters in NHI security because the systems that move data into and out of models are usually driven by non-human identities, not people. When those identities are overprivileged, unrotated, or poorly visible, attackers can corrupt the model indirectly by compromising the pipelines that feed it. That is why NHI governance belongs in the model risk conversation, not just in IAM. NHI Mgmt Group research shows that 80% of identity breaches involved compromised non-human identities such as service accounts and API keys, which makes poisoned ingestion paths a realistic operational risk rather than a theoretical one. The same research also reports that 97% of NHIs carry excessive privileges, which amplifies the blast radius when a poisoned data source or compromised automation account is accepted as trusted. The Ultimate Guide to NHIs is the clearest reference point for understanding why least privilege, rotation, and visibility are prerequisites for trustworthy AI pipelines. Organisational controls should also align with NIST Cybersecurity Framework 2.0 so that provenance, monitoring, and recovery are managed continuously. Organisations typically encounter model poisoning only after a model starts making wrong decisions or an agent executes unsafe actions, by which point the problem can no longer be deferred.
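The least-privilege and rotation points above can be made concrete with a small audit over the identities that write into a data path. The sketch below is hypothetical throughout: the NonHumanIdentity record, the scope strings, and the 90-day rotation limit are assumptions chosen for illustration, not fields or policies taken from any specific IAM product or from the research cited.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NonHumanIdentity:
    name: str
    granted_scopes: frozenset   # what the identity is allowed to do
    required_scopes: frozenset  # what its job actually needs
    last_rotated: date          # when its credential was last rotated


def audit(identities, today, max_age_days=90):
    """Return {identity name: [issues]} for identities that hold
    scopes beyond their stated need or have a stale credential."""
    findings = {}
    for nhi in identities:
        issues = []
        excess = nhi.granted_scopes - nhi.required_scopes
        if excess:
            issues.append("excess scopes: " + ", ".join(sorted(excess)))
        if (today - nhi.last_rotated).days > max_age_days:
            issues.append("credential older than %d days" % max_age_days)
        if issues:
            findings[nhi.name] = issues
    return findings


# Hypothetical identities on an ingestion path.
identities = [
    NonHumanIdentity(
        name="ingest-job",
        granted_scopes=frozenset({"dataset:read", "dataset:write", "vectordb:write"}),
        required_scopes=frozenset({"dataset:write"}),
        last_rotated=date(2024, 6, 1),
    ),
    NonHumanIdentity(
        name="label-sync",
        granted_scopes=frozenset({"labels:write"}),
        required_scopes=frozenset({"labels:write"}),
        last_rotated=date(2024, 12, 1),
    ),
]

findings = audit(identities, today=date(2025, 1, 1))
for name, issues in findings.items():
    print(name, issues)
```

In this example only ingest-job is reported, for both excess scopes and an old credential. The harder part in practice is populating required_scopes accurately, which depends on knowing what each pipeline job actually does.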

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

MITRE ATLAS and OWASP Agentic AI Top 10 address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework | Control / Reference | Relevance
MITRE ATLAS | AML.TA0002 | Covers adversarial ML techniques that include poisoning during data or model lifecycle stages.
NIST AI RMF | (none listed) | Treats poisoned data as a trust and validity failure in AI lifecycle risk management.
OWASP Agentic AI Top 10 | A07 | Agentic systems inherit poisoned data risks when tools, memory, or retrieval sources are compromised.

Check training and ingestion paths for poisoning vectors and validate provenance before model updates.