Who is accountable when sensitive data is retained in a third-party AI tool?

Why This Matters for Security Teams

Third-party AI tools can become uncontrolled retention systems for sensitive prompts, files, and embedded secrets. The core issue is not just where the data is stored, but who can approve its use, prove its deletion, and detect downstream reuse. OWASP’s Non-Human Identity Top 10 is relevant here because AI tools often operate with credentials, connectors, and service tokens that outlive the business decision that created them.

NHIMG research shows how quickly sensitive material can be exposed once it enters an AI-adjacent ecosystem. The DeepSeek breach and the 52 NHI breaches Report both reinforce the same operational lesson: once a third party has visibility into sensitive data, trust alone is not a control. Accountability remains with the organisation that introduced the data into the tool and authorised the workflow.

Security teams often assume the vendor owns the risk because the vendor stores the content. However, compliance obligations, deletion requests, access approvals, and legal exposure are usually shared with the customer or remain entirely with it. In practice, many security teams discover retention issues only when a regulator, customer, or incident response team asks for proof of deletion, not through deliberate governance.

How It Works in Practice

Accountability should be assigned at the point where the organisation decides to use the tool, not after the provider has finished processing the data. That means defining an internal owner for data classification, prompt approval, retention terms, and evidence collection. If the AI platform supports chat history, model improvement, training exclusions, or administrator export, each setting needs explicit review before sensitive data is allowed in.
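One way to make that review concrete is to record each tool's settings alongside a named owner and refuse sensitive data until every setting has an explicit decision. This is a minimal sketch, assuming hypothetical setting names (`chat_history`, `model_improvement`, `training_exclusion`, `admin_export`) and a simple in-memory record. Real platforms expose these controls under their own names, if at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical setting names; map them to whatever the vendor actually exposes.
REQUIRED_SETTINGS = ("chat_history", "model_improvement", "training_exclusion", "admin_export")


@dataclass
class ToolApproval:
    tool: str
    owner: Optional[str] = None  # internal accountable owner
    # setting name -> reviewed decision, e.g. "disabled", "enabled", "n/a"
    reviewed_settings: Dict[str, str] = field(default_factory=dict)

    def blockers(self) -> List[str]:
        """Return the reasons sensitive data must not yet be allowed into the tool."""
        problems = []
        if not self.owner:
            problems.append("no internal owner assigned")
        for name in REQUIRED_SETTINGS:
            if name not in self.reviewed_settings:
                problems.append(f"setting not reviewed: {name}")
        return problems

    def sensitive_data_allowed(self) -> bool:
        # Allowed only when an owner exists and every setting has a recorded decision.
        return not self.blockers()


if __name__ == "__main__":
    approval = ToolApproval(tool="example-assistant", owner="data-governance")
    approval.reviewed_settings.update(chat_history="disabled", model_improvement="disabled")
    print(approval.sensitive_data_allowed())  # False: two settings are still unreviewed
    print(approval.blockers())
```

The point of the design is that the default is "blocked": a setting nobody looked at counts against approval rather than silently passing.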

Practitioners should treat AI tools like any other third-party processor, but with stricter attention to prompt content and derived artefacts. This is where the LiteLLM PyPI package breach and the Reviewdog GitHub Action supply chain attack matter: once an AI-adjacent integration is trusted, hidden data flows can spread across logs, caches, and connectors.

Classify what may be shared with the tool, including prompts, attachments, and retrieved context.

Map the vendor’s retention, backup, and deletion terms to an internal control owner.

Require evidence for deletion requests, not just contractual promises.

Review whether data can be used to train, tune, or improve vendor systems.

Limit connector scope so the tool cannot ingest more data than the task requires.
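Connector scope is the easiest of these controls to check mechanically: compare what an integration requests with what its approved task needs. This is a minimal sketch, assuming a hypothetical allow-list keyed by connector name. The scope strings are placeholders, not any vendor's real permission names.

```python
from typing import Dict, Set

# Hypothetical connectors and scope names, for illustration only.
APPROVED_SCOPES: Dict[str, Set[str]] = {
    "ticket-summariser": {"tickets:read"},
    "code-review-bot": {"repo:read", "pull_requests:write"},
}


def excess_scopes(connector: str, requested: Set[str]) -> Set[str]:
    """Return requested scopes that exceed the approved set for this connector.

    An unknown connector has no approved scopes, so every request is excess.
    """
    approved = APPROVED_SCOPES.get(connector, set())
    return set(requested) - approved


if __name__ == "__main__":
    extra = excess_scopes("ticket-summariser", {"tickets:read", "files:read", "users:read"})
    print(sorted(extra))  # ['files:read', 'users:read']
```

Any non-empty result should stop the integration from being approved until the extra scopes are removed or explicitly justified.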

Current guidance suggests that deletion obligations should be validated operationally, not just legally, because many providers cannot prove lineage across backups, replicas, or model-derived stores. These controls tend to break down when the organisation enables broad workspace sharing or long-lived integrations because the vendor can no longer distinguish approved retention from accidental persistence.
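Validating deletion operationally means tracking evidence per storage location rather than per contract clause. This is a minimal sketch, assuming a hypothetical list of locations (primary store, backups, replicas, derived artefacts). It treats any location without deletion evidence dated on or after the request as unverified. Which locations a given vendor actually has must come from that vendor's own documentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical storage locations; the real list depends on the vendor.
LOCATIONS = ("primary", "backups", "replicas", "derived_artefacts")


@dataclass
class DeletionEvidence:
    location: str
    confirmed_on: Optional[date]  # date the vendor evidenced deletion; None if not yet
    reference: str = ""  # hypothetical field, e.g. a ticket or attestation ID


def unverified_locations(request_date: date, evidence: List[DeletionEvidence]) -> List[str]:
    """List locations with no deletion evidence dated on or after the request."""
    verified = {
        e.location
        for e in evidence
        if e.confirmed_on is not None and e.confirmed_on >= request_date
    }
    return [loc for loc in LOCATIONS if loc not in verified]


if __name__ == "__main__":
    req = date(2024, 3, 1)
    ev = [
        DeletionEvidence("primary", date(2024, 3, 2), "VENDOR-123"),
        DeletionEvidence("replicas", None),
    ]
    print(unverified_locations(req, ev))  # ['backups', 'replicas', 'derived_artefacts']
```

A deletion request stays open while this list is non-empty, which turns "the vendor says it was deleted" into a gap the control owner has to close or formally accept.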

Common Variations and Edge Cases

Tighter data controls often increase review overhead and reduce user convenience, requiring organisations to balance rapid AI adoption against legal, privacy, and incident-response constraints. There is no universal standard for this yet, especially where vendor terms, jurisdiction, and model architecture intersect.

Some environments allow low-risk public data in a third-party tool while prohibiting customer records, source code, or secrets. That split is reasonable, but only if the boundary is enforced in policy and tooling. For example, the State of Secrets in AppSec report from GitGuardian and CyberArk notes that 43% of security professionals are concerned about AI systems learning and reproducing sensitive information patterns from codebases, which makes prompt hygiene and redaction part of accountability, not just best effort.
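Enforcing that boundary in tooling usually means screening prompts before they leave the organisation. The sketch below is a deliberately small regex-based redactor. Its patterns (an AWS-style access key ID, a GitHub-style classic personal access token, and `password=`-style assignments) are illustrative assumptions; a production deployment would rely on a maintained secret scanner rather than three expressions.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real secret scanners use far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "password_assignment": re.compile(r"(?i)\b(password|passwd|pwd)\s*[:=]\s*\S+"),
}


def redact(prompt: str) -> Tuple[str, List[str]]:
    """Replace matched secrets with a labelled placeholder.

    Returns the redacted text and the names of the patterns that fired,
    so the caller can block or log the submission instead of sending it.
    """
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        prompt, count = pattern.subn(f"[REDACTED:{name}]", prompt)
        if count:
            findings.append(name)
    return prompt, findings


if __name__ == "__main__":
    clean, hits = redact("Deploy with key AKIAABCDEFGHIJKLMNOP and password=hunter2 please")
    print(clean)
    print(hits)
```

Returning the findings as well as the redacted text lets policy decide between silently redacting and rejecting the prompt outright, which matters where secrets are prohibited rather than merely discouraged.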

Best practice is evolving for enterprise AI assistants, especially where administrators can disable retention while individual users still paste confidential material into prompts. In those cases, accountability is shared in the governance sense but remains operationally anchored to the organisation that permitted the workflow. If the provider cannot prove erasure or lineage, the organisation must still treat the exposure as its own compliance and privacy problem.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 and OWASP Agentic AI Top 10 address the attack and risk surface, while the NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-01	Third-party AI tools rely on non-human identities, tokens, and connectors that must be governed.
OWASP Agentic AI Top 10	AI-01	Agentic and AI tool misuse can retain or leak sensitive context through autonomous processing.
NIST AI RMF	GOVERN	AI RMF governance addresses accountability for data handling, oversight, and risk ownership.

Inventory every AI tool identity, then restrict and review its access before approving any sensitive workflow.
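A minimal inventory can be as simple as one record per AI tool identity with its owner, credential expiry, and last access review. The field names and the 90-day review interval below are assumptions for illustration, not values taken from OWASP or NIST.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

REVIEW_INTERVAL = timedelta(days=90)  # assumed policy value, not from any framework


@dataclass
class AIToolIdentity:
    name: str
    owner: Optional[str]
    credential_expires: Optional[date]  # None means the credential never expires
    last_reviewed: Optional[date]


def review_findings(identity: AIToolIdentity, today: date) -> List[str]:
    """Return reasons this identity must be restricted or reviewed before approval."""
    findings = []
    if not identity.owner:
        findings.append("no accountable owner")
    if identity.credential_expires is None:
        findings.append("non-expiring credential")
    elif identity.credential_expires < today:
        findings.append("expired credential not yet removed")
    if identity.last_reviewed is None or today - identity.last_reviewed > REVIEW_INTERVAL:
        findings.append("access review overdue")
    return findings


if __name__ == "__main__":
    bot = AIToolIdentity("crm-connector", owner=None, credential_expires=None,
                         last_reviewed=date(2024, 1, 1))
    print(review_findings(bot, today=date(2024, 6, 1)))
```

Flagging non-expiring credentials separately reflects the earlier point that AI tool tokens often outlive the business decision that created them.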
