Should organisations prioritise data awareness over manual tagging?

Why This Matters for Security Teams

Manual tagging looks tidy on paper, but it breaks down when data moves across shared drives, SaaS apps, tickets, pipelines, and AI workflows faster than people can classify it. Data awareness is more useful because it helps security teams understand context, relationships, and potential misuse rather than relying on a label that may be stale, incomplete, or missing altogether. NHI Mgmt Group's Ultimate Guide to NHIs — Key Research and Survey Results reports that only 5.7% of organisations have full visibility into their service accounts, a useful reminder that visibility gaps are usually broader than tagging gaps alone.

That matters because teams often assume the label is the control, when in practice the label is just metadata. If the organisation cannot see where the data is, who can reach it, or how it is being reused by systems and agents, tagging becomes an administrative task rather than a security control. The NIST Cybersecurity Framework 2.0 emphasises governance, identification, and protection outcomes, which aligns better with data awareness than with manual classification alone. In practice, many security teams discover exposure only after collaboration sprawl or an AI ingestion event has already spread the data beyond the original owner’s intent.

How It Works in Practice

Data awareness is the operational discipline of enriching data with context from its environment: origin, sensitivity, business process, ownership, access paths, downstream consumers, and observed behaviour. Manual tagging can still help with edge cases, but current guidance suggests it should be a supporting input, not the primary dependency. A workable model combines discovery, classification, policy enforcement, and runtime monitoring so that controls follow the data as it moves.
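As a minimal sketch of what "enriching data with context" could look like in code, the record below carries the context fields named above and reports which of them are still unknown. The class name, field names, and sample values are illustrative assumptions, not part of any NHI Mgmt Group or NIST schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataContext:
    """Context gathered about one data asset (all names are hypothetical)."""
    location: str
    origin: str
    sensitivity: str                 # e.g. "public", "internal", "restricted"
    business_process: str
    owner: Optional[str]
    access_paths: List[str] = field(default_factory=list)
    downstream_consumers: List[str] = field(default_factory=list)
    manual_tag: Optional[str] = None  # supporting input, not the primary dependency


def awareness_gaps(ctx: DataContext) -> List[str]:
    """List the context fields that are still unknown for this asset."""
    gaps = []
    if ctx.owner is None:
        gaps.append("owner")
    if not ctx.access_paths:
        gaps.append("access_paths")
    if not ctx.downstream_consumers:
        gaps.append("downstream_consumers")
    return gaps


# Hypothetical asset: well classified, but nobody owns it and reuse is unknown.
record = DataContext(
    location="shared-drive/finance/q3.xlsx",
    origin="erp-export",
    sensitivity="restricted",
    business_process="quarterly-reporting",
    owner=None,
    access_paths=["group:finance"],
)
print(awareness_gaps(record))  # ['owner', 'downstream_consumers']
```

The manual tag is deliberately optional here: the record remains useful when the tag is missing, which is the point of treating tagging as a supporting input.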

In practice, security teams often combine several signals:

Content inspection to identify regulated or sensitive material.

Metadata analysis to infer business context and stewardship.

Access telemetry to see which users, services, or agents touch the data.

Policy-as-code to enforce rules dynamically based on context, not just labels.

Exception handling for records that need human review or legally mandated classification.

This approach is especially important when data is reused by AI systems, because those systems may summarise, copy, transform, or redistribute information in ways that manual tagging never anticipated. For that reason, organisations should treat classification as one signal inside a broader data-awareness programme, not as a substitute for it. The NHIMG research base on visibility into non-human identities reinforces the same lesson: if the organisation cannot reliably see the actors and pathways involved, static labels will not prevent misuse. This guidance tends to break down in highly unstructured environments, where data is copied into chat tools, local files, and ad hoc AI prompts and the original label does not travel with the content.
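The signals listed in this section can be combined in a small policy-as-code function. The sketch below is one possible arrangement under stated assumptions: the request fields, the "legal-hold" label, and the three outcomes are hypothetical, and a real deployment would more likely express such rules in a dedicated policy engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessRequest:
    """One access attempt, described by context signals (names are hypothetical)."""
    actor: str
    actor_type: str              # "user", "service", or "agent"
    content_sensitive: bool      # from content inspection
    owner_known: bool            # from metadata analysis
    actor_seen_before: bool      # from access telemetry
    label: Optional[str] = None  # manual tag, if one exists


def decide(req: AccessRequest) -> str:
    """Return "allow", "review", or "deny" based on context, not labels alone."""
    # Exception handling: legally mandated classifications go to a human.
    if req.label == "legal-hold":
        return "review"
    if not req.content_sensitive:
        return "allow"
    # Sensitive content with no known steward needs a human decision.
    if not req.owner_known:
        return "review"
    # An agent with no prior access history is blocked from sensitive content.
    if req.actor_type == "agent" and not req.actor_seen_before:
        return "deny"
    return "allow"


# An unlabelled but sensitive file requested by a previously unseen AI agent.
request = AccessRequest(
    actor="summariser-bot",
    actor_type="agent",
    content_sensitive=True,
    owner_known=True,
    actor_seen_before=False,
)
print(decide(request))  # deny
```

Note that the request carries no label at all, yet the decision is still made from content, metadata, and telemetry signals.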

Common Variations and Edge Cases

Tighter data-aware controls often increase operational overhead, requiring organisations to balance stronger decision-making against speed, usability, and compliance workload. That tradeoff is real, especially where business teams expect immediate sharing and where legacy repositories lack reliable metadata.

There is not yet a universal standard for how much automation should replace manual tagging. Best practice is evolving toward a hybrid model: use manual tagging for exceptional legal, contractual, or high-risk records, and use automated awareness for everything else. This is especially relevant for environments with distributed collaboration, third-party sharing, and AI retrieval, because manual labels tend to lag behind actual usage.
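The hybrid model can be illustrated with a simple routing rule. This is only a sketch: the flag names and the idea of attaching flags to each record are assumptions made for the example, not an established standard.

```python
from typing import Dict, Set

# Hypothetical flags that send a record to manual tagging.
MANUAL_REASONS = {"legal", "contractual", "high_risk"}


def classification_route(flags: Set[str]) -> str:
    """Route exceptional records to manual tagging, everything else to automation."""
    return "manual" if flags & MANUAL_REASONS else "automated"


# Illustrative records and their flags.
records: Dict[str, Set[str]] = {
    "contracts/msa-2024.pdf": {"contractual"},
    "wiki/onboarding.md": set(),
    "tickets/export.csv": {"customer_data"},
}

for name, flags in records.items():
    print(name, "->", classification_route(flags))
```

Only the contract is routed to manual tagging in this example; the other two records fall through to automated awareness.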

Edge cases include scanned documents, image-based records, and opaque legacy systems where content detection is weak. In those environments, manual tagging may still carry more weight, but only as a temporary compensating control. For broader governance, the stronger pattern is to align with outcome-based frameworks such as NIST CSF 2.0 and to keep a visible record of how data is discovered, classified, accessed, and revoked. If an organisation relies on manual tags alone, the control works only when people behave perfectly, which is not how shared data environments actually operate.
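One way to keep a visible record of how data is discovered, classified, accessed, and revoked is an ordered event log per asset. The sketch below assumes a hypothetical log of (asset, event) pairs and flags assets that were accessed before any classification took place, which is the kind of gap weak content detection tends to produce.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

EVENTS = ("discovered", "classified", "accessed", "revoked")


def build_history(log: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group an ordered event log by asset, rejecting unknown event types."""
    history: Dict[str, List[str]] = defaultdict(list)
    for asset, event in log:
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        history[asset].append(event)
    return dict(history)


def accessed_before_classified(history: Dict[str, List[str]]) -> List[str]:
    """Return assets whose first access happened before any classification."""
    flagged = []
    for asset, events in history.items():
        if "accessed" in events:
            first_access = events.index("accessed")
            if "classified" not in events[:first_access]:
                flagged.append(asset)
    return flagged


# Illustrative log: the scanned HR file was read before it was classified.
log = [
    ("hr/scan-001.pdf", "discovered"),
    ("hr/scan-001.pdf", "accessed"),
    ("hr/scan-001.pdf", "classified"),
    ("crm/export.csv", "discovered"),
    ("crm/export.csv", "classified"),
    ("crm/export.csv", "accessed"),
    ("crm/export.csv", "revoked"),
]
print(accessed_before_classified(build_history(log)))  # ['hr/scan-001.pdf']
```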

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
NIST CSF 2.0	ID.AM	Asset and data awareness depends on knowing what exists and where it flows.
NIST CSF 2.0	PR.DS	Protecting data requires controls based on context, not labels alone.
OWASP Non-Human Identity Top 10	NHI-01	Data awareness is critical when NHIs access or move sensitive data without manual review.

Map sensitive data and its movement under ID.AM, then automate discovery to keep inventories current.
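Keeping inventories current implies some check for entries that automated discovery has not confirmed recently. A minimal sketch follows, assuming a hypothetical inventory that maps each asset to the time discovery last saw it; the seven-day threshold is an arbitrary illustration, not a NIST CSF requirement.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_entries(
    inventory: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=7),
) -> List[str]:
    """Return assets that discovery has not confirmed within max_age."""
    return sorted(
        name for name, last_seen in inventory.items() if now - last_seen > max_age
    )


# Fixed timestamps keep the example deterministic.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
inventory = {
    "s3://reports/q4": datetime(2025, 1, 30, tzinfo=timezone.utc),
    "sharepoint/hr/payroll": datetime(2025, 1, 2, tzinfo=timezone.utc),
}
print(stale_entries(inventory, now))  # ['sharepoint/hr/payroll']
```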

