TL;DR: Enterprises are sitting on thousands of hours of video that remain effectively unsearchable, while AI-native platforms now promise multimodal extraction across speech, slides, visuals, and context, according to WorkOS's interview with Here founder Mazy Dar. The governance issue is not whether video can be indexed, but whether identity, access, and audit controls can keep pace once video becomes a first-class data source.
NHIMG editorial — based on content published by WorkOS: Mazy Dar on building the future of video understanding at Here
Questions worth separating out
Q: How should enterprises govern AI systems that make video content searchable?
A: Start by treating searchable video as governed knowledge, not passive storage.
Q: Why do multimodal video platforms create new IAM and audit risks?
A: They create new artefacts that do not exist in a plain file share model: searchable moments, extracted quotes, and visual context attached to a queryable index.
Q: What do security teams get wrong about transcription versus video understanding?
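A: They often assume transcription is enough because it turns speech into text. Transcription covers only the spoken track, whereas video understanding also extracts slides, visuals, and context, so the governed surface is wider than the transcript.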
Practitioner guidance
- Classify video content before indexing it: Apply sensitivity labels to recordings, meeting archives, and training libraries before they enter any multimodal search workflow.
- Scope access to searchable outputs, not just source files: Review who can query, export, and share extracted moments from video in addition to who can open the original recording.
- Instrument audit trails around retrieval and reuse: Log the search queries, retrieved clips, shared snippets, and workflow destinations associated with video understanding systems.
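The three points above can be sketched together in a few lines of Python. This is a minimal illustration, not a description of any vendor's product: the names `Sensitivity`, `Recording`, `Clip`, `AuditLog`, and `request`, the three-level label scheme, and the per-action clearance map are all assumptions made for the example. It shows a derived clip inheriting the label of its source recording, access decided per action (query, export, share) rather than per file, and every decision written to an audit trail, including denials.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical three-level label scheme applied before indexing."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class Recording:
    recording_id: str
    label: Sensitivity


@dataclass(frozen=True)
class Clip:
    """A derived fragment (searchable moment) of a recording."""
    clip_id: str
    source: Recording

    @property
    def label(self) -> Sensitivity:
        # Derived fragments inherit the label of the original object.
        return self.source.label


@dataclass
class AuditLog:
    events: list = field(default_factory=list)

    def record(self, user, action, clip, allowed, query=""):
        self.events.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "clip": clip.clip_id,
            "source": clip.source.recording_id,
            "query": query,
            "allowed": allowed,
        })


def request(user, clearances, action, clip, log, query=""):
    """Allow an action only if the user's ceiling for that action covers the clip.

    `clearances` maps an action name to the highest label the user may touch
    with that action. An action missing from the map is denied.
    """
    ceiling = clearances.get(action)
    allowed = ceiling is not None and clip.label <= ceiling
    log.record(user, action, clip, allowed, query)  # log allows and denials alike
    return allowed
```

In this sketch a user cleared to query confidential material but to export only internal material can find a moment from a confidential call and still be refused when trying to export it, and both events land in the log.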
What's in the full article
WorkOS's full interview covers the operational detail this post intentionally leaves for the source:
- The engineering trade-offs behind multimodal inference at scale, including latency and cost considerations
- The product and workflow integrations needed to make video moments usable inside daily enterprise operations
- The reliability and security requirements discussed for enterprise buyers, including access controls and audit logging
- The broader conversation with Mazy Dar at HumanX 2026 on building AI-native video understanding
👉 Read WorkOS's interview on AI-native video understanding and enterprise search →
AI-native video understanding: what it means for enterprise governance
Explore further
Searchable video turns unstructured media into governed enterprise knowledge. That changes the identity problem because access is no longer limited to viewing a recording. Users can query, extract, and redistribute moments from meetings, demos, and calls, which means the control plane must govern both the original object and its derived fragments. The practitioner conclusion is that media search is an access-governance problem as much as it is an AI problem.
A few things that frame the scale:
- The average estimated time to remediate a leaked secret is 27 days, despite 75% of organisations expressing strong confidence in their secrets management capabilities, according to The State of Secrets in AppSec.
- Organisations maintain an average of 6 distinct secrets manager instances, creating fragmentation that undermines centralised control.
A question worth separating out:
Q: How can organisations decide whether video search is ready for production use?
A: Use three checks: latency that fits daily workflows, accuracy high enough for trust, and controls that cover access, data residency, and audit logging. If any one of those fails, the platform may be useful for experimentation but not yet ready to carry regulated or sensitive enterprise content.
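The three checks can be written down as a simple gate. The sketch below is illustrative only: `ReadinessReport`, `failed_checks`, and `production_ready` are hypothetical names, and the default thresholds (a 2000 ms p95 latency ceiling and a 0.9 accuracy floor) are placeholder assumptions that each organisation would replace with its own targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadinessReport:
    """Hypothetical summary of a video search pilot."""
    p95_latency_ms: float
    retrieval_accuracy: float  # fraction between 0 and 1
    access_controls: bool
    data_residency: bool
    audit_logging: bool


def failed_checks(report, max_latency_ms=2000.0, min_accuracy=0.9):
    """Return the names of the checks that fail; thresholds are placeholders."""
    failures = []
    if report.p95_latency_ms > max_latency_ms:
        failures.append("latency")
    if report.retrieval_accuracy < min_accuracy:
        failures.append("accuracy")
    # The controls check passes only if all three controls are in place.
    if not (report.access_controls and report.data_residency and report.audit_logging):
        failures.append("controls")
    return failures


def production_ready(report, **thresholds):
    """Any single failing check keeps the platform at the experimentation stage."""
    return not failed_checks(report, **thresholds)
```

Returning the list of failed checks, rather than a bare yes or no, keeps the reason for holding a platform back visible to whoever reviews the decision.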
👉 Read our full editorial: AI-native video understanding exposes a new enterprise data governance gap