TL;DR: Cyera says an unauthenticated out-of-bounds heap read in Ollama, tracked as CVE-2026-7482 with CVSS 9.1, can expose prompts, system messages, environment variables, and other sensitive heap data through only three API calls on roughly 300,000 internet-facing instances. The deeper issue is that local AI runtimes can become high-value NHI exposure points when authentication, segmentation, and secret hygiene are missing.
NHIMG editorial — based on content published by Cyera: Bleeding Llama, a critical memory leak in Ollama
Questions worth separating out
Q: How should security teams secure internet-facing local AI inference servers?
A: Security teams should require authentication in front of every inference endpoint, remove public exposure where possible, and segment AI workloads from general-purpose networks.
Q: Why do local AI platforms increase NHI secret exposure risk?
A: Prompts, system instructions, API keys, and environment variables can all sit in the same process memory, so a single memory disclosure flaw can expose every one of them at once.
Q: What breaks when AI runtimes are deployed without authentication?
A: Without authentication, anyone who can reach the port can call the API, so the service becomes an exposed trust boundary rather than a controlled internal capability.
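A quick way to test whether an endpoint answers without a trust gate is to send it an unauthenticated request and see what comes back. The sketch below is a minimal illustration, not a scanner: it assumes the default port 11434 mentioned in the guidance and Ollama's model-listing route `/api/tags`, and the function names `classify` and `probe` are hypothetical. Run it only against hosts you are authorised to test.

```python
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status code (or None for no connection) to a verdict."""
    if status is None:
        return "unreachable"
    if status in (401, 403):
        return "auth-gated"  # something in front of the API demanded credentials
    if 200 <= status < 300:
        return "open"  # the API answered an anonymous request
    return "inconclusive"


def probe(host, port=11434, timeout=3.0):
    """Send one unauthenticated GET and classify the response.

    Assumption: /api/tags is the model-listing route on the target build.
    """
    url = f"http://{host}:{port}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return classify(err.code)
    except (urllib.error.URLError, OSError):  # refused, timed out, DNS failure
        return classify(None)
```

A result of `open` from outside the intended network segment means the endpoint is reachable without authentication. `auth-gated` suggests a proxy or gateway is in place, though it says nothing about how well that gateway is configured.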
Practitioner guidance
- Patch and verify the fix immediately. Apply the vendor-released remediation, then verify that tensor element counts are validated against actual buffer sizes before any quantization loop runs.
- Remove unauthenticated network exposure. Place an authentication proxy or API gateway in front of every AI inference endpoint and block public access to default ports such as 11434.
- Rotate secrets from exposed hosts. Assume environment variables, API keys, and tokens may have been resident in memory if the service was internet-facing.
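The check named in the first item, validating a declared element count against the real buffer size, can be illustrated in a few lines. This is a sketch of the general technique only, not Ollama's code or the vendor's fix. The `BYTES_PER_ELEMENT` table and the `validate_tensor` function are hypothetical, and real quantized formats have their own block-size rules.

```python
import math

# Hypothetical element sizes for illustration; quantized block formats differ.
BYTES_PER_ELEMENT = {"f32": 4, "f16": 2, "i8": 1}


def validate_tensor(shape, dtype, buffer_len):
    """Return the element count if the declared tensor fits in the buffer.

    Raises ValueError when the header claims more data than was supplied,
    which is the condition that would otherwise lead to an out-of-bounds read.
    """
    if dtype not in BYTES_PER_ELEMENT:
        raise ValueError(f"unknown dtype: {dtype!r}")
    if any(not isinstance(dim, int) or dim < 0 for dim in shape):
        raise ValueError(f"invalid shape: {shape!r}")

    count = math.prod(shape)
    needed = count * BYTES_PER_ELEMENT[dtype]
    if needed > buffer_len:
        raise ValueError(
            f"declared tensor needs {needed} bytes, buffer holds {buffer_len}"
        )
    return count
```

Python integers do not overflow, so the multiplication here is safe as written. In a fixed-width language the same check also has to guard the multiplication itself, since a wrapped product can pass the comparison.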
According to The State of Non-Human Identity Security, 85% of organisations lack full visibility into third-party vendors connected via OAuth apps. The same visibility gap now applies to AI endpoints and the identities that feed them.
👉 Read Cyera's technical report on the Ollama memory leak and NHI exposure →
Explore further
Unauthenticated local AI runtimes create an identity problem before they create a memory problem. The headline vulnerability is a heap read, but the operational failure is broader: the platform is reachable without a trust gate. That means secrets, prompts, and agent outputs can be harvested from a service that never should have been exposed as a public endpoint. Practitioners should treat local inference systems as governed NHI infrastructure, not convenience tooling.
A few things that frame the scale:
- Only 1.5 out of 10 organisations are highly confident in their ability to secure NHIs, compared to nearly 1 in 4 for securing human identities, according to The State of Non-Human Identity Security.
- 85% of organisations lack full visibility into third-party vendors connected via OAuth apps, including 38% with no or low visibility, according to The State of Non-Human Identity Security.
A question worth separating out:
Q: What should teams do in the first 24 to 72 hours after exposure is found?
A: Contain the endpoint, apply the patch or block external access, and rotate any secrets that may have been loaded into memory. Then review logs, artifact exports, and agent integrations to determine whether prompts, tokens, or proprietary code were exposed before the fix was applied.
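For the rotation step, a first-pass inventory of likely secrets in the service's environment can help scope the work. The sketch below is a heuristic under stated assumptions: the name pattern and the `rotation_candidates` function are hypothetical, the match is by variable name only, and it lists names without ever printing values. It will miss secrets with unusual names and flag some harmless variables.

```python
import re

# Assumed naming heuristic; adjust to local conventions.
SECRET_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def rotation_candidates(env):
    """Return the sorted names of variables that look like secrets.

    `env` is any mapping of variable names to values, such as os.environ.
    Values are never read or returned.
    """
    return sorted(name for name in env if SECRET_NAME.search(name))
```

Calling `rotation_candidates(os.environ)` from the same unit, container, or shell profile that launches the inference service gives a starting list. Secrets loaded from files or passed through prompts need a separate review.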
👉 Read our full editorial: Unauthenticated memory leaks in local AI platforms expose NHI data