TL;DR: Cyera reports CVE-2026-7482 in Ollama, a CVSS 9.1 unauthenticated memory leak that can expose prompts, system prompts, and environment variables across an estimated 300,000 internet-facing servers. The issue turns local AI inference into an NHI exposure problem, not just an application bug.
At a glance
What this is: Cyera’s analysis of a critical Ollama vulnerability that can leak process memory, including prompts, system prompts, and environment variables.
Why it matters: Self-hosted AI runtimes become high-value NHI targets when unauthenticated access, unsafe memory handling, and secrets exposure intersect.
By the numbers:
- Cyera says CVE-2026-7482 carries a CVSS score of 9.1 and could affect 300,000 servers globally.
👉 Read Cyera's analysis of the Ollama memory leak and CVE-2026-7482
Context
Ollama memory leakage is an NHI governance problem because the model runtime can hold prompts, system prompts, and environment variables in process memory. When that runtime is exposed without authentication, the boundary between application data and operational secrets collapses, especially in environments where AI workloads are shared.
Cyera’s research shows a failure mode that is common to self-hosted AI stacks: a service built for local convenience becomes a path to unintended disclosure when input validation is weak and secrets are present in memory. For IAM and NHI teams, the question is not whether the model is local, but whether the runtime is protected like any other privileged workload.
Key questions
Q: How should security teams protect self-hosted AI runtimes from memory disclosure?
A: Security teams should place self-hosted AI runtimes behind authentication, network segmentation, and strict input validation. They should also assume process memory may contain prompts, secrets, and tool output, so runtime hardening must include secret minimisation, monitoring, and tested handling for malformed model files.
Q: Why do AI model servers create NHI governance risk even when deployed locally?
A: Local deployment does not remove NHI risk because the service still processes privileged inputs, stores operational context in memory, and may expose secrets through logs, dumps, or export paths. If the runtime is reachable without strong controls, it behaves like a privileged identity surface, not a benign development tool.
Q: What breaks when model file validation is weak in AI platforms?
A: Weak validation lets attackers manipulate tensor shapes, buffer lengths, or destination paths so that the platform reads past memory boundaries or exports tainted artifacts. The practical failure is not just a crash: it can be silent disclosure of prompts, environment variables, and other sensitive runtime data.
Q: How should teams respond if an AI runtime may have leaked process memory?
A: Teams should isolate the service, revoke and rotate every secret or credential that may have been present in process memory, and inspect exported model artifacts, including any that were pushed to external registries. They should also review related sessions, prompts, and tool integrations for secondary impact before returning the service to production.
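As a starting point for the rotation step, a minimal sketch could list the environment variables a suspect process was started with and flag names that look like credentials. It assumes a Linux host, where `/proc/<pid>/environ` exposes the process's initial environment as NUL-separated `KEY=VALUE` pairs (changes made later inside the process are not reflected). The name patterns in `SECRET_NAME` are a rough heuristic, not a complete detector, and `read_process_environ` and `rotation_candidates` are hypothetical helpers.

```python
import re
from collections.abc import Mapping

# Heuristic name fragments that often mark credentials (assumption: tune per
# environment; expect misses and false positives such as KEYBOARD_LAYOUT).
SECRET_NAME = re.compile(
    r"KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|PRIVATE|AUTH|DSN", re.IGNORECASE
)


def read_process_environ(pid: int) -> dict[str, str]:
    """Linux only: parse /proc/<pid>/environ, the environment the process started with."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        raw = fh.read()
    env: dict[str, str] = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip the empty trailing entry and records without "="
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def rotation_candidates(env: Mapping[str, str]) -> list[str]:
    """Return sorted names of non-empty, secret-looking variables to treat as exposed."""
    return sorted(
        name for name, value in env.items() if value and SECRET_NAME.search(name)
    )
```

A call such as `rotation_candidates(read_process_environ(pid))` needs root or the same user as the runtime. It only covers environment variables; prompts, system prompts, and tool output still need the separate review described above.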
Technical breakdown
How the Ollama memory leak turns model conversion into disclosure
The vulnerable path sits in model creation and conversion. Ollama accepts uploaded GGUF files, parses tensor metadata, then converts tensors through an intermediate F32 representation before writing the new model to disk. The flaw appears when the code trusts tensor shape metadata to determine how many elements to read. If an attacker supplies a GGUF file with an inflated shape, the conversion loop reads past the end of the buffer and pulls adjacent heap contents into the output. Because the data is later written back into a model file, the leak becomes durable and reusable rather than a transient crash.
Practical implication: Treat model import and conversion as attack surface, not just content handling.
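To make the missing bounds check concrete, here is a minimal Python sketch. It is not Ollama's code, which is written in Go. Before any conversion loop runs, it computes how many bytes the declared shape implies and rejects the tensor if that span would run past the data actually supplied. `TensorInfo`, `validate_tensor`, and `ELEMENT_SIZE` are hypothetical. Only two unquantized element types are modelled; quantized GGUF types store data in blocks and would need per-type size rules.

```python
import math
from dataclasses import dataclass

# Bytes per element for two unquantized types (assumption: quantized GGUF
# types use block layouts and are deliberately out of scope here).
ELEMENT_SIZE = {"F32": 4, "F16": 2}


@dataclass
class TensorInfo:
    name: str
    shape: tuple[int, ...]
    dtype: str
    offset: int  # byte offset of this tensor inside the tensor-data section


def validate_tensor(info: TensorInfo, data_len: int) -> int:
    """Return the element count if the declared tensor fits in the buffer, else raise ValueError."""
    if info.dtype not in ELEMENT_SIZE:
        raise ValueError(f"{info.name}: unsupported dtype {info.dtype}")
    if not info.shape or any(dim <= 0 for dim in info.shape):
        raise ValueError(f"{info.name}: invalid shape {info.shape}")
    count = math.prod(info.shape)
    nbytes = count * ELEMENT_SIZE[info.dtype]
    # The core check: never trust the shape to decide how far to read.
    if info.offset < 0 or info.offset + nbytes > data_len:
        raise ValueError(
            f"{info.name}: declares {nbytes} bytes at offset {info.offset}, "
            f"but only {data_len} bytes are available"
        )
    return count
```

With a 16-byte buffer, a `(2, 2)` F32 tensor passes, while an inflated `(1024, 1024)` shape is rejected before any read happens. In a fixed-width language such as Go, the shape product also needs an overflow check, because a crafted shape can wrap the multiplication to a small value that passes the comparison. Python's arbitrary-precision integers sidestep that here.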
Why unauthenticated AI runtimes expose more than prompts
The leaked heap can include far more than user prompts. In this case, the server process may also contain system prompts, environment variables, and other in-memory artifacts from concurrent usage. That matters because AI runtimes often aggregate multiple users, tools, and model sessions in a single process boundary. Once the process memory is exposed, the attacker does not need separate access to each tenant or conversation. The problem is amplified when the runtime is connected to toolchains that pass credentials, source code, or internal instructions into the model workflow.
Practical implication: Assume process memory in AI services may contain secrets and user data until proven otherwise.
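One way to act on that assumption is to keep credentials out of the runtime's environment in the first place. The sketch below assumes the runtime is started as a child process (for example `ollama serve`). Instead of inheriting the parent's full environment, it passes only an explicit allowlist of variables. `RUNTIME_ENV_ALLOWLIST`, `minimal_env`, and `launch_runtime` are hypothetical names, and the allowlist contents are an assumption to adjust per deployment.

```python
import os
import subprocess
from collections.abc import Mapping

# Variables the runtime is assumed to need; everything else (cloud keys, CI
# tokens, database passwords) is withheld so it never enters the runtime's memory.
RUNTIME_ENV_ALLOWLIST: tuple[str, ...] = (
    "PATH", "HOME", "LANG", "OLLAMA_HOST", "OLLAMA_MODELS",
)


def minimal_env(
    parent: Mapping[str, str], allow: tuple[str, ...] = RUNTIME_ENV_ALLOWLIST
) -> dict[str, str]:
    """Copy only allowlisted variables that are actually set in the parent environment."""
    return {name: parent[name] for name in allow if name in parent}


def launch_runtime(cmd: list[str]) -> subprocess.Popen:
    """Start the runtime with the reduced environment rather than inheriting os.environ."""
    return subprocess.Popen(cmd, env=minimal_env(os.environ))
```

An allowlist is used rather than a denylist because a denylist silently misses any newly added secret variable. This only limits what reaches memory through the environment. Prompts and tool output still pass through the process, so it complements authentication and segmentation rather than replacing them.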
How file validation failure becomes an exfiltration path
The research shows a second control gap: the application accepts attacker-controlled model names and can push a locally created model to a remote URI if the name is formatted as one. That means exploitation is not limited to reading memory on the target host. The attacker can convert the bad input into a transport mechanism and exfiltrate the resulting model artifact over the push workflow. In practice, weak validation at creation time combines with permissive output handling to create a complete disclosure path from crafted file to remote collection.
Practical implication: Validate model names, file structure, and outbound push destinations as separate trust boundaries.
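A minimal sketch of the destination check follows. It assumes a simplified `[host/][namespace/]model[:tag]` naming grammar, not Ollama's actual parser. Names that embed a scheme, port, or unexpected characters are rejected outright. Any name that carries an explicit registry host must also match an allowlist before a push is permitted. `ALLOWED_REGISTRIES`, `NAME_RE`, and `push_destination_allowed` are hypothetical.

```python
import re

# Hypothetical allowlist: the only registry this deployment may push to.
ALLOWED_REGISTRIES = frozenset({"registry.ollama.ai"})

# Simplified grammar assumed for illustration (not Ollama's parser): one to three
# lowercase "/"-separated segments plus an optional ":tag". Schemes, ports,
# userinfo and extra path segments all fail to match.
NAME_RE = re.compile(
    r"[a-z0-9][a-z0-9._-]*(?:/[a-z0-9][a-z0-9._-]*){0,2}(?::[A-Za-z0-9._-]+)?"
)


def push_destination_allowed(model_name: str) -> bool:
    """Return True only if pushing this name cannot target a registry outside the allowlist."""
    if NAME_RE.fullmatch(model_name) is None:
        return False
    path, _, _tag = model_name.partition(":")
    segments = path.split("/")
    # Conservatively treat a three-segment name, or a leading segment that looks
    # like a hostname, as naming an explicit registry host.
    has_host = len(segments) == 3 or (len(segments) > 1 and "." in segments[0])
    if has_host:
        return segments[0] in ALLOWED_REGISTRIES
    # No explicit host: assumed to fall back to the configured default registry.
    return True
```

Because the check runs on the name itself, it can be applied when a model is created and again immediately before a push. That keeps creation and publishing as separate trust boundaries.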
Threat narrative
Attacker objective: The attacker’s objective is to extract prompts, system prompts, and environment variables from the Ollama process and exfiltrate them without authentication.
- Entry via a crafted GGUF file uploaded to the Ollama model creation workflow.
- Escalation through an oversized tensor shape that forces an out-of-bounds heap read during conversion.
- Impact when the leaked heap is written back into a model artifact and pushed to an attacker-controlled server.
Breaches seen in the wild
- 230M AWS environment compromise: AWS environments compromised via exposed .env files containing cloud credentials.
- Shai Hulud npm malware campaign: npm malware exposed secrets on GitHub.
Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities, including AI Agents.
NHI Mgmt Group analysis
Unauthenticated AI runtimes create identityless trust zones: when a service can accept crafted model files and expose process memory without authentication, the control problem is no longer just application security. It becomes NHI governance because the runtime may carry secrets, prompts, and tool outputs that should never be treated as disposable application state. Practitioners should classify these services as privileged workloads with explicit authentication, authorization, and audit requirements.
Prompt leakage is a secrets-management failure, not only a privacy issue: the exposed memory can contain operational variables, internal instructions, and credentials that support other systems. That means a model server can become a repository for high-value NHI material even when it was never intended to store secrets. Teams should map what enters AI runtimes and remove anything that would be unacceptable in a log file or core dump.
Model import paths deserve the same scrutiny as code deployment paths: the exploit chain starts with a file parser, but the risk extends to how the platform converts, stores, and exports artifacts. The named concept here is model conversion disclosure debt: accumulated risk created when low-level data handling in AI platforms is not bounded like a production identity workflow. Practitioners should test every import, transform, and export path as if it were handling privileged data.
Local deployment does not reduce governance obligations: the article shows that running models on-premises or on self-managed infrastructure does not eliminate exposure; it merely relocates it. If the service is reachable on all interfaces and holds sensitive context in memory, it becomes an enterprise-grade NHI target. The control baseline should include authentication, segmentation, memory hygiene, and explicit secret exclusion.
The market lesson is that AI stack security is converging with workload identity security: the more AI systems process real business data, the more they need the same controls applied to service accounts, secrets, and privileged automation. Practitioners should stop treating AI runtimes as isolated experimentation tools and govern them as durable production identities that carry real attack paths.
From our research:
- The average time to mitigate a leaked secret is 36 hours, highlighting the operational burden of manual remediation processes, according to The 2024 State of Secrets Management Survey.
- Only 44% of organisations are currently using a dedicated secrets management system, which leaves many AI and NHI workflows exposed to ad hoc handling.
- That control gap is why teams should pair runtime hardening with the lifecycle controls in the NHI Lifecycle Management Guide and test exposed-secret response before an incident forces the issue.
What this signals
Model conversion disclosure debt: AI platforms accumulate hidden risk when they transform untrusted model files into trusted runtime artifacts without the same controls used for privileged identities. That debt shows up later as leaked prompts, exposed environment variables, and unexpected export paths, so practitioners should inventory every import and conversion boundary before expanding production use.
Cyera’s findings fit a broader pattern that NHI teams already see in service-account governance. When a runtime can ingest sensitive data, retain it in memory, and pass it to other tools, the control problem moves beyond application security into access design, secret handling, and incident response.
The programme implication is straightforward: AI runtime controls should be measured like any other privileged workload, with explicit ownership, rotation expectations, and recovery playbooks. Teams that already align with the NHI Lifecycle Management Guide will have a better starting point for containing the blast radius of leaked model memory.
For practitioners
- Restrict unauthenticated access to AI runtimes: Place Ollama and similar services behind authenticated network boundaries, segment them from general user access, and do not bind them to all interfaces by default.
- Review model import and conversion paths: Test file parsing, tensor shape validation, and export routines with malformed inputs to confirm that crafted metadata cannot trigger out-of-bounds reads or carry process memory into persisted artifacts.
- Remove secrets from AI runtime memory scope: Keep credential-bearing environment variables, API keys, and internal instructions out of the inference process wherever possible, and treat any in-memory secret as recoverable until proven otherwise.
- Treat push and pull workflows as trust boundaries: Constrain outbound model publishing, validate destination URIs, and ensure that model names cannot be repurposed as exfiltration channels during normal operations.
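For the first item, a quick pre-flight check can flag a listen address that reaches beyond loopback. Ollama reads its bind address from the `OLLAMA_HOST` environment variable and defaults to `127.0.0.1:11434`. The sketch assumes the value takes a `host[:port]` form, optionally with a scheme or a bracketed IPv6 address. `binds_beyond_loopback` is a hypothetical helper, and hostnames other than `localhost` are treated as exposed until reviewed.

```python
import ipaddress
import os


def binds_beyond_loopback(host_setting: str) -> bool:
    """Return True if a host[:port] setting would listen on a non-loopback address."""
    host = host_setting.strip()
    if "://" in host:  # tolerate an optional scheme such as "http://"
        host = host.split("://", 1)[1]
    host = host.split("/", 1)[0]  # drop any trailing path
    if host.startswith("[") and "]" in host:  # bracketed IPv6, e.g. "[::1]:11434"
        host = host[1:host.index("]")]
    elif host.count(":") == 1:  # "host:port"; a bare IPv6 literal has several colons
        host = host.split(":", 1)[0]
    if host == "":
        return True  # ":11434" style: listen on every interface
    if host == "localhost":
        return False
    try:
        return not ipaddress.ip_address(host).is_loopback
    except ValueError:
        return True  # DNS names or unparsable values: treat as exposed until reviewed


if __name__ == "__main__":
    setting = os.environ.get("OLLAMA_HOST", "127.0.0.1:11434")
    if binds_beyond_loopback(setting):
        print(f"OLLAMA_HOST={setting!r} exposes the API beyond loopback")
```

A check like this fits deployment pipelines or host configuration audits. It complements, but does not replace, an authenticating reverse proxy and network segmentation in front of the API.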
Key takeaways
- Unauthenticated AI runtimes can turn model conversion into a process-memory disclosure path that exposes prompts and secrets.
- The scale of the risk is operational, not theoretical, because leaked memory can include environment variables and data from concurrent sessions.
- Teams should harden AI runtimes like privileged NHI workloads, with authentication, validation, segmentation, and rapid secret rotation.
Standards & Framework Alignment
This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.
OWASP Agentic AI Top 10 and OWASP Non-Human Identity Top 10 address the attack and risk surface, while NIST CSF 2.0 sets the governance and control requirements practitioners need to meet.
| Framework | Control / Reference | Relevance |
|---|---|---|
| OWASP Agentic AI Top 10 | NHI-03 | File parsing and runtime memory handling expose AI agent context and secrets. |
| OWASP Non-Human Identity Top 10 | NHI-06 | The leak exposes secrets and operational data tied to a non-human workload. |
| NIST CSF 2.0 | PR.AC-4 | Unauthenticated access to the service weakens access control and segmentation. |
Inventory AI runtime secrets and revoke anything that can be recovered from memory.
Key terms
- Model Conversion Disclosure Debt: The accumulated risk created when AI platforms transform untrusted model files into trusted runtime artifacts without sufficient validation or containment. In practice, it appears when parsing, conversion, or export steps can expose memory, secrets, or prompts that were never meant to leave the process boundary.
- Prompt Leakage: The unintended exposure of user prompts, system prompts, or tool output from an AI runtime. In NHI terms, prompt leakage matters because those strings often carry sensitive instructions, credentials, or business context, and they may be stored in memory, logs, or exported artifacts.
- Non-Human Identity Runtime: A production service that executes with its own credentials, tool access, and operational trust, even when it is not a traditional workload account. For AI systems, this includes the model server, its API surface, and any attached tools or secret material it can reach.
Deepen your knowledge
AI runtime hardening and NHI secret exposure are core topics in our NHI Foundation Level course, the industry's only accredited NHI security programme. If you are securing self-hosted model servers or agentic workflows, the course provides a practical starting point.
This post draws on content published by Cyera, "Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama." Read the original.
Published by the NHIMG editorial team on 2026-05-05.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org