CVE-2026-33298
Heap-Based Buffer Overflow in llama.cpp Allows Remote Code Execution
Publication date: 2026-03-24
Last updated on: 2026-04-30
Assigner: GitHub, Inc.
Description
Meta Information
Affected Vendors & Products
| Vendor | Product | Version / Range |
|---|---|---|
| ggml | llama.cpp | all versions before b7824 (exclusive) |
Exploitability
| CWE ID | Description |
|---|---|
| CWE-122 | A heap overflow condition is a buffer overflow, where the buffer that can be overwritten is allocated in the heap portion of memory, generally meaning that the buffer was allocated using a routine such as malloc(). |
| CWE-190 | The product performs a calculation that can produce an integer overflow or wraparound when the logic assumes that the resulting value will always be larger than the original value. This occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may become a very small or negative number. |
AI Powered Q&A
Can you explain this vulnerability to me?
This vulnerability exists in the llama.cpp project, which is a C/C++ implementation for running large language models. The flaw is an integer overflow in the function `ggml_nbytes` that calculates the memory size needed for a tensor. By crafting a specially designed GGUF file with specific tensor dimensions, an attacker can cause this function to underestimate the required memory size significantly.
Because the size is underestimated (for example, 4MB instead of exabytes), the program allocates a smaller buffer than needed. When the tensor is processed, this leads to a heap-based buffer overflow, which can corrupt memory.
This memory corruption can be exploited to achieve remote code execution (RCE) or cause denial of service (DoS). The vulnerability was fixed in commit b7824 by adding proper checks to prevent integer overflow and validate tensor sizes.
How can this vulnerability impact me?
If you use llama.cpp to load GGUF model files, an attacker could craft a malicious GGUF file that triggers this vulnerability.
- Heap-based buffer overflow leading to memory corruption.
- Potential remote code execution (RCE), allowing an attacker to run arbitrary code on your system.
- Denial of service (DoS) by crashing the application.
The vulnerability requires local access to load the malicious file and some user interaction, but no special privileges are needed.
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?
No specific information is available about the compliance impact of this vulnerability.
How can this vulnerability be detected on my network or system? Can you suggest some commands?
This vulnerability is triggered by loading malicious GGUF files whose crafted tensor dimensions cause an integer overflow in the `ggml_nbytes` function. Detection therefore comes down to two things: identifying vulnerable versions of llama.cpp (anything prior to b7824) and flagging GGUF files with abnormally large tensor dimensions.
Suggested commands include:
- Check the version of llama.cpp in your environment to ensure it is at or beyond commit b7824.
- Use file inspection or custom scripts to analyze GGUF files for tensor dimensions that could cause integer overflow, for example, dimensions with very large values like those exceeding 2^40.
- Monitor application logs for crashes or memory corruption errors when loading GGUF files.
What immediate steps should I take to mitigate this vulnerability?
The primary mitigation is to update llama.cpp to the fixed version b7824 or later, which includes checks to prevent integer overflow in tensor size calculations.
Additional immediate steps include:
- Avoid loading untrusted or unaudited GGUF files that could contain malicious tensor dimensions.
- Implement strict input validation on GGUF files to ensure tensor sizes are within safe and representable limits before processing.
- If updating is not immediately possible, consider sandboxing or restricting the execution environment of applications using llama.cpp to limit potential damage from exploitation.