CVE-2026-33298
Heap-Based Buffer Overflow in llama.cpp Allows Remote Code Execution
Publication date: 2026-03-24
Last updated on: 2026-04-30
Assigner: GitHub, Inc.
Description
Meta Information
Affected Vendors & Products
| Vendor | Product | Version / Range |
|---|---|---|
| ggml | llama.cpp | all versions before b7824 (exclusive) |
Exploitability
| CWE ID | Description |
|---|---|
| CWE-122 | A heap overflow condition is a buffer overflow, where the buffer that can be overwritten is allocated in the heap portion of memory, generally meaning that the buffer was allocated using a routine such as malloc(). |
| CWE-190 | The product performs a calculation that can produce an integer overflow or wraparound when the logic assumes that the resulting value will always be larger than the original value. This occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may become a very small or negative number. |
AI Powered Q&A
Can you explain this vulnerability to me?
This vulnerability exists in the llama.cpp project, which is a C/C++ implementation for running large language models. The flaw is an integer overflow in the function `ggml_nbytes` that calculates the memory size needed for a tensor. By crafting a specially designed GGUF file with specific tensor dimensions, an attacker can cause this function to underestimate the required memory size significantly.
Because the size is underestimated (for example, 4MB instead of exabytes), the program allocates a smaller buffer than needed. When the tensor is processed, this leads to a heap-based buffer overflow, which can corrupt memory.
This memory corruption can be exploited to achieve remote code execution (RCE) or cause denial of service (DoS). The vulnerability was fixed in commit b7824 by adding proper checks to prevent integer overflow and validate tensor sizes.
How can this vulnerability impact me?
If you use llama.cpp to load GGUF model files, an attacker could craft a malicious GGUF file that triggers this vulnerability.
- Heap-based buffer overflow leading to memory corruption.
- Potential remote code execution (RCE), allowing an attacker to run arbitrary code on your system.
- Denial of service (DoS) by crashing the application.
The vulnerability requires local access to load the malicious file and some user interaction, but no special privileges are needed.
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?
No specific information is available about the compliance impact of this vulnerability.
How can this vulnerability be detected on my network or system? Can you suggest some commands?
This vulnerability is triggered by loading malicious GGUF files whose crafted tensor dimensions cause an integer overflow in the `ggml_nbytes` function. Detection therefore comes down to two things: identifying vulnerable versions of llama.cpp (anything prior to b7824) and flagging GGUF files with abnormally large tensor dimensions.
Suggested commands include:
- Check the version of llama.cpp in your environment to ensure it is at or beyond commit b7824.
- Use file inspection or custom scripts to analyze GGUF files for tensor dimensions that could cause integer overflow, for example, dimensions with very large values like those exceeding 2^40.
- Monitor application logs for crashes or memory corruption errors when loading GGUF files.
What immediate steps should I take to mitigate this vulnerability?
The primary mitigation is to update llama.cpp to the fixed version b7824 or later, which includes checks to prevent integer overflow in tensor size calculations.
Additional immediate steps include:
- Avoid loading untrusted or unaudited GGUF files that could contain malicious tensor dimensions.
- Implement strict input validation on GGUF files to ensure tensor sizes are within safe and representable limits before processing.
- If updating is not immediately possible, consider sandboxing or restricting the execution environment of applications using llama.cpp to limit potential damage from exploitation.