CVE-2025-52566
BaseFortify

Publication date: 2025-06-24

Last updated on: 2025-08-27

Assigner: GitHub, Inc.

Description
llama.cpp is an inference engine for several LLM models in C/C++. Prior to version b5721, a signed vs. unsigned integer overflow in llama.cpp's tokenizer implementation (llama_vocab::tokenize, src/llama-vocab.cpp:3036) causes an incorrect size comparison when copying tokens, allowing carefully crafted text input to trigger a heap overflow in the inference engine during tokenization. This issue has been patched in version b5721.
Meta Information
Published: 2025-06-24
Last Modified: 2025-08-27
Generated: 2026-05-06
AI Q&A: 2025-06-24
EPSS Evaluated: 2026-05-05
References: NVD, EUVD
Affected Vendors & Products (1 associated CPE)
Vendor: ggml | Product: llama.cpp | Version / Range: up to b5721 (excluding)
CWE ID | Description
CWE-119 | The product performs operations on a memory buffer, but it reads from or writes to a memory location outside the buffer's intended boundary. This may result in read or write operations on unexpected memory locations that could be linked to other variables, data structures, or internal program data.
CWE-195 | The product uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive can not be represented using an unsigned primitive.
AI Powered Q&A
Can you explain this vulnerability to me?

CVE-2025-52566 is a vulnerability in the llama.cpp tokenizer implementation caused by a signed versus unsigned integer overflow during token size comparison. Specifically, when the number of tokens produced exceeds the maximum value of a 32-bit signed integer (INT_MAX), the size cast to a signed integer becomes negative, causing incorrect checks and leading to out-of-bounds writes on a heap-allocated buffer. This results in a heap overflow during token copying. The vulnerability can be triggered by carefully crafted large text inputs, especially when using the chat template system with Jinja support, which bypasses normal size checks. The issue has been patched by adding explicit overflow detection and error handling to prevent such overflows. [1, 2]


How can this vulnerability impact me?

This vulnerability can lead to heap overflow, which may corrupt adjacent heap memory and enable remote code execution (RCE) or denial-of-service (DoS) attacks by hijacking the execution flow or crashing the application. Exploiting this requires local access and user interaction but has low attack complexity. The vulnerability affects confidentiality, integrity, and availability of the system running llama.cpp, making it a high-severity security risk. [2]


How can this vulnerability be detected on my network or system? Can you suggest some commands?

Detection involves monitoring for abnormal tokenization behavior or crashes in llama.cpp applications, especially when processing large inputs. Since the patched version returns INT32_MIN and throws a runtime exception with the message "Tokenization failed: input text too large, tokenization result exceeds int32_t limit," you can detect the vulnerability by checking logs or error outputs for this exception message. Additionally, running the application with AddressSanitizer (ASAN) enabled can help detect heap overflows during tokenization. There are no specific network commands provided, but monitoring llama.cpp logs for the mentioned exception or crashes during tokenization of large inputs is recommended. [1, 2]
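The log check described above can be sketched as a small shell snippet. The log directory (and the `LLAMA_LOG_DIR` variable) are assumptions for illustration; adjust them to wherever your llama.cpp deployment writes its output.

```shell
# Illustrative detection sketch: scan application logs for the
# exception message that patched llama.cpp builds raise on
# oversized inputs. LLAMA_LOG_DIR is a hypothetical location.
LOG_DIR="${LLAMA_LOG_DIR:-/var/log/llama}"
if grep -Rqs "Tokenization failed: input text too large" "$LOG_DIR"; then
    echo "patched build rejected an oversized input"
else
    echo "no overflow-exception messages found"
fi
```

Seeing the message means a patched build rejected an oversized input; on an unpatched build the same input could instead cause a crash or silent heap corruption, which is where ASAN-instrumented test runs help.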


What immediate steps should I take to mitigate this vulnerability?

The immediate mitigation step is to upgrade llama.cpp to version b5721 or later, where the vulnerability has been patched by adding explicit overflow detection and exception handling in the tokenizer. Avoid processing extremely large inputs that could exceed INT32_MAX tokens. If upgrading is not immediately possible, consider disabling or restricting the use of Jinja-based chat templates that can bypass vector size limits and avoid using the BPE tokenizer mode (LLAMA_VOCAB_TYPE_BPE) to prevent collateral stack overflow issues. Monitoring and logging tokenization errors can also help in early detection of exploitation attempts. [1, 2]

