CVE-2025-52566
BaseFortify
Publication date: 2025-06-24
Last updated on: 2025-08-27
Assigner: GitHub, Inc.
Description
Affected Vendors & Products
| Vendor | Product | Version / Range |
|---|---|---|
| ggml | llama.cpp | versions before b5721 (fixed in b5721) |
Exploitability
| CWE ID | Description |
|---|---|
| CWE-119 | The product performs operations on a memory buffer, but it reads from or writes to a memory location outside the buffer's intended boundary. This may result in read or write operations on unexpected memory locations that could be linked to other variables, data structures, or internal program data. |
| CWE-195 | The product uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive can not be represented using an unsigned primitive. |
AI Powered Q&A
Can you explain this vulnerability to me?
CVE-2025-52566 is a vulnerability in the llama.cpp tokenizer implementation caused by a signed/unsigned integer mismatch in a token-count comparison. When the number of tokens produced exceeds the maximum value of a 32-bit signed integer (INT_MAX), the unsigned token count wraps to a negative value once cast to a signed integer, so the bounds check incorrectly passes and the subsequent token copy writes past the end of a heap-allocated buffer, resulting in a heap overflow. The vulnerability can be triggered by carefully crafted, very large text inputs, especially when using the chat template system with Jinja support, which bypasses the normal size checks. The issue has been patched by adding explicit overflow detection and error handling to prevent such overflows. [1, 2]
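The cast behavior described above can be sketched in a few lines of C++. This is an illustrative reconstruction, not the actual llama.cpp code: the function names and the capacity value are hypothetical, and only the general pattern (an unsigned count wrapping negative when narrowed to `int32_t`, defeating a signed comparison) reflects the advisory.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the flawed pattern: the caller-supplied capacity
// is compared against a token count that has been narrowed to a signed
// 32-bit integer. Past INT32_MAX the narrowed value is negative, so the
// "does it fit?" comparison wrongly allows the copy.
bool naive_check_allows(std::size_t n_tokens, int32_t capacity) {
    // On two's-complement platforms this wraps to a negative value
    // whenever n_tokens exceeds INT32_MAX.
    int32_t as_signed = static_cast<int32_t>(n_tokens);
    return as_signed < capacity;  // negative < capacity is trivially true
}

// Patched-style guard: detect the overflow while the count is still
// unsigned, before any narrowing cast takes place.
bool overflow_guard_rejects(std::size_t n_tokens) {
    return n_tokens > static_cast<std::size_t>(INT32_MAX);
}
```

With a token count of `INT32_MAX + 1`, `naive_check_allows` returns true for any positive capacity, while `overflow_guard_rejects` correctly flags the input as too large, which is the essence of the fix landed in b5721.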
How can this vulnerability impact me?
This vulnerability can lead to heap overflow, which may corrupt adjacent heap memory and enable remote code execution (RCE) or denial-of-service (DoS) attacks by hijacking the execution flow or crashing the application. Exploiting this requires local access and user interaction but has low attack complexity. The vulnerability affects confidentiality, integrity, and availability of the system running llama.cpp, making it a high-severity security risk. [2]
How can this vulnerability be detected on my network or system? Can you suggest some commands?
Detection involves monitoring for abnormal tokenization behavior or crashes in llama.cpp applications, especially when processing large inputs. Since the patched version returns INT32_MIN and throws a runtime exception with the message "Tokenization failed: input text too large, tokenization result exceeds int32_t limit," you can detect the vulnerability by checking logs or error outputs for this exception message. Additionally, running the application with AddressSanitizer (ASAN) enabled can help detect heap overflows during tokenization. There are no specific network commands provided, but monitoring llama.cpp logs for the mentioned exception or crashes during tokenization of large inputs is recommended. [1, 2]
What immediate steps should I take to mitigate this vulnerability?
The immediate mitigation step is to upgrade llama.cpp to version b5721 or later, where the vulnerability has been patched by adding explicit overflow detection and exception handling in the tokenizer. Avoid processing extremely large inputs that could exceed INT32_MAX tokens. If upgrading is not immediately possible, consider disabling or restricting the use of Jinja-based chat templates that can bypass vector size limits and avoid using the BPE tokenizer mode (LLAMA_VOCAB_TYPE_BPE) to prevent collateral stack overflow issues. Monitoring and logging tokenization errors can also help in early detection of exploitation attempts. [1, 2]