CVE-2025-52566
BaseFortify

Publication date: 2025-06-24

Last updated on: 2025-08-27

Assigner: GitHub, Inc.

Description
llama.cpp is an inference engine for several LLM models in C/C++. Prior to version b5721, a signed vs. unsigned integer overflow in llama.cpp's tokenizer implementation (llama_vocab::tokenize, src/llama-vocab.cpp:3036) causes an incorrect size comparison when copying tokens, allowing carefully crafted text input to trigger a heap overflow in the inference engine during tokenization. This issue has been patched in version b5721.
Meta Information
Published: 2025-06-24
Last Modified: 2025-08-27
Generated: 2026-05-06
AI Q&A: 2025-06-24
EPSS Evaluated: 2026-05-05
References: NVD, EUVD
Affected Vendors & Products (1 associated CPE)
Vendor: ggml | Product: llama.cpp | Version / Range: up to b5721 (excluding)
CWE ID | Description
CWE-119 | The product performs operations on a memory buffer, but it reads from or writes to a memory location outside the buffer's intended boundary. This may result in read or write operations on unexpected memory locations that could be linked to other variables, data structures, or internal program data.
CWE-195 | The product uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive can not be represented using an unsigned primitive.
AI Powered Q&A
Can you explain this vulnerability to me?

CVE-2025-52566 is a vulnerability in the llama.cpp tokenizer implementation caused by a signed versus unsigned integer overflow during token size comparison. Specifically, when the number of tokens produced exceeds the maximum value of a 32-bit signed integer (INT_MAX), the size cast to a signed integer becomes negative, causing incorrect checks and leading to out-of-bounds writes on a heap-allocated buffer. This results in a heap overflow during token copying. The vulnerability can be triggered by carefully crafted large text inputs, especially when using the chat template system with Jinja support, which bypasses normal size checks. The issue has been patched by adding explicit overflow detection and error handling to prevent such overflows. [1, 2]


How can this vulnerability impact me?

This vulnerability can lead to heap overflow, which may corrupt adjacent heap memory and enable remote code execution (RCE) or denial-of-service (DoS) attacks by hijacking the execution flow or crashing the application. Exploiting this requires local access and user interaction but has low attack complexity. The vulnerability affects confidentiality, integrity, and availability of the system running llama.cpp, making it a high-severity security risk. [2]


How can this vulnerability be detected on my network or system? Can you suggest some commands?

Detection involves monitoring for abnormal tokenization behavior or crashes in llama.cpp applications, especially when processing large inputs. Since the patched version returns INT32_MIN and throws a runtime exception with the message "Tokenization failed: input text too large, tokenization result exceeds int32_t limit," you can detect the vulnerability by checking logs or error outputs for this exception message. Additionally, running the application with AddressSanitizer (ASAN) enabled can help detect heap overflows during tokenization. There are no specific network commands provided, but monitoring llama.cpp logs for the mentioned exception or crashes during tokenization of large inputs is recommended. [1, 2]
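The log check described above can be sketched as a small shell snippet. The log directory (and the `LLAMA_LOG_DIR` variable) are assumptions for illustration; adjust them to wherever your llama.cpp deployment writes its output.

```shell
# Illustrative detection sketch: scan application logs for the
# exception message that patched llama.cpp builds raise on
# oversized inputs. LLAMA_LOG_DIR is a hypothetical location.
LOG_DIR="${LLAMA_LOG_DIR:-/var/log/llama}"
if grep -Rqs "Tokenization failed: input text too large" "$LOG_DIR"; then
    echo "patched build rejected an oversized input"
else
    echo "no overflow-exception messages found"
fi
```

Seeing the message means a patched build rejected an oversized input; on an unpatched build the same input could instead cause a crash or silent heap corruption, which is where ASAN-instrumented test runs help.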


What immediate steps should I take to mitigate this vulnerability?

The immediate mitigation step is to upgrade llama.cpp to version b5721 or later, where the vulnerability has been patched by adding explicit overflow detection and exception handling in the tokenizer. Avoid processing extremely large inputs that could exceed INT32_MAX tokens. If upgrading is not immediately possible, consider disabling or restricting the use of Jinja-based chat templates that can bypass vector size limits and avoid using the BPE tokenizer mode (LLAMA_VOCAB_TYPE_BPE) to prevent collateral stack overflow issues. Monitoring and logging tokenization errors can also help in early detection of exploitation attempts. [1, 2]

