CVE-2025-49847
BaseFortify

Publication date: 2025-06-17

Last updated on: 2025-08-27

Assigner: GitHub, Inc.

Description
llama.cpp provides inference of several LLM models in C/C++. Prior to version b5662, an attacker-supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp's vocabulary-loading code. Specifically, the helper _try_copy in llama.cpp/src/vocab.cpp: llama_vocab::impl::token_to_piece() casts a very large size_t token length to an int32_t, causing the length check (if (length < (int32_t)size)) to be bypassed. As a result, memcpy is still called with the oversized size, letting a malicious model overwrite memory beyond the intended buffer. This can lead to arbitrary memory corruption and potential code execution. This issue has been patched in version b5662.
Meta Information
Published: 2025-06-17
Last Modified: 2025-08-27
Generated: 2026-05-07
AI Q&A: 2025-06-17
EPSS Evaluated: 2026-05-05
Source: NVD
Affected Vendors & Products
Showing 1 associated CPE.
Vendor: ggml
Product: llama.cpp
Version / Range: up to b5662 (exclusive)
CWE
CWE-119: The product performs operations on a memory buffer, but it reads from or writes to a memory location outside the buffer's intended boundary. This may result in read or write operations on unexpected memory locations that could be linked to other variables, data structures, or internal program data.
CWE-195: The product uses a signed primitive and performs a cast to an unsigned primitive, which can produce an unexpected value if the value of the signed primitive cannot be represented using an unsigned primitive.
AI Powered Q&A
Can you explain this vulnerability to me?

CVE-2025-49847 is a critical buffer overflow vulnerability in the llama.cpp project. It occurs in the vocabulary-loading code when processing attacker-supplied GGUF model vocabularies. Specifically, the function handling token lengths casts a very large unsigned size_t token length to a signed int32_t; the truncated value defeats the length check, so memcpy is still invoked with the original oversized size and copies more data than the buffer can hold, leading to memory corruption and potentially arbitrary code execution. [2]


How can this vulnerability impact me?

This vulnerability can lead to arbitrary memory corruption, application crashes (denial of service), and potentially remote code execution by overwriting heap metadata or control flow pointers. Any application using llama.cpp to load GGUF models from untrusted sources, such as inference servers or chatbots, is at risk. An attacker can exploit this by supplying a maliciously crafted model vocabulary, requiring only user interaction and no privileges. [2]


How can this vulnerability be detected on my network or system? Can you suggest some commands?

Detection involves watching for the loading of malicious GGUF model vocabularies that declare oversized token lengths. Since the overflow occurs while llama.cpp loads a GGUF model, exploitation attempts often surface as crashes or abnormal behavior in applications that embed llama.cpp during model loading. Scanning for the llama.cpp build in use can also identify vulnerable instances (builds prior to b5662). Specific commands are not provided in the resources, but general approaches include: 1) Checking application logs for crashes or memory corruption during model loading. 2) Using network monitoring tools to detect suspicious GGUF model file transfers. 3) Verifying the llama.cpp build with a command such as `llama-cli --version`, or otherwise inspecting the deployed software version. 4) Loading GGUF models with oversized token lengths in a controlled environment (for example via fuzz testing or custom scripts) to confirm whether an instance is vulnerable. [2]


What immediate steps should I take to mitigate this vulnerability?

Immediate mitigation steps include updating llama.cpp to version b5662 or later, where the vulnerability has been patched by correcting the length check to prevent integer overflow and buffer overflow. If updating is not immediately possible, avoid loading GGUF models from untrusted or unverified sources to prevent exploitation. Additionally, monitor applications for crashes or abnormal behavior related to model loading and consider applying any available patches or workarounds from the llama.cpp project repository. [1, 2]

