CVE-2026-53923
Received
Received - Intake
Integer Truncation in vLLM GGUF Dequantize Kernels Leads to Information Disclosure
Publication date: 2026-06-22
Last updated on: 2026-06-22
Assigner: GitHub, Inc.
Description
Description
vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.
CVSS Scores
EPSS Scores
| Probability: | |
| Percentile: |
Meta Information
Affected Vendors & Products
Currently, no data is known.
Helpful Resources
Exploitability
| CWE ID | Description |
|---|---|
| CWE-681 | When converting from one data type to another, such as long to integer, data can be omitted or translated in a way that produces unexpected values. If the resulting values are used in a sensitive context, then dangerous behaviors may occur. |
| CWE-200 | The product exposes sensitive information to an actor that is not explicitly authorized to have access to that information. |