CVE-2026-53923
Status: Received - Intake
Integer Truncation in vLLM GGUF Dequantize Kernels Leads to Information Disclosure

Publication date: 2026-06-22

Last updated on: 2026-06-22

Assigner: GitHub, Inc.

Description
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.5.5 up to, but not including, 0.23.1rc0, tensor dimensions are truncated to a narrower integer type in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu), so the kernels process only part of the tensor. The output tensor is allocated at full size with torch::empty, which does not initialize memory, but the dequantize CUDA kernel writes only the truncated number of elements. The unwritten portion of the output tensor retains whatever was previously in that GPU memory. In multi-tenant inference deployments, this residual memory may contain tensor data from other users' inference requests, which constitutes information disclosure. The vulnerability is fixed in 0.23.1rc0.
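
The arithmetic behind the partial processing can be sketched as follows. This is a minimal illustration, not the vLLM code: the advisory does not state which integer types are involved, so the sketch assumes a 64-bit row count narrowed to a signed 32-bit integer, and that a negative narrowed count processes nothing. `truncate_to_int32` and `unfilled_elements` are hypothetical helper names.

```python
def truncate_to_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed integer."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


def unfilled_elements(rows: int, cols: int) -> int:
    """Return how many output elements are never written.

    Assumption: the buffer is sized from the full 64-bit row count, while
    the kernel is launched with the row count narrowed to 32 bits.
    """
    allocated = rows * cols
    # Assumption: a negative narrowed count means no rows are processed.
    narrowed_rows = max(truncate_to_int32(rows), 0)
    processed = narrowed_rows * cols
    return allocated - processed
```

Under these assumptions, a tensor with 2**32 + 4 rows and 8 columns would have only 4 rows written, leaving 2**35 elements holding whatever was previously in memory. A tensor whose dimensions fit in 32 bits is written in full.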
Meta Information
Published: 2026-06-22
Last Modified: 2026-06-22
Generated: 2026-06-23
AI Q&A: 2026-06-23
EPSS Evaluated: N/A
Affected Vendors & Products
Currently, no data is known.
CWE ID: Description
CWE-681 (Incorrect Conversion between Numeric Types): When converting from one data type to another, such as long to integer, data can be omitted or translated in a way that produces unexpected values. If the resulting values are used in a sensitive context, then dangerous behaviors may occur.
CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor): The product exposes sensitive information to an actor that is not explicitly authorized to have access to that information.
Executive Summary

This vulnerability exists in vLLM, an inference and serving engine for large language models. Due to integer truncation in the GGUF dequantize kernels, only part of the output tensor is processed, while the rest remains uninitialized and retains residual data from GPU memory.

In multi-tenant inference deployments, this means that leftover data from other users' inference requests can be exposed, leading to information disclosure.

The issue affects versions 0.5.5 up to, but not including, 0.23.1rc0, and is fixed in version 0.23.1rc0.

Impact Analysis

The vulnerability can lead to information disclosure in multi-tenant environments where multiple users share the same GPU resources.

Specifically, residual GPU memory from one user's inference request may be exposed to another user, potentially leaking sensitive or confidential data.
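
The exposure path can be modeled on the CPU with a toy allocator. This is a sketch under stated assumptions, not vLLM or CUDA code: `ToyPool` is a hypothetical stand-in for an allocator that hands back previously used memory without clearing it, and `partial_fill` is a hypothetical stand-in for a kernel that writes fewer elements than were allocated.

```python
class ToyPool:
    """Hypothetical allocator that reuses one buffer without zeroing it."""

    def __init__(self, size: int) -> None:
        self._memory = bytearray(size)

    def empty(self, size: int) -> memoryview:
        # Like an uninitialized allocation: contents are whatever was left behind.
        return memoryview(self._memory)[:size]


def partial_fill(buffer: memoryview, data: bytes, processed: int) -> None:
    """Write only the first `processed` bytes, as a truncated kernel would."""
    buffer[:processed] = data[:processed]


pool = ToyPool(16)

# Tenant A writes a full 16-byte result into the pooled memory.
tenant_a = pool.empty(16)
tenant_a[:] = b"SECRET-TENANT-A!"

# Tenant B receives the same memory, but only 4 of its 16 bytes are written.
tenant_b = pool.empty(16)
partial_fill(tenant_b, b"BBBBBBBBBBBBBBBB", processed=4)

# Everything past the written prefix is still tenant A's data.
leaked = bytes(tenant_b[4:])
```

In this model, the first 4 bytes of tenant B's buffer hold its own output and the remaining 12 bytes are tenant A's leftover data, which is the pattern the advisory describes for the unwritten part of the output tensor.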

Mitigation Strategies

To mitigate this vulnerability, upgrade vLLM to version 0.23.1rc0 or later, where the integer truncation issue in the GGUF dequantize kernels has been fixed.
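
A deployment check for the affected range might look like the sketch below. It is a simplified illustration rather than an official tool: `release_tuple` and `is_affected` are hypothetical helpers, and the comparison looks only at the numeric release segment, so any pre-release of 0.23.1 (not just rc0 and later) is treated as fixed and pre-releases of 0.5.5 are treated as affected.

```python
import re

FIRST_AFFECTED = (0, 5, 5)
FIRST_FIXED = (0, 23, 1)  # numeric release segment of 0.23.1rc0


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '0.23.1rc0' -> (0, 23, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    # Pad short versions such as '0.6' to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """Return True when the version falls in the range named in the advisory."""
    return FIRST_AFFECTED <= release_tuple(version) < FIRST_FIXED
```

The installed version string could be obtained with `importlib.metadata.version("vllm")` from the Python standard library and passed to `is_affected`. A full PEP 440 comparison would handle pre-release ordering more precisely than this sketch does.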

Compliance Impact

Because the dequantize kernels write only part of the output tensor, the unwritten portion can still hold tensor data from other users' inference requests. In multi-tenant inference deployments, returning that residual data constitutes information disclosure.

Such disclosure can affect compliance with data protection regulations such as GDPR and HIPAA, which require controls that prevent unauthorized access to personal or sensitive data.

Where the exposed tensor data includes regulated information, this vulnerability could therefore result in violations of these regulations by exposing one user's data to another without authorization.
