CVE-2026-54235

Analyzed - Analysis Complete

Float Handling Flaw in vLLM Leads to GPU Sampling Errors

Publication date: 2026-06-22

Last updated on: 2026-06-24

Assigner: GitHub, Inc.

Description

vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.23.1rc0, all temperature validation gates use comparison operators (<, >), which silently evaluate to False for NaN and for positive Infinity in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. This vulnerability is fixed in 0.23.1rc0.


Meta Information

Published: 2026-06-22
Last Modified: 2026-06-24
Generated: 2026-08-02
AI Q&A: 2026-06-23
EPSS Evaluated: 2026-08-01
NVD: CVE-2026-54235

Affected Vendors & Products

Vendor	Product	Version / Range
vllm	vllm	to 0.23.1 (exc)

CWE

CWE ID	Description
CWE-1287	The product receives input that is expected to be of a certain type, but it does not validate or incorrectly validates that the input is actually of the expected type.

Executive Summary

This vulnerability exists in vLLM, an inference and serving engine for large language models (LLMs), in versions prior to 0.23.1rc0. The issue arises because the temperature validation gates use comparison operators (<, >) that silently evaluate to False for special floating-point values like NaN (Not a Number) and positive Infinity according to Python's IEEE 754 float semantics.

As a result, these special values bypass the validation checks and propagate to GPU sampling kernels. This leads to undefined behavior or CUDA errors that can crash the inference worker.

The vulnerability is fixed in version 0.23.1rc0.
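
The comparison behavior described above can be reproduced in a few lines of plain Python. This is a minimal sketch, not vLLM's actual validation code: naive_check and strict_check are hypothetical names, and the single "temperature < 0.0" guard is an assumed stand-in for the kind of comparison-only gate the advisory describes.

```python
import math


def naive_check(temperature: float) -> None:
    """Hypothetical comparison-only guard (assumed shape, not vLLM source)."""
    # NaN compares False against everything, and +inf is not below 0.0,
    # so neither value triggers this branch.
    if temperature < 0.0:
        raise ValueError(f"temperature must be non-negative, got {temperature}")


def strict_check(temperature: float) -> None:
    """Same guard with an explicit finiteness test added in front."""
    if not math.isfinite(temperature) or temperature < 0.0:
        raise ValueError(
            f"temperature must be a finite non-negative number, got {temperature}"
        )


if __name__ == "__main__":
    for value in (0.7, float("nan"), float("inf")):
        naive_check(value)  # never raises for these three inputs
        try:
            strict_check(value)
            print(value, "accepted")
        except ValueError as exc:
            print(value, "rejected:", exc)
```

The point of the sketch is that an ordering comparison alone cannot reject NaN, so the finiteness test has to be explicit.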

Detection Guidance

This vulnerability involves improper validation of the temperature and repetition_penalty parameters in the vLLM inference engine, which allows non-finite float values (NaN, Infinity) to bypass checks and cause crashes. Detection involves verifying whether the system is running a vulnerable version of vLLM (prior to 0.23.1rc0 according to the CVE description; one referenced resource instead lists versions up to 0.8.5) and checking whether non-finite values are being passed to these parameters.

Since the issue is in the validation of input parameters within the vLLM application, direct network or system commands to detect exploitation are not explicitly provided in the resources.

However, to detect if the vulnerable code is present, you can check the installed version of vLLM:

pip show vllm

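If the reported version is older than 0.23.1rc0, the installation is vulnerable. One referenced resource gives the affected range as up to 0.8.5, but the CVE description names 0.23.1rc0 as the fixed version, so treat anything below 0.23.1rc0 as affected.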
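
The same version check can be scripted for use in inventory or CI jobs. The sketch below is an illustration only: is_vulnerable and check_installed_vllm are hypothetical helper names, the fixed version is taken from the CVE description, and the third-party packaging library is assumed to be installed for PEP 440 version comparison.

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version  # third-party; assumed to be available

# Fixed version as stated in the CVE description.
FIXED_VERSION = Version("0.23.1rc0")


def is_vulnerable(installed: str) -> bool:
    """Return True if the given vLLM version string predates the fix."""
    return Version(installed) < FIXED_VERSION


def check_installed_vllm() -> str:
    """Describe the vLLM version found in the current environment."""
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        return "vllm is not installed in this environment"
    status = "VULNERABLE" if is_vulnerable(installed) else "not affected"
    return f"vllm {installed}: {status}"


if __name__ == "__main__":
    print(check_installed_vllm())
```

Comparing parsed versions rather than raw strings matters here because the fix landed in a release candidate, which sorts before the final 0.23.1 release.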

To detect if non-finite values are being used in your inference requests, you can audit logs or add debugging to monitor the temperature and repetition_penalty parameters for NaN or Infinity values.

In Python, you can run a simple check to detect non-finite values in your input parameters:

    import math

    def is_valid(value):
        # True only for ordinary finite numbers; False for NaN and +/-Infinity
        return math.isfinite(value)

    print(is_valid(float('nan')))  # False
    print(is_valid(float('inf')))  # False

This kind of validation can be added to your input sanitization or monitoring scripts.
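
As one possible shape for such sanitization, the sketch below screens a JSON request body before it reaches the engine. It is an assumption-laden example, not part of vLLM: parse_request and reject_constant are hypothetical names, and the field list simply mirrors the two parameters named in this section. It relies on two standard-library behaviors: json.loads accepts the literals NaN, Infinity and -Infinity by default (the parse_constant hook can refuse them), and an overflowing literal such as 1e999 parses to infinity, so a math.isfinite check is still needed afterwards.

```python
import json
import math

# Parameters named in this advisory section.
SAMPLING_FIELDS = ("temperature", "repetition_penalty")


def reject_constant(name: str):
    """Called by json.loads for the literals NaN, Infinity and -Infinity."""
    raise ValueError(f"non-finite JSON constant not allowed: {name}")


def parse_request(body: str) -> dict:
    """Parse a JSON request body and refuse non-finite sampling parameters."""
    payload = json.loads(body, parse_constant=reject_constant)
    for field in SAMPLING_FIELDS:
        value = payload.get(field)
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{field} must be a number")
        try:
            finite = math.isfinite(value)
        except OverflowError:  # integer too large to convert to float
            finite = False
        if not finite:
            raise ValueError(f"{field} must be finite, got {value!r}")
    return payload
```

Placing a filter like this in a gateway or proxy in front of the inference server is a stopgap for deployments that cannot upgrade immediately; it does not replace the fixed release.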

Impact Analysis

The impact of this vulnerability is that it can cause the inference worker in vLLM to crash due to undefined behavior or CUDA errors when processing NaN or positive Infinity temperature values.

This can lead to denial of service conditions where the large language model inference engine becomes unavailable or unstable.
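
To see why a non-finite temperature is harmful downstream, consider the conventional temperature-scaled softmax, where logits are divided by the temperature before normalization. The following is a CPU-only illustration in plain Python, not vLLM's GPU kernel; softmax_with_temperature is a hypothetical helper, and how an actual CUDA sampling kernel reacts to these inputs is implementation-dependent.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Conventional temperature scaling followed by a stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]
    # A NaN temperature turns every scaled logit, and so every probability, into NaN.
    print(softmax_with_temperature(logits, float("nan")))
    # An infinite temperature scales every logit to zero, erasing the model's preferences.
    print(softmax_with_temperature(logits, float("inf")))
```

In this sketch NaN poisons the whole distribution, which leaves a sampler with no valid probabilities to draw from. That is consistent with the failure mode described above, although the exact error raised on a GPU will differ.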

Compliance Impact

The vulnerability causes service degradation due to inference worker crashes when non-finite float values bypass validation and cause undefined behavior or CUDA errors. This can impact availability and reliability of services using the vLLM package.

However, there is no information provided in the available resources about direct effects on compliance with common standards and regulations such as GDPR or HIPAA.

Mitigation Strategies

To mitigate this vulnerability, upgrade vLLM to version 0.23.1rc0 or later, where the temperature validation gates have been fixed to properly handle NaN and positive Infinity values, preventing undefined behavior and CUDA errors.
