CVE-2026-7141
Status: Received (Intake)
Uninitialized Resource Vulnerability in vllm KV Block Handler

Publication date: 2026-04-27

Last updated on: 2026-05-01

Assigner: VulDB

Description
A vulnerability was found in vLLM versions up to and including 0.19.0. The affected code is the function has_mamba_layers in the file vllm/v1/kv_cache_interface.py, part of the KV Block Handler component. Manipulation leads to use of an uninitialized resource. The attack can be launched remotely, though attack complexity is high and exploitation is considered difficult. A public exploit exists. The fix is commit 1ad67864c0c20f167929e64c875f5c28e1aad9fd; applying the patch is recommended.
Meta Information
Published: 2026-04-27
Last Modified: 2026-05-01
Generated: 2026-05-06
AI Q&A: 2026-04-27
EPSS Evaluated: 2026-05-05
Affected Vendors & Products (1 associated CPE)
Vendor: vllm
Product: vllm
Version / Range: up to 0.19.0 (inclusive)
CWE
CWE-908: The product uses or accesses a resource that has not been initialized.
AI Powered Q&A
Can you explain this vulnerability to me?

CVE-2026-7141 is a vulnerability in the vLLM project affecting the KV (key-value) block handler, specifically in the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py. The issue arises from improper management of KV cache blocks in the base scheduler, where recycled KV blocks are reused without clearing stale GPU memory. This leads to corrupted outputs and non-deterministic generation results when prefix caching is disabled.

The bug manifests as identical prompts producing different output sequences across runs due to stale KV data being reused. It occurs under concurrency with multiple simultaneous requests and is related to how KV blocks are returned to the free pool and reused without zeroing. This can cause NaN or Inf values to propagate in model operations, compromising correctness and stability.
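The reuse-without-zeroing mechanism described above can be sketched with a toy block pool. This is an illustrative model only, not vLLM's actual scheduler code; the class and parameter names (`KVBlock`, `BlockPool`, `zero_on_reuse`) are invented for the example:

```python
class KVBlock:
    """Stands in for one GPU KV-cache block; `data` is its memory."""
    def __init__(self, size: int):
        self.data = [0.0] * size

class BlockPool:
    """Free pool that hands blocks back and forth between requests."""
    def __init__(self, num_blocks: int, block_size: int):
        self.free = [KVBlock(block_size) for _ in range(num_blocks)]

    def allocate(self, zero_on_reuse: bool) -> KVBlock:
        block = self.free.pop()
        if zero_on_reuse:  # the behavior the patch enforces
            block.data = [0.0] * len(block.data)
        return block

    def release(self, block: KVBlock) -> None:
        # The block returns to the pool as-is: its contents survive.
        self.free.append(block)

pool = BlockPool(num_blocks=1, block_size=4)

# Request A writes key/value data into its block, then completes.
a = pool.allocate(zero_on_reuse=False)
a.data = [1.5, -0.3, 2.0, 0.7]
pool.release(a)

# Request B is handed the recycled block. Without zeroing, it reads
# A's stale values instead of a clean block.
b = pool.allocate(zero_on_reuse=False)
print(b.data)  # [1.5, -0.3, 2.0, 0.7] -- stale data, not zeros
```

In the real system the stale contents sit in GPU memory, so the corruption surfaces downstream as divergent or invalid (NaN/Inf) activations rather than as a directly readable buffer.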

The vulnerability is remotely exploitable, though with high attack complexity, making exploitation difficult in practice. A patch has been released that fixes the issue by ensuring recycled KV cache blocks are zeroed before reuse, including for FullAttention models and those with Mamba layers.


How can this vulnerability impact me?

This vulnerability can impact users by causing non-deterministic and corrupted outputs from the vLLM model scheduler, which can undermine the reliability and correctness of AI model results.

In practical terms, if you rely on vLLM for AI model inference, especially in concurrent environments without prefix caching enabled, you may experience inconsistent or incorrect outputs due to stale KV cache data being reused.

Additionally, the presence of stale data can lead to propagation of invalid values (NaN or Inf) during model computations, potentially causing crashes or instability in applications using the affected models.

Because the exploit is public, attackers could potentially trigger these corrupted outputs remotely, although the attack complexity is high and exploitability is difficult.


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability manifests as non-deterministic output from the vLLM base scheduler when prefix caching is disabled. Identical prompts produce different output sequences across runs, especially at temperature=0.

Detection involves reproducing the issue by running vLLM with a specified model and GPU memory utilization, then executing a repro script with JSON trace files that demonstrate output divergence patterns.

Specifically, fuzz testing was used to discover the bug, and it is reproducible 10/10 times across multiple independent traces without speculative decoding enabled.

Commands to detect the vulnerability would include running vLLM with prefix caching disabled (not using --enable-prefix-caching), setting temperature=0, and using the repro.py script with provided JSON traces to observe output divergence.
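The determinism check described above can be automated with a small harness. This is a hypothetical sketch, not part of vLLM: `generate_fn` is a placeholder for your actual inference call (for instance, a wrapper around vLLM's `LLM.generate` with greedy decoding, i.e. temperature=0):

```python
from typing import Callable

def check_determinism(generate_fn: Callable[[str], str],
                      prompt: str, runs: int = 10) -> bool:
    """Run the same prompt `runs` times; True iff every output matches.

    With greedy decoding, any divergence across runs is the symptom
    described in the advisory (stale KV data being reused).
    """
    outputs = [generate_fn(prompt) for _ in range(runs)]
    return all(o == outputs[0] for o in outputs)
```

Run this against a server started with prefix caching disabled and under concurrent load; a `False` result at temperature=0 reproduces the divergence pattern the published traces demonstrate.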


What immediate steps should I take to mitigate this vulnerability?

The recommended mitigation is to deploy the patch identified by commit 1ad67864c0c20f167929e64c875f5c28e1aad9fd.

This patch modifies the handling of recycled KV cache blocks to ensure they are zeroed before reuse, preventing stale key/value data leakage.

Specifically, the patch updates the property controlling KV cache zeroing to include FullAttention models and their subclasses, ensuring proper clearing of KV cache blocks.

Until the patch is applied, avoid running vLLM with prefix caching disabled under heavy concurrent load, since the resulting memory pressure on the block pool exacerbates the issue.
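As a first triage step, a rough check of whether an installed vLLM falls in the affected range (up to and including 0.19.0) might look like the following sketch. The helper names are invented, and the naive comparator ignores pre-release suffixes; note also that a build patched directly from commit 1ad67864c0c20f167929e64c875f5c28e1aad9fd could still report an affected version string, so confirm the commit is present in your build when in doubt.

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v: str) -> tuple:
    """Naive dotted-version parser; drops non-numeric components."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

def is_affected(installed: str) -> bool:
    """True if the version string is in the affected range (<= 0.19.0)."""
    return parse_version(installed) <= (0, 19, 0)

try:
    print("vllm potentially affected:", is_affected(version("vllm")))
except PackageNotFoundError:
    print("vllm is not installed")
```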


How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?

The provided information does not explicitly describe how the vulnerability CVE-2026-7141 affects compliance with common standards and regulations such as GDPR or HIPAA.

