CVE-2026-34756
Status: Received - Intake
Denial of Service in vLLM API via Unbounded Parameter Abuse

Publication date: 2026-04-06

Last updated on: 2026-04-20

Assigner: GitHub, Inc.

Description
vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.1.0 up to, but not including, 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Because no upper bound is validated on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This blocks the Python asyncio event loop and causes immediate Out-Of-Memory crashes by allocating millions of request-object copies on the heap before the request even reaches the scheduling queue. This vulnerability is fixed in 0.19.0.
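For illustration only, an abusive request body has roughly the following shape; the model name is a placeholder, and the point is that the request itself is tiny while the amplification happens server-side. Do not send such a request to a server you do not control.

```python
import json

# Shape of an abusive /v1/completions request body: a single small
# JSON payload whose "n" field requests millions of output sequences.
# The model name is a placeholder.
payload = {
    "model": "some-model",
    "prompt": "hi",
    "n": 10_000_000,  # astronomically large; no upper bound before 0.19.0
    "max_tokens": 1,
}

body = json.dumps(payload)
# The serialized request is under 100 bytes, yet it asks the server
# to materialize ten million sequence slots.
print(len(body), payload["n"])
```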
Affected Vendors & Products
Vendor: vllm
Product: vllm
Version / Range: From 0.1.0 (inclusive) to 0.19.0 (exclusive)
CWE
CWE-770: The product allocates a reusable resource or group of resources on behalf of an actor without imposing any intended restrictions on the size or number of resources that can be allocated.
AI Powered Q&A
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?

The vulnerability described in CVE-2026-34756 is a Denial of Service (DoS) issue that impacts the availability of the vLLM OpenAI-compatible API server by allowing unauthenticated attackers to exhaust system resources. It does not affect confidentiality or integrity of data.

Since the vulnerability does not involve unauthorized access to or disclosure of personal or sensitive data, it does not directly impact compliance with data protection regulations such as GDPR or HIPAA, which primarily focus on confidentiality and privacy of personal information.

However, the availability impact caused by this vulnerability could indirectly affect compliance if the affected service is critical for meeting regulatory requirements for uptime or service continuity.


Can you explain this vulnerability to me?

CVE-2026-34756 is a Denial of Service (DoS) vulnerability in the vLLM OpenAI-compatible API server caused by the lack of an upper bound on the 'n' parameter in the ChatCompletionRequest and CompletionRequest models.

An unauthenticated attacker can send a single HTTP request with an extremely large 'n' value, which causes the server to create millions of copies of the request object. This monopolizes the Python asyncio event loop and exhausts memory rapidly, leading to immediate Out-Of-Memory crashes and termination of the vLLM process.

The vulnerability arises because the 'n' parameter, which specifies how many output sequences to generate, has no enforced upper limit, allowing resource exhaustion attacks.

This issue was fixed in vLLM version 0.19.0 by adding upper bound validation on the 'n' parameter to prevent excessive resource allocation.
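The fix can be sketched without pulling in vLLM itself. The real patch lives in vLLM's Pydantic request models; a minimal dependency-free version of the same upper-bound check looks like this, with the limit of 128 being an arbitrary value chosen for illustration:

```python
# Minimal sketch of upper-bound validation on the "n" parameter.
# The chosen maximum (128) is illustrative, not vLLM's actual limit.
MAX_N = 128

def validate_n(n: int) -> int:
    """Reject out-of-range values before any per-sequence allocation."""
    if not 1 <= n <= MAX_N:
        raise ValueError(f"n must be between 1 and {MAX_N}, got {n}")
    return n

print(validate_n(4))        # a normal request passes
try:
    validate_n(10_000_000)  # an abusive request is rejected up front
except ValueError as exc:
    print("rejected:", exc)
```

The essential property is that rejection happens during request validation, before any per-sequence objects are allocated or the request touches the scheduling queue.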


How can this vulnerability impact me?

This vulnerability allows any unauthenticated remote attacker to cause a complete denial of service on a public-facing vLLM API server with a single HTTP request.

The attack blocks the Python asyncio event loop by monopolizing CPU resources and causes rapid memory exhaustion by allocating millions of request copies, leading to immediate Out-Of-Memory crashes.

As a result, the vLLM service becomes unavailable, causing complete service disruption and impacting availability.

This can affect SaaS and AI-as-a-Service platforms that expose vLLM without strict payload validation or rate limiting. Because a single small request triggers the exhaustion, ordinary bandwidth limits and hardware capacity planning offer no protection.


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability can be detected by monitoring for unusually large values of the `n` parameter in HTTP requests sent to the vLLM OpenAI-compatible API endpoints, specifically `/v1/completions` and `/v1/chat/completions`.

Detection involves inspecting incoming API requests for the `n` parameter and identifying requests where `n` is set to an astronomically large number, which can cause resource exhaustion.

You can use network monitoring tools or HTTP request logging to capture and analyze these requests.

  • Use command-line tools like `tcpdump` or `ngrep` to capture HTTP traffic to the vLLM server. Note that the API accepts JSON request bodies, so the parameter appears as `"n": <value>` rather than as a form field like `n=`, and that a stand-alone vLLM server listens on port 8000 by default. Packet capture only helps for unencrypted traffic.
  • Example command to capture traffic to the vLLM API and surface the parameter (adjust the port to your deployment): `tcpdump -A -s 0 'tcp port 8000' | grep -E '"n"[[:space:]]*:'`
  • Alternatively, use `ngrep` to filter HTTP requests containing the parameter: `ngrep -W byline '"n"' tcp and port 8000`

Additionally, monitor system metrics such as CPU usage, memory consumption, and process crashes (OOM kills) on the vLLM server, as the vulnerability causes high CPU blocking and rapid memory exhaustion.

Logs from the vLLM server or system logs indicating Out-Of-Memory (OOM) kills or crashes can also be indicators of exploitation attempts.
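Beyond live packet capture, access logs that record request bodies can be scanned offline. A hedged sketch follows; the log entries and the threshold of 256 are assumptions, so adapt both to your proxy's logging format and your workload:

```python
import json

# Flag JSON request bodies whose "n" exceeds a chosen threshold.
# The threshold is an assumption; tune it to your workload.
THRESHOLD = 256

def is_suspicious(body: str, threshold: int = THRESHOLD) -> bool:
    """Return True if the logged body requests more than `threshold` sequences."""
    try:
        n = json.loads(body).get("n", 1)
    except (json.JSONDecodeError, AttributeError):
        return False  # not a JSON object; nothing to flag here
    return isinstance(n, int) and n > threshold

# Hypothetical logged request bodies for illustration.
logged_bodies = [
    '{"model": "m", "prompt": "hi", "n": 2}',
    '{"model": "m", "prompt": "hi", "n": 50000000}',
    'not json at all',
]
flagged = [b for b in logged_bodies if is_suspicious(b)]
print(len(flagged))  # only the oversized request is flagged
```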


What immediate steps should I take to mitigate this vulnerability?

The primary immediate mitigation is to upgrade the vLLM software to version 0.19.0 or later, where the vulnerability is fixed by adding an upper bound validation on the `n` parameter.

If upgrading immediately is not possible, configure the environment variable `VLLM_MAX_N_SEQUENCES` to set a strict upper limit on the `n` parameter to prevent excessively large requests.

  • Set `VLLM_MAX_N_SEQUENCES` to a conservative value such as 64 or 128 for public-facing deployments to reduce the risk of resource exhaustion.
  • Reject or block requests where the `n` parameter exceeds this configured limit.

Implement additional protections such as reverse proxy layers to enforce request body validation and rate limiting to mitigate abusive payloads.
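The proxy-layer validation mentioned above can be sketched as a small gateway-side filter. This is a hypothetical pre-forwarding hook, not vLLM code, and the cap of 64 is an assumption mirroring the conservative limits suggested earlier:

```python
import json

# Hypothetical gateway-side filter: inspect the body before forwarding
# to vLLM, and reject requests whose "n" exceeds a configured cap.
N_CAP = 64  # assumed conservative cap for a public-facing deployment

def gate_request(raw_body: bytes) -> tuple[int, str]:
    """Return (status_code, reason); 200 means forward to the backend."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, "malformed JSON"
    n = payload.get("n", 1)
    if not isinstance(n, int) or n < 1:
        return 400, "n must be a positive integer"
    if n > N_CAP:
        return 422, f"n exceeds configured cap of {N_CAP}"
    return 200, "ok"

print(gate_request(b'{"model": "m", "prompt": "hi", "n": 2}'))
print(gate_request(b'{"model": "m", "prompt": "hi", "n": 999999}'))
```

Rejecting at the proxy keeps abusive payloads from ever reaching the vLLM process, which complements, rather than replaces, upgrading to a fixed version.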

Monitor resource consumption per request to detect and respond to anomalous or abusive patterns early.

