CVE-2026-34760
Status: Received - Intake
Audio Downmixing Inconsistency in Librosa Affects vLLM Models

Publication date: 2026-04-02

Last updated on: 2026-04-02

Assigner: GitHub, Inc.

Description
vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.5.5 up to but not including 0.18.0, vLLM's audio processing relies on Librosa, which defaults to numpy.mean for mono downmixing (to_mono), whereas the international standard ITU-R BS.775-4 specifies a weighted downmixing algorithm. This discrepancy causes inconsistencies between the audio heard by humans (e.g., through headphones or regular speakers) and the audio processed by AI models in infrastructure that uses Librosa (such as vLLM and Transformers). This issue has been patched in version 0.18.0.
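The discrepancy described above can be illustrated with a short sketch. Everything here is illustrative: the 5.1 channel order (FL, FR, FC, LFE, SL, SR) and the -3 dB weighting are assumptions modeled on ITU-R BS.775-style downmixing, not vLLM's actual code.

```python
import numpy as np

# Hypothetical 6-channel (5.1) audio; channel order FL, FR, FC, LFE, SL, SR
# is an assumption (the real order depends on the container/codec).
rng = np.random.default_rng(0)
audio = rng.standard_normal((6, 48000)).astype(np.float32)

# Librosa-style downmix: a plain average over channels (what to_mono computes).
mono_mean = audio.mean(axis=0)

# ITU-R BS.775-style downmix sketch: center and surround channels scaled by
# 1/sqrt(2) (about -3 dB), and the LFE channel omitted entirely.
fl, fr, fc, lfe, sl, sr = audio
w = 1.0 / np.sqrt(2.0)
mono_itu = ((fl + w * fc + w * sl) + (fr + w * fc + w * sr)) / 2.0

# The two mixes differ: mean() folds the LFE channel into the signal the
# model sees, while a BS.775-style mix drops it.
print(np.allclose(mono_mean, mono_itu))  # → False
```

The key difference is not the exact weights but which channels contribute at all: the plain average gives every channel, including LFE, equal weight.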
Meta Information
Published: 2026-04-02
Last Modified: 2026-04-02
Generated: 2026-05-07
AI Q&A: 2026-04-03
EPSS Evaluated: 2026-05-05
Affected Vendors & Products (4 associated CPEs)
Vendor | Product | Version / Range
vllm_project | vllm | from 0.5.5 (inclusive) to 0.18.0 (exclusive)
librosa | librosa | from 0.5.5 (inclusive) to 0.18.0 (exclusive)
vllm_project | vllm | 0.18.0
pyav | pyav | all versions
CWE ID: CWE-20
Description: The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.
AI Powered Q&A
How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?

The vulnerability in CVE-2026-34760 causes inconsistencies between audio heard by humans and audio processed by AI models due to incorrect downmixing implementation. This can lead to bypassing voice authentication, evading content moderation, and disrupting speech recognition accuracy.

While the CVE description and resources detail technical impacts on AI model integrity and security, there is no explicit information on how this vulnerability affects compliance with common standards and regulations such as GDPR or HIPAA.


Can you explain this vulnerability to me?

CVE-2026-34760 is a vulnerability in the vLLM project related to how audio is processed for AI models. The issue arises because the Librosa library, used by vLLM for audio processing, defaults to a simple averaging method (numpy.mean) for converting multichannel audio to mono, instead of using the weighted downmixing algorithm specified by the international standard ITU-R BS.775-4.

This discrepancy causes inconsistencies between the audio heard by humans (through headphones or speakers) and the audio processed by AI models. Specifically, channels like the Low-Frequency Effects (LFE) channel and other surround or height channels, which are typically ignored by consumer playback devices, are included in Librosa's downmixing.

Attackers can exploit this by embedding malicious or interfering audio signals in these ignored channels. While humans hear only the normal front channel audio, AI systems process the full downmixed audio including these hidden signals, potentially causing the AI to behave incorrectly or be compromised.
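The hidden-channel idea above can be demonstrated concretely. In this sketch, a tone placed only in the (assumed) LFE channel carries no energy in the front channels a listener hears, yet clearly survives a mean-based downmix into the model's input. The sample rate, frequencies, and channel index are all illustrative assumptions.

```python
import numpy as np

sr = 16000
t = np.arange(sr) / sr

# Front channels carry a normal 440 Hz tone; the "payload" is a 3 kHz tone
# placed only in the LFE channel (index 3 here, by assumption).
audible = np.sin(2 * np.pi * 440 * t).astype(np.float32)
hidden = 0.5 * np.sin(2 * np.pi * 3000 * t).astype(np.float32)

audio = np.zeros((6, sr), dtype=np.float32)
audio[0] = audio[1] = audible   # FL, FR: what the listener hears
audio[3] = hidden               # LFE: dropped by typical consumer playback

# What a to_mono-style (mean) pipeline feeds the model:
mono = audio.mean(axis=0)

# The 3 kHz component is clearly present in the model's input, even though
# no front channel carries it.
spec = np.abs(np.fft.rfft(mono))
freqs = np.fft.rfftfreq(sr, 1.0 / sr)
bin_3k = int(np.argmin(np.abs(freqs - 3000)))
print(spec[bin_3k] > 100)  # → True
```

A real attack would hide adversarial perturbations rather than a pure tone, but the mechanism is the same: content inaudible on playback still reaches the model.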


How can this vulnerability impact me?

This vulnerability can impact users and systems relying on AI audio processing in several ways:

  • Bypassing voice authentication systems through acceptance of anomalous audio signals hidden in ignored channels.
  • Evading content moderation by embedding prohibited or malicious content in audio channels not heard by humans but processed by AI.
  • Disrupting speech recognition accuracy by masking or altering critical audio features, leading to degraded AI model performance.

Overall, the inconsistency between human-audible audio and AI-processed audio can lead to security and reliability issues in AI-based audio systems.


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability arises from the use of the Librosa library's default numpy.mean downmixing method in audio processing within the vLLM project, which differs from the ITU-R BS.775-4 standard. Detection involves identifying if your system or network is using vulnerable versions of vLLM (versions from 0.5.5 up to before 0.18.0) that rely on Librosa for audio downmixing.

To detect the vulnerability, you can check the installed version of vLLM and whether Librosa is used for audio processing. For example, on a system with Python and vLLM installed, you can run commands like:

  • Check vLLM version: `pip show vllm` or `python -c "import vllm; print(vllm.__version__)"`
  • Check if librosa is installed and its version: `pip show librosa`
  • Search your codebase or environment for usage of librosa for audio downmixing, e.g., `grep -r 'librosa' /path/to/your/project`

If you find that your system uses vLLM versions before 0.18.0 with librosa for audio processing, it is vulnerable to this issue.
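The version check described above can also be scripted. This is a minimal sketch: the naive tuple parse ignores pre-release suffixes, and a production check should use `packaging.version` instead.

```python
from importlib.metadata import PackageNotFoundError, version

def is_vulnerable_version(vstr: str) -> bool:
    # Naive numeric parse of "X.Y.Z"; non-numeric parts are dropped, so
    # pre-release versions may be misclassified by this sketch.
    parts = tuple(int(p) for p in vstr.split(".")[:3] if p.isdigit())
    # Affected range per the advisory: 0.5.5 (inclusive) to 0.18.0 (exclusive)
    return (0, 5, 5) <= parts < (0, 18, 0)

def check_vllm() -> bool:
    try:
        return is_vulnerable_version(version("vllm"))
    except PackageNotFoundError:
        return False  # vLLM is not installed in this environment

print(is_vulnerable_version("0.17.2"))  # → True
print(is_vulnerable_version("0.18.0"))  # → False
```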


What immediate steps should I take to mitigate this vulnerability?

The primary mitigation for this vulnerability is to upgrade the vLLM project to version 0.18.0 or later, where the Librosa dependency has been removed and replaced with a more secure audio processing pipeline using pyav.

Additional immediate steps include:

  • Upgrade vLLM to version 0.18.0 or newer, which contains the fix removing the vulnerable librosa downmixing method.
  • Audit your audio processing pipeline to ensure it no longer uses librosa for downmixing and instead uses the updated pyav-based methods.
  • Test your AI models with the updated audio processing to confirm consistency with the ITU-R BS.775-4 standard.

These steps will prevent attackers from exploiting the discrepancy in audio downmixing that could lead to bypassing voice authentication, evading content moderation, or disrupting speech recognition.
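If upgrading immediately is not possible, one stop-gap is to downmix explicitly with ITU-style weights before any audio reaches the model, instead of relying on a mean-based to_mono. This function is entirely a sketch: it assumes 5.1 input with channel order FL, FR, FC, LFE, SL, SR, which you must verify against your decoder.

```python
import numpy as np

def downmix_bs775(audio: np.ndarray) -> np.ndarray:
    """ITU-R BS.775-style mono downmix sketch for 5.1 input.

    Channel order FL, FR, FC, LFE, SL, SR is an assumption; check it against
    your decoder before use. The LFE channel is deliberately excluded.
    """
    if audio.ndim == 1:
        return audio  # already mono
    if audio.shape[0] != 6:
        raise ValueError("this sketch only handles 6-channel (5.1) input")
    fl, fr, fc, _lfe, sl, sr = audio  # LFE is unpacked but never mixed in
    w = 1.0 / np.sqrt(2.0)            # ~ -3 dB for center and surrounds
    left = fl + w * fc + w * sl
    right = fr + w * fc + w * sr
    return ((left + right) / 2.0).astype(audio.dtype)
```

Because LFE never enters the mix, a payload hidden in that channel contributes nothing to the model's input, closing the human/model inconsistency for this channel layout.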

