CVE-2025-68232
Unknown Unknown - Not Provided
BaseFortify

Publication date: 2025-12-16

Last updated on: 2025-12-18

Assigner: kernel.org

Description
In the Linux kernel, the following vulnerability has been resolved: veth: more robust handing of race to avoid txq getting stuck Commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops") introduced a race condition that can lead to a permanently stalled TXQ. This was observed in production on ARM64 systems (Ampere Altra Max). The race occurs in veth_xmit(). The producer observes a full ptr_ring and stops the queue (netif_tx_stop_queue()). The subsequent conditional logic, intended to re-wake the queue if the consumer had just emptied it (if (__ptr_ring_empty(...)) netif_tx_wake_queue()), can fail. This leads to a "lost wakeup" where the TXQ remains stopped (QUEUE_STATE_DRV_XOFF) and traffic halts. This failure is caused by an incorrect use of the __ptr_ring_empty() API from the producer side. As noted in kernel comments, this check is not guaranteed to be correct if a consumer is operating on another CPU. The empty test is based on ptr_ring->consumer_head, making it reliable only for the consumer. Using this check from the producer side is fundamentally racy. This patch fixes the race by adopting the more robust logic from an earlier version V4 of the patchset, which always flushed the peer: (1) In veth_xmit(), the racy conditional wake-up logic and its memory barrier are removed. Instead, after stopping the queue, we unconditionally call __veth_xdp_flush(rq). This guarantees that the NAPI consumer is scheduled, making it solely responsible for re-waking the TXQ. This handles the race where veth_poll() consumes all packets and completes NAPI *before* veth_xmit() on the producer side has called netif_tx_stop_queue. The __veth_xdp_flush(rq) will observe rx_notify_masked is false and schedule NAPI. (2) On the consumer side, the logic for waking the peer TXQ is moved out of veth_xdp_rcv() and placed at the end of the veth_poll() function. This placement is part of fixing the race, as the netif_tx_queue_stopped() check must occur after rx_notify_masked is potentially set to false during NAPI completion. This handles the race where veth_poll() consumes all packets, but haven't finished (rx_notify_masked is still true). The producer veth_xmit() stops the TXQ and __veth_xdp_flush(rq) will observe rx_notify_masked is true, meaning not starting NAPI. Then veth_poll() change rx_notify_masked to false and stops NAPI. Before exiting veth_poll() will observe TXQ is stopped and wake it up.
CVSS Scores
EPSS Scores
Probability:
Percentile:
Meta Information
Published
2025-12-16
Last Modified
2025-12-18
Generated
2026-05-27
AI Q&A
2025-12-16
EPSS Evaluated
2026-05-25
NVD
EUVD
Affected Vendors & Products
Showing 1 associated CPE
Vendor Product Version / Range
linux linux_kernel *
Helpful Resources
Exploitability
CWE
CWE Icon
KEV
KEV Icon
CWE ID Description
CWE-UNKNOWN
Attack-Flow Graph
AI Powered Q&A
Can you explain this vulnerability to me?

This vulnerability is a race condition in the Linux kernel's veth network driver introduced by a commit that caused the transmit queue (TXQ) to become permanently stalled. The issue arises because the producer side incorrectly checks if the consumer's packet ring buffer is empty using an API that is not reliable from the producer's perspective. This can lead to a 'lost wakeup' where the TXQ remains stopped and network traffic halts. The fix involves removing the racy conditional wake-up logic and instead unconditionally flushing the peer to ensure the consumer is scheduled to re-wake the TXQ, thus preventing the queue from getting stuck.


How can this vulnerability impact me? :

This vulnerability can cause the transmit queue in the veth network interface to become permanently stalled, resulting in halted network traffic. This means that network communication over the affected virtual Ethernet interfaces can stop functioning, potentially leading to network outages or degraded performance on systems using the affected Linux kernel, especially on ARM64 platforms like Ampere Altra Max.


What immediate steps should I take to mitigate this vulnerability?

To mitigate this vulnerability, update the Linux kernel to a version that includes the patch fixing the veth TXQ race condition. The patch removes the racy conditional wake-up logic and ensures the TX queue is properly managed to avoid stalls. Applying the updated kernel will prevent the TXQ from getting permanently stuck and traffic from halting.


Ask Our AI Assistant
Need more information? Ask your question to get an AI reply (Powered by our expertise)
0/70
EPSS Chart