CVE-2026-43194
Awaiting Analysis
TCP GRO Test Stall Due to Xmit Error Handling in Linux Kernel

Publication date: 2026-05-06

Last updated on: 2026-05-06

Assigner: kernel.org

Description
In the Linux kernel, the following vulnerability has been resolved:

net: consume xmit errors of GSO frames

udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests currently in NIPA. They fail in the same exact way, TCP GRO test stalls occasionally and the test gets killed after 10min.

These tests use veth to simulate GRO. They attach a trivial ("return XDP_PASS;") XDP program to the veth to force TSO off and NAPI on.

Digging into the failure mode we can see that the connection is completely stuck after a burst of drops. The sender's snd_nxt is at sequence number N [1], but the receiver claims to have received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle is that senders rtx queue is not empty (let's say the block in the rtx queue is at sequence number N - 4 * MSS [3]).

In this state, sender sends a retransmission from the rtx queue with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3]. Receiver sees it and responds with an ACK all the way up to N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA because it has no recollection of ever sending data that far out [1]. And we are stuck.

The root cause is the mess of the xmit return codes. veth returns an error when it can't xmit a frame. We end up with a loss event like this:

  -------------------------------------------------
  |   GSO super frame 1   |   GSO super frame 2   |
  |-----------------------------------------------|
  | seg | seg | seg | seg | seg | seg | seg | seg |
  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |
  -------------------------------------------------
     x    ok    ok    <ok>|  ok    ok    ok   <x>
                          \\
                           snd_nxt

"x" means packet lost by veth, and "ok" means it went thru. Since veth has TSO disabled in this test it sees individual segments. Segment 1 is on the retransmit queue and will be resent.

So why did the sender not advance snd_nxt even tho it clearly did send up to seg 8? tcp_write_xmit() interprets the return code from the core to mean that data has not been sent at all.
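
The loss pattern in the diagram can be replayed with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the function names, the 1000-byte MSS, and the fixed four segments per super frame are all hypothetical choices made for illustration. It shows how a sender that trusts only the last segment's result ends up with snd_nxt at N while the receiver's rcv_nxt reaches N + 3 * MSS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSS            1000u  /* illustrative segment size (assumption) */
#define SEGS_PER_FRAME 4u     /* segments per GSO super frame, as in the diagram */

/*
 * Sender view (hypothetical model): each super frame reports only the
 * result of its LAST segment. On a reported failure the sender stops and
 * assumes nothing from that frame was sent, so snd_nxt does not move.
 */
uint32_t model_snd_nxt(uint32_t start, const bool *seg_ok, size_t nframes)
{
    uint32_t snd_nxt = start;

    assert(seg_ok != NULL);
    for (size_t f = 0; f < nframes; f++) {
        bool last_ok = seg_ok[f * SEGS_PER_FRAME + SEGS_PER_FRAME - 1];

        if (!last_ok)
            break;
        snd_nxt += SEGS_PER_FRAME * MSS;
    }
    return snd_nxt;
}

/*
 * Receiver view (hypothetical model): rcv_nxt covers the contiguous
 * prefix of segments that actually arrived, whatever the sender believes.
 */
uint32_t model_rcv_nxt(uint32_t start, const bool *seg_arrived, size_t nsegs)
{
    uint32_t rcv_nxt = start;

    assert(seg_arrived != NULL);
    for (size_t i = 0; i < nsegs && seg_arrived[i]; i++)
        rcv_nxt += MSS;
    return rcv_nxt;
}
```

Feeding in the diagram's pattern (segment 1 and segment 8 lost, the rest delivered) leaves model_snd_nxt() at the end of super frame 1, which is N. Once the retransmission of segment 1 is marked as arrived, model_rcv_nxt() returns N + 3 * MSS, the mismatch described above.
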
Since TCP deals with GSO super frames, not individual segment the crux of the problem is that loss of a single segment can be interpreted as loss of all. TCP only sees the last return code for the last segment of the GSO frame (in <> brackets in the diagram above).

Of course for the problem to occur we need a setup or a device without a Qdisc. Otherwise Qdisc layer disconnects the protocol layer from the device errors completely.

We have multiple ways to fix this.

 1) make veth not return an error when it lost a packet.
    While this is what I think we did in the past, the issue keeps reappearing and it's annoying to debug. The game of whack a mole is not great.

 2) fix the damn return codes
    We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the documentation, so maybe we should make the return code from ndo_start_xmit() a boolean. I like that the most, but perhaps some ancient, not-really-networking protocol would suffer.

 3) make TCP ignore the errors
    It is not entirely clear to me what benefit TCP gets from interpreting the result of ip_queue_xmit()? Specifically once the connection is established and we're pushing data - packet loss is just packet loss?

 4) this fix
    Ignore the rc in the Qdisc-less+GSO case, since it's unreliable. We already always return OK in the TCQ_F_CAN_BYPASS case. In the Qdisc-less case let's be a bit more conservative and only mask the GSO errors. This path is taken by non-IP-"networks" like CAN, MCTP etc, so we could regress some ancient thing. This is the simplest, but also maybe the hackiest fix?

Similar fix has been proposed by Eric in the past but never committed because original reporter was working with an OOT driver and wasn't providing feedback (see Link).
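
The rule chosen in option 4 can be written down as a tiny decision function. This is a hedged sketch, not the actual patch: the enum values and the function name are hypothetical stand-ins, and the model covers only the Qdisc-less transmit path described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's transmit status values. */
enum model_tx_rc { MODEL_TX_OK = 0, MODEL_TX_DROP = 1 };

/*
 * Return code the protocol layer would see on the Qdisc-less path under
 * the rule described in option 4: errors for GSO frames are consumed,
 * while non-GSO frames keep whatever the device reported.
 */
enum model_tx_rc model_qdiscless_rc(enum model_tx_rc device_rc, bool is_gso_frame)
{
    assert(device_rc == MODEL_TX_OK || device_rc == MODEL_TX_DROP);

    if (is_gso_frame)
        return MODEL_TX_OK;  /* per-segment result is unreliable, mask it */
    return device_rc;        /* conservative: leave non-GSO errors visible */
}
```

Under this model, a drop of the last segment of a GSO frame no longer reaches TCP as an error, so the lost segment is recovered as ordinary packet loss. A drop of a non-GSO frame is still reported, which matches the "only mask the GSO errors" wording.
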
Meta Information

Published: 2026-05-06

Last Modified: 2026-05-06

Generated: 2026-05-07

AI Q&A: 2026-05-06

EPSS Evaluated: N/A
Affected Vendors & Products
Showing 1 associated CPE

Vendor    Product         Version / Range
linux     linux_kernel    *
CWE ID: CWE-UNKNOWN
AI Powered Q&A
Can you explain this vulnerability to me?

This vulnerability in the Linux kernel involves how transmission (xmit) errors of Generic Segmentation Offload (GSO) frames are handled, particularly with the veth device used in network testing. When a packet segment is lost, the sender incorrectly interprets the loss as if the entire GSO super frame was lost, causing the TCP connection to become stuck. This happens because the return codes from the network device do not accurately reflect partial segment loss, leading to a mismatch between what the sender believes was sent and what the receiver acknowledges.

The root cause is that veth returns an error when it cannot transmit a frame, and TCP sees only the return code of the last segment of the GSO frame. If that last segment is dropped, TCP assumes none of the frame was sent and does not advance snd_nxt, even though the earlier segments reached the receiver. TCP then rejects acknowledgments for data it thinks was never sent, resulting in a stalled connection. The problem only occurs in setups without a queuing discipline (Qdisc), because the Qdisc layer normally isolates the protocol layer from device errors.
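
The rejection step can be illustrated with a wraparound-safe sequence comparison. This is a minimal userspace sketch assuming 32-bit sequence numbers; the helper names are hypothetical and the code is not taken from the kernel. It shows why an ACK for N + 3 * MSS is refused when the sender's snd_nxt is still N.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when seq1 precedes seq2, tolerating 32-bit wraparound. */
bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* True when seq1 follows seq2, tolerating 32-bit wraparound. */
bool seq_after(uint32_t seq1, uint32_t seq2)
{
    return seq_before(seq2, seq1);
}

/*
 * Hypothetical helper: true when an ACK covers data beyond snd_nxt,
 * which the sender treats as an acknowledgment of unsent data.
 */
bool ack_covers_unsent_data(uint32_t ack, uint32_t snd_nxt)
{
    return seq_after(ack, snd_nxt);
}

/* Self-check mirroring the stuck state: snd_nxt = N, ACK = N + 3 * MSS. */
void demo_unsent_ack(void)
{
    const uint32_t n = 5000u, mss = 1000u;  /* illustrative values */

    assert(ack_covers_unsent_data(n + 3u * mss, n));  /* rejected */
    assert(!ack_covers_unsent_data(n, n));            /* accepted */
}
```

Because every retransmission from the rtx queue triggers the same ACK for N + 3 * MSS, and that ACK is always beyond snd_nxt, the check fails on each round and the connection makes no progress.
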

Several fixes were considered, including changing how veth reports errors, adjusting return codes, or making TCP ignore these errors. The chosen fix was to ignore the return codes in the Qdisc-less GSO case, as they are unreliable, preventing the connection from getting stuck.


How can this vulnerability impact me?

This vulnerability can cause TCP connections to stall after packet loss events, leading to network communication failures or degraded performance. Specifically, the sender stops advancing snd_nxt because it misinterprets a transmission error, so data transmission stalls and applications relying on TCP may hang or time out.

The impact is particularly relevant in environments using virtual Ethernet devices (veth) or setups without queuing disciplines, where this error handling flaw can manifest. This can affect network reliability and stability, especially in testing or simulation scenarios that rely on these configurations.


What immediate steps should I take to mitigate this vulnerability?

The vulnerability is related to how the Linux kernel handles transmit errors of GSO frames, especially in setups without a Qdisc. Immediate mitigation involves applying the fix that ignores the return code in the Qdisc-less+GSO case, since it is unreliable.

  • Apply the kernel patch that masks GSO errors in Qdisc-less setups.
  • Avoid using devices or configurations without a Qdisc layer where possible.
  • Consider disabling or adjusting veth device behavior to not return errors on lost packets.
