CVE-2026-45973
Status: Analyzed - Analysis Complete
UMR Hang in Linux Kernel RDMA/mlx5 Driver

Publication date: 2026-05-27

Last updated on: 2026-06-16

Assigner: kernel.org

Description
In the Linux kernel, the following vulnerability has been resolved:

RDMA/mlx5: Fix UMR hang in LAG error state unload

During firmware reset in LAG mode, a race condition causes the driver to hang indefinitely while waiting for UMR completion during device unload. See [1].

In LAG mode the bond device is only registered on the master, so it never sees sys_error events from the slave. During firmware reset this causes UMR waits to hang forever on unload as the slave is dead but the master hasn't entered error state yet, so UMR posts succeed but completions never arrive.

Fix this by adding a sys_error notifier that gets registered before MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device(). This ensures error events reach the bond device throughout teardown.

[1] Call Trace:
  __schedule+0x2bd/0x760
  schedule+0x37/0xa0
  schedule_preempt_disabled+0xa/0x10
  __mutex_lock.isra.6+0x2b5/0x4a0
  __mlx5_ib_dereg_mr+0x606/0x870 [mlx5_ib]
  ? __xa_erase+0x4a/0xa0
  ? _cond_resched+0x15/0x30
  ? wait_for_completion+0x31/0x100
  ib_dereg_mr_user+0x48/0xc0 [ib_core]
  ? rdmacg_uncharge_hierarchy+0xa0/0x100
  destroy_hw_idr_uobject+0x20/0x50 [ib_uverbs]
  uverbs_destroy_uobject+0x37/0x150 [ib_uverbs]
  __uverbs_cleanup_ufile+0xda/0x140 [ib_uverbs]
  uverbs_destroy_ufile_hw+0x3a/0xf0 [ib_uverbs]
  ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs]
  remove_client_context+0x8b/0xd0 [ib_core]
  disable_device+0x8c/0x130 [ib_core]
  __ib_unregister_device+0x10d/0x180 [ib_core]
  ib_unregister_device+0x21/0x30 [ib_core]
  __mlx5_ib_remove+0x1e4/0x1f0 [mlx5_ib]
  auxiliary_bus_remove+0x1e/0x30
  device_release_driver_internal+0x103/0x1f0
  bus_remove_device+0xf7/0x170
  device_del+0x181/0x410
  mlx5_rescan_drivers_locked.part.10+0xa9/0x1d0 [mlx5_core]
  mlx5_disable_lag+0x253/0x260 [mlx5_core]
  mlx5_lag_disable_change+0x89/0xc0 [mlx5_core]
  mlx5_eswitch_disable+0x67/0xa0 [mlx5_core]
  mlx5_unload+0x15/0xd0 [mlx5_core]
  mlx5_unload_one+0x71/0xc0 [mlx5_core]
  mlx5_sync_reset_reload_work+0x83/0x100 [mlx5_core]
  process_one_work+0x1a7/0x360
  worker_thread+0x30/0x390
  ? create_worker+0x1a0/0x1a0
  kthread+0x116/0x130
  ? kthread_flush_work_fn+0x10/0x10
  ret_from_fork+0x22/0x40

Meta Information

Published: 2026-05-27
Last Modified: 2026-06-16
Generated: 2026-06-16
AI Q&A: 2026-05-27
EPSS Evaluated: 2026-06-15

Affected Vendors & Products
Showing 5 associated CPEs

Vendor  Product       Version / Range
linux   linux_kernel  From 6.11.11 (inc) to 6.12 (exc)
linux   linux_kernel  From 6.6.64 (inc) to 6.7 (exc)
linux   linux_kernel  From 6.13 (inc) to 6.18.14 (exc)
linux   linux_kernel  From 6.19 (inc) to 6.19.4 (exc)
linux   linux_kernel  From 6.12.2 (inc) to 6.12.75 (exc)
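
The ranges above can be checked mechanically. The following is a minimal sketch, assuming an upstream-style major.minor.patch release string. The names AFFECTED_RANGES, parse_kernel_version, and is_in_affected_range are hypothetical and introduced only for illustration. Distribution kernels that backport fixes may not follow these upstream version boundaries, so a match is a hint to investigate, not proof of exposure.

```python
import re

# Affected ranges copied from the table above as (first affected, first fixed).
# The lower bound is inclusive and the upper bound is exclusive.
AFFECTED_RANGES = [
    ((6, 6, 64), (6, 7, 0)),
    ((6, 11, 11), (6, 12, 0)),
    ((6, 12, 2), (6, 12, 75)),
    ((6, 13, 0), (6, 18, 14)),
    ((6, 19, 0), (6, 19, 4)),
]


def parse_kernel_version(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a string such as '6.12.30-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (for example '6.7') is treated as patch 0.
    return int(major), int(minor), int(patch or 0)


def is_in_affected_range(release: str) -> bool:
    """Return True if the release falls inside any listed affected range."""
    version = parse_kernel_version(release)
    return any(low <= version < high for low, high in AFFECTED_RANGES)
```

On a running system, the input could come from platform.release() in the Python standard library, for example is_in_affected_range(platform.release()).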
CWE ID: CWE-UNKNOWN
Compliance Impact

The provided information does not include any details about the impact of this vulnerability on compliance with common standards and regulations such as GDPR or HIPAA.

Executive Summary

This vulnerability affects the Linux kernel's RDMA mlx5 driver (mlx5_ib) when it operates in LAG (Link Aggregation Group) mode. During a firmware reset, a race condition causes the driver to hang indefinitely while waiting for a UMR (User-mode Memory Registration) completion during device unload.

The issue arises because in LAG mode the bond device is registered only on the master, so it never receives sys_error events from the slave. During a firmware reset the slave is already dead while the master has not yet entered the error state. UMR posts therefore succeed, but their completions never arrive, and the unload path waits forever.

The fix involves adding a sys_error notifier that registers before MLX5_IB_STAGE_IB_REG and remains active until after ib_unregister_device(), ensuring error events reach the bond device throughout the teardown process.
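
The effect of the missing notifier can be illustrated with a toy model. This is a sketch for intuition only, not driver code: the class ToyBondDevice and its methods are hypothetical, the timeout exists only so the example terminates, and the assumption that a device already known to be in error skips the wait is a simplification of this model rather than a description of the actual patch.

```python
import threading


class ToyBondDevice:
    """Toy stand-in for the IB device that is registered on the LAG master only."""

    def __init__(self, error_notifier_registered: bool):
        self.error_notifier_registered = error_notifier_registered
        self.in_error = threading.Event()  # set once a sys_error has been seen
        self.umr_done = threading.Event()  # would be set by a UMR completion

    def on_slave_sys_error(self) -> None:
        """Deliver a slave sys_error; it only lands if a notifier is registered."""
        if self.error_notifier_registered:
            self.in_error.set()

    def dereg_mr(self, timeout: float) -> str:
        """Post a UMR and wait for its completion, as happens during unload."""
        if self.in_error.is_set():
            # Model assumption: a device known to be dead does not wait on hardware.
            return "skipped"
        # The post succeeds, but the dead slave never generates a completion.
        if self.umr_done.wait(timeout):
            return "completed"
        # The model gives up after the timeout; the hang in the report is indefinite.
        return "hung"


# Without the notifier the slave error is lost and the wait never finishes.
before_fix = ToyBondDevice(error_notifier_registered=False)
before_fix.on_slave_sys_error()
print(before_fix.dereg_mr(timeout=0.05))

# With the notifier alive during teardown the error is seen and the wait is avoided.
after_fix = ToyBondDevice(error_notifier_registered=True)
after_fix.on_slave_sys_error()
print(after_fix.dereg_mr(timeout=0.05))
```

The point of the model is the ordering: the notifier has to be registered before the device is exposed and stay registered until unregistration finishes, otherwise there is a window in which the slave error is dropped.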

Impact Analysis

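This vulnerability can cause the mlx5_ib driver to hang indefinitely during device unload in LAG mode. In the reported call trace the blocked task is the kernel worker running mlx5_sync_reset_reload_work, so the reload that follows a firmware reset cannot finish while the hang persists.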

If your system relies on RDMA mlx5 devices in LAG mode, this hang could disrupt normal operations, delay device resets or unloads, and affect applications depending on these devices.

Detection Guidance

This vulnerability involves a hang in the mlx5_ib driver during device unload in LAG mode caused by a race condition during firmware reset. Detection would involve monitoring for symptoms such as indefinite hangs or deadlocks related to RDMA device unload operations.

Specifically, detection could focus on observing kernel logs for call traces similar to the one described, which include functions like __mlx5_ib_dereg_mr, ib_dereg_mr_user, ib_unregister_device, and mlx5_unload. These traces indicate the driver is stuck waiting for UMR completion.

Commands to help detect this issue might include checking kernel logs with:

  • dmesg | grep mlx5_ib
  • journalctl -k | grep mlx5_ib

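Additionally, a hung unload leaves the affected task blocked inside the kernel. Because mlx5_ib is a kernel module rather than a process, searching the process list for its name will not reveal the hang, and lsmod only confirms that the module is loaded:

  • lsmod | grep mlx5_ib

Tasks stuck in uninterruptible sleep (state D) can instead be listed with, for example:

  • ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

However, no explicit detection commands or tools are provided in the available information.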
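
The log check described above can be sketched as a small heuristic. This is a minimal sketch under stated assumptions: the function name looks_like_umr_unload_hang and the choice of which frames must co-occur are hypothetical, the frame names are taken from the call trace in the description, and the kernel log text is assumed to have been captured already (for example from dmesg or journalctl -k).

```python
# Frames taken from the call trace in the description.
WAIT_FRAMES = ("__mlx5_ib_dereg_mr", "ib_dereg_mr_user")
TEARDOWN_FRAMES = (
    "ib_unregister_device",
    "mlx5_unload",
    "mlx5_sync_reset_reload_work",
)


def looks_like_umr_unload_hang(kernel_log: str) -> bool:
    """Heuristically detect a trace resembling the reported UMR unload hang."""
    # Check each trace separately so frames from unrelated traces do not combine.
    for trace in kernel_log.split("Call Trace:")[1:]:
        has_wait = all(frame in trace for frame in WAIT_FRAMES)
        has_teardown = any(frame in trace for frame in TEARDOWN_FRAMES)
        if has_wait and has_teardown:
            return True
    return False
```

A match only indicates that a trace of similar shape was logged. Symbol offsets and inlined frames vary between kernel builds, so the absence of a match does not rule the issue out.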

Mitigation Strategies

The vulnerability is fixed by adding a sys_error notifier that ensures error events reach the bond device throughout teardown, preventing the UMR hang during device unload in LAG mode.

Immediate mitigation steps include:

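  • Update the Linux kernel to a version that includes the fix. Per the affected ranges listed above, the first unaffected releases are 6.12.75, 6.18.14, and 6.19.4.
  • Until the fix is applied, avoid triggering firmware resets on mlx5 devices in LAG mode. In the reported trace the unload is started by the reset flow itself (mlx5_sync_reset_reload_work), not by a separate administrator action.
  • Monitor system logs for symptoms of the hang and avoid other operations that unload mlx5_ib devices in LAG mode while a firmware reset is in progress.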

No other specific mitigation commands or workarounds are provided in the available information.
