CVE-2025-38355
Unknown Unknown - Not Provided
BaseFortify

Publication date: 2025-07-25

Last updated on: 2025-11-18

Assigner: kernel.org

Description
In the Linux kernel, the following vulnerability has been resolved: drm/xe: Process deferred GGTT node removals on device unwind While we are indirectly draining our dedicated workqueue ggtt->wq that we use to complete asynchronous removal of some GGTT nodes, this happends as part of the managed-drm unwinding (ggtt_fini_early), which could be later then manage-device unwinding, where we could already unmap our MMIO/GMS mapping (mmio_fini). This was recently observed during unsuccessful VF initialization: [ ] xe 0000:00:02.1: probe with driver xe failed with error -62 [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes) [ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes) [ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes) and this was leading to: [ ] BUG: unable to handle page fault for address: ffffc900058162a0 [ ] #PF: supervisor write access in kernel mode [ ] #PF: error_code(0x0002) - not-present page [ ] Oops: Oops: 0002 [#1] SMP NOPTI [ ] Tainted: [W]=WARN [ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] [ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe] [ ] Call Trace: [ ] <TASK> [ ] xe_ggtt_clear+0xb0/0x270 [xe] [ ] ggtt_node_remove+0xbb/0x120 [xe] [ ] ggtt_node_remove_work_func+0x30/0x50 [xe] [ ] process_one_work+0x22b/0x6f0 [ ] worker_thread+0x1e8/0x3d Add managed-device action that will explicitly drain the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. (cherry picked from commit 89d2835c3680ab1938e22ad81b1c9f8c686bd391)
CVSS Scores
EPSS Scores
Probability:
Percentile:
Meta Information
Published
2025-07-25
Last Modified
2025-11-18
Generated
2026-05-07
AI Q&A
2025-07-25
EPSS Evaluated
2026-05-05
NVD
Affected Vendors & Products
Showing 5 associated CPEs
Vendor Product Version / Range
linux linux_kernel From 5.15.160 (inc) to 5.16 (inc)
linux linux_kernel 6.16
linux linux_kernel 6.16
linux linux_kernel 6.16
linux linux_kernel From 5.15.160 (inc) to 5.16 (inc)
Helpful Resources
Exploitability
CWE
CWE Icon
KEV
KEV Icon
CWE ID Description
CWE-NVD-CWE-noinfo
Attack-Flow Graph
AI Powered Q&A
Can you explain this vulnerability to me?

This vulnerability in the Linux kernel's drm/xe component involves improper handling of deferred GGTT node removals during device unwinding. Specifically, asynchronous removal of some GGTT nodes uses a dedicated workqueue that may not be properly drained before the MMIO/GMS mappings are unmapped during device shutdown. This can lead to a race condition causing a kernel page fault and system crash during device initialization or removal.


How can this vulnerability impact me? :

This vulnerability can cause kernel crashes (oops) due to page faults when the system tries to access unmapped memory during device removal or initialization. This can lead to system instability, failed device initialization, and potential denial of service on affected systems using the drm/xe driver.


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability can be detected by monitoring kernel logs for specific error messages related to the drm/xe driver, such as failed VF initialization errors and workqueue processing issues. Look for log entries similar to: - xe 0000:00:02.1: probe with driver xe failed with error -62 - DEVRES REL messages related to __xe_bo_unpin_map_no_vm, tiles_fini, mmio_fini, xe_bo_pinned_fini, devm_drm_dev_init_release - drm_managed_release related REL messages - BUG: unable to handle page fault for address - #PF: supervisor write access in kernel mode - Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe] Commands to check kernel logs: - sudo dmesg | grep xe - sudo journalctl -k | grep xe - sudo dmesg | grep -E 'BUG|Oops|Workqueue|drm_managed_release' These commands help identify the presence of the described error patterns indicating the vulnerability.


What immediate steps should I take to mitigate this vulnerability?

Immediate mitigation involves updating the Linux kernel to a version that includes the fix which adds a managed-device action explicitly draining the workqueue with all pending node removals prior to releasing MMIO/GSM mapping. This prevents the race condition causing the page fault and driver failure. Until the update is applied, monitoring for the error messages and avoiding workloads that trigger VF initialization on the xe driver may reduce exposure.


Ask Our AI Assistant
Need more information? Ask your question to get an AI reply (Powered by our expertise)
0/70
EPSS Chart