CVE-2025-38520
Unknown Unknown - Not Provided
BaseFortify

Publication date: 2025-08-16

Last updated on: 2025-11-03

Assigner: kernel.org

Description
In the Linux kernel, the following vulnerability has been resolved: drm/amdkfd: Don't call mmput from MMU notifier callback If the process is exiting, the mmput inside mmu notifier callback from compactd or fork or numa balancing could release the last reference of mm struct to call exit_mmap and free_pgtable, this triggers deadlock with below backtrace. The deadlock will leak kfd process as mmu notifier release is not called and cause VRAM leaking. The fix is to take mm reference mmget_non_zero when adding prange to the deferred list to pair with mmput in deferred list work. If prange split and add into pchild list, the pchild work_item.mm is not used, so remove the mm parameter from svm_range_unmap_split and svm_range_add_child. The backtrace of hung task: INFO: task python:348105 blocked for more than 64512 seconds. Call Trace: __schedule+0x1c3/0x550 schedule+0x46/0xb0 rwsem_down_write_slowpath+0x24b/0x4c0 unlink_anon_vmas+0xb1/0x1c0 free_pgtables+0xa9/0x130 exit_mmap+0xbc/0x1a0 mmput+0x5a/0x140 svm_range_cpu_invalidate_pagetables+0x2b/0x40 [amdgpu] mn_itree_invalidate+0x72/0xc0 __mmu_notifier_invalidate_range_start+0x48/0x60 try_to_unmap_one+0x10fa/0x1400 rmap_walk_anon+0x196/0x460 try_to_unmap+0xbb/0x210 migrate_page_unmap+0x54d/0x7e0 migrate_pages_batch+0x1c3/0xae0 migrate_pages_sync+0x98/0x240 migrate_pages+0x25c/0x520 compact_zone+0x29d/0x590 compact_zone_order+0xb6/0xf0 try_to_compact_pages+0xbe/0x220 __alloc_pages_direct_compact+0x96/0x1a0 __alloc_pages_slowpath+0x410/0x930 __alloc_pages_nodemask+0x3a9/0x3e0 do_huge_pmd_anonymous_page+0xd7/0x3e0 __handle_mm_fault+0x5e3/0x5f0 handle_mm_fault+0xf7/0x2e0 hmm_vma_fault.isra.0+0x4d/0xa0 walk_pmd_range.isra.0+0xa8/0x310 walk_pud_range+0x167/0x240 walk_pgd_range+0x55/0x100 __walk_page_range+0x87/0x90 walk_page_range+0xf6/0x160 hmm_range_fault+0x4f/0x90 amdgpu_hmm_range_get_pages+0x123/0x230 [amdgpu] amdgpu_ttm_tt_get_user_pages+0xb1/0x150 [amdgpu] init_user_pages+0xb1/0x2a0 [amdgpu] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu] kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu] kfd_ioctl+0x29d/0x500 [amdgpu] (cherry picked from commit a29e067bd38946f752b0ef855f3dfff87e77bec7)
CVSS Scores
EPSS Scores
Probability:
Percentile:
Meta Information
Published
2025-08-16
Last Modified
2025-11-03
Generated
2026-05-27
AI Q&A
2025-08-16
EPSS Evaluated
2026-05-25
NVD
Affected Vendors & Products
Showing 1 associated CPE
Vendor Product Version / Range
linux linux_kernel 6.1.153
Helpful Resources
Exploitability
CWE
CWE Icon
KEV
KEV Icon
CWE ID Description
CWE-UNKNOWN
Attack-Flow Graph
AI Powered Q&A
Can you explain this vulnerability to me?

This vulnerability in the Linux kernel's drm/amdkfd component involves improper handling of memory management unit (MMU) notifier callbacks. Specifically, calling mmput from within an MMU notifier callback when a process is exiting can release the last reference to a memory structure (mm struct), triggering exit_mmap and free_pgtable functions that cause a deadlock. This deadlock prevents the release of the kfd process and leads to VRAM leaking. The issue arises during operations like compactd, fork, or NUMA balancing. The fix involves properly managing references by taking a non-zero mm reference when adding prange to a deferred list to avoid the deadlock.


How can this vulnerability impact me? :

This vulnerability can cause a deadlock in the Linux kernel, which results in a hung task and leaking of VRAM resources. This can degrade system stability and performance, potentially causing processes to block indefinitely and consume GPU memory unnecessarily, which may affect applications relying on AMD GPU functionality.


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability can be detected by monitoring for hung or blocked tasks related to the amdkfd driver in the Linux kernel. Specifically, look for processes blocked for an unusually long time (e.g., thousands of seconds) with backtraces involving mmput, exit_mmap, and amdgpu kernel modules. Commands such as 'dmesg' or 'journalctl -k' can be used to check kernel logs for messages like 'task <process_name> blocked for more than <time> seconds' and the associated backtrace. Additionally, using 'ps' or 'top' to identify processes stuck in uninterruptible sleep (D state) may help detect symptoms of this deadlock.


What immediate steps should I take to mitigate this vulnerability?

The immediate mitigation is to update the Linux kernel to a version that includes the fix for this vulnerability, which involves changes to the amdkfd driver to avoid calling mmput from the MMU notifier callback. Until the update is applied, avoid workloads or operations that trigger heavy memory management activities such as compactd, fork, or NUMA balancing on systems using the affected amdkfd driver to reduce the risk of deadlocks and VRAM leaks.


Ask Our AI Assistant
Need more information? Ask your question to get an AI reply (Powered by our expertise)
0/70
EPSS Chart