CVE-2026-31765
Awaiting Analysis - Queue
Buffer Overflow in AMDGPU Kernel Driver

Publication date: 2026-05-01

Last updated on: 2026-05-01

Assigner: kernel.org

Description
In the Linux kernel, the following vulnerability has been resolved:

drm/amdgpu: Change AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB

Currently, AMDGPU_VA_RESERVED_TRAP_SIZE is hardcoded to 8KB, while KFD_CWSR_TBA_TMA_SIZE is defined as 2 * PAGE_SIZE. On systems with 4K pages, both values match (8KB), so allocation and reserved space are consistent.

However, on 64K page-size systems, KFD_CWSR_TBA_TMA_SIZE becomes 128KB, while the reserved trap area remains 8KB. This mismatch causes the kernel to crash when running rocminfo or rccl unit tests.

Kernel attempted to read user page (2) - exploit attempt? (uid: 1001)
BUG: Kernel NULL pointer dereference on read at 0x00000002
Faulting instruction address: 0xc0000000002c8a64
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
CPU: 34 UID: 1001 PID: 9379 Comm: rocminfo Tainted: G E 6.19.0-rc4-amdgpu-00320-gf23176405700 #56 VOLUNTARY
Tainted: [E]=UNSIGNED_MODULE
Hardware name: IBM,9105-42A POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.30 (ML1060_896) hv:phyp pSeries
NIP: c0000000002c8a64 LR: c00000000125dbc8 CTR: c00000000125e730
REGS: c0000001e0957580 TRAP: 0300 Tainted: G E
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24008268 XER: 00000036
CFAR: c00000000125dbc4 DAR: 0000000000000002 DSISR: 40000000 IRQMASK: 1
GPR00: c00000000125d908 c0000001e0957820 c0000000016e8100 c00000013d814540
GPR04: 0000000000000002 c00000013d814550 0000000000000045 0000000000000000
GPR08: c00000013444d000 c00000013d814538 c00000013d814538 0000000084002268
GPR12: c00000000125e730 c000007e2ffd5f00 ffffffffffffffff 0000000000020000
GPR16: 0000000000000000 0000000000000002 c00000015f653000 0000000000000000
GPR20: c000000138662400 c00000013d814540 0000000000000000 c00000013d814500
GPR24: 0000000000000000 0000000000000002 c0000001e0957888 c0000001e0957878
GPR28: c00000013d814548 0000000000000000 c00000013d814540 c0000001e0957888
NIP [c0000000002c8a64] __mutex_add_waiter+0x24/0xc0
LR [c00000000125dbc8] __mutex_lock.constprop.0+0x318/0xd00
Call Trace:
0xc0000001e0957890 (unreliable)
__mutex_lock.constprop.0+0x58/0xd00
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x6fc/0xb60 [amdgpu]
kfd_process_alloc_gpuvm+0x54/0x1f0 [amdgpu]
kfd_process_device_init_cwsr_dgpu+0xa4/0x1a0 [amdgpu]
kfd_process_device_init_vm+0xd8/0x2e0 [amdgpu]
kfd_ioctl_acquire_vm+0xd0/0x130 [amdgpu]
kfd_ioctl+0x514/0x670 [amdgpu]
sys_ioctl+0x134/0x180
system_call_exception+0x114/0x300
system_call_vectored_common+0x15c/0x2ec

This patch changes AMDGPU_VA_RESERVED_TRAP_SIZE to 64 KB and KFD_CWSR_TBA_TMA_SIZE to the AMD GPU page size. This means we reserve 64 KB for the trap in the address space, but only allocate 8 KB within it. With this approach, the allocation size never exceeds the reserved area.

(cherry picked from commit 31b8de5e55666f26ea7ece5f412b83eab3f56dbb)

Meta Information
Published: 2026-05-01
Last Modified: 2026-05-01
Generated: 2026-05-07
AI Q&A: 2026-05-01
EPSS Evaluated: 2026-05-05
Affected Vendors & Products
Showing 3 associated CPEs

Vendor | Product | Version / Range
amd | amdgpu | From 6.19.0-rc4 (inc)
amd | amdgpu | From 64KB (inc)
amd | amdgpu | To 64KB (inc)

CWE ID: CWE-UNKNOWN
AI Powered Q&A
Can you explain this vulnerability to me?

This vulnerability exists in the Linux kernel's AMDGPU driver where a reserved trap size value (AMDGPU_VA_RESERVED_TRAP_SIZE) was hardcoded to 8KB, while another related size (KFD_CWSR_TBA_TMA_SIZE) depends on the system's page size. On systems with 64KB pages, this mismatch causes the kernel to crash when running certain GPU-related tests like rocminfo or rccl unit tests.

The root cause is that on 64KB page-size systems, KFD_CWSR_TBA_TMA_SIZE becomes 128KB, but the reserved trap area remains at 8KB, leading to allocation inconsistencies and kernel crashes due to invalid memory access.

The fix involved changing AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB and adjusting KFD_CWSR_TBA_TMA_SIZE to match the AMD GPU page size, ensuring the allocation size never exceeds the reserved area.
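
The size arithmetic behind the mismatch can be sketched as follows. This is an illustrative Python model, not driver code: the constant names mirror the kernel macros quoted in the description, the 4 KB GPU page size is an assumption chosen to match the 8 KB allocation the description mentions, and `cwsr_fits` is a hypothetical helper.

```python
KB = 1024

# Values taken from the CVE description.
OLD_RESERVED_TRAP_SIZE = 8 * KB    # AMDGPU_VA_RESERVED_TRAP_SIZE before the fix
NEW_RESERVED_TRAP_SIZE = 64 * KB   # AMDGPU_VA_RESERVED_TRAP_SIZE after the fix

# Assumption: a 4 KB GPU page, which yields the 8 KB allocation the description cites.
ASSUMED_GPU_PAGE_SIZE = 4 * KB


def old_cwsr_size(cpu_page_size):
    """KFD_CWSR_TBA_TMA_SIZE before the fix: 2 * PAGE_SIZE."""
    return 2 * cpu_page_size


def new_cwsr_size():
    """Allocation after the fix, modeled as 2 GPU pages, independent of PAGE_SIZE."""
    return 2 * ASSUMED_GPU_PAGE_SIZE


def cwsr_fits(alloc_size, reserved_size):
    """Hypothetical helper: True when the allocation fits in the reserved trap area."""
    return alloc_size <= reserved_size


for page_size in (4 * KB, 64 * KB):
    before = cwsr_fits(old_cwsr_size(page_size), OLD_RESERVED_TRAP_SIZE)
    after = cwsr_fits(new_cwsr_size(), NEW_RESERVED_TRAP_SIZE)
    print(f"PAGE_SIZE={page_size // KB}K before_fix_fits={before} after_fix_fits={after}")
```

Under these assumptions, only the 64K case fails the pre-fix check, which matches the behavior the description reports.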


How can this vulnerability impact me?

This vulnerability can cause the Linux kernel to crash on systems using AMD GPUs with 64KB page sizes when running GPU-related operations or tests such as rocminfo or rccl unit tests.

Kernel crashes can lead to system instability, potential data loss, and disruption of services relying on GPU computations.

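The crash surfaces as a kernel NULL pointer dereference in the mutex path reached through the KFD acquire-VM ioctl. In the reported trace it was triggered by an unprivileged user (uid 1001) running rocminfo, so the practical impact appears to be a local denial of service. The description does not indicate any further exploitability.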


How can this vulnerability be detected on my network or system? Can you suggest some commands?

This vulnerability manifests as a kernel crash when running specific AMD GPU related tests such as rocminfo or rccl unit tests on systems with 64K page sizes.
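
A quick way to see whether a host matches the affected configuration is to check its page size and whether the amdgpu module is loaded. The sketch below is a minimal example, assuming a Linux host: `is_exposed_page_size` and `amdgpu_module_listed` are hypothetical helper names, the exposure test simply applies the pre-fix arithmetic (2 * PAGE_SIZE against the 8 KB reservation), and a driver built into the kernel would not appear in `/proc/modules`. The description only reports crashes on 64K-page systems, so treat a positive result for other page sizes as an inference from the arithmetic.

```python
import os

KB = 1024
OLD_RESERVED_TRAP_SIZE = 8 * KB  # pre-fix AMDGPU_VA_RESERVED_TRAP_SIZE, per the description


def is_exposed_page_size(page_size):
    """True when the pre-fix allocation (2 * PAGE_SIZE) would exceed the 8 KB reservation."""
    return 2 * page_size > OLD_RESERVED_TRAP_SIZE


def amdgpu_module_listed(modules_text):
    """True when a line of /proc/modules-style text names the amdgpu module."""
    return any(line.split()[:1] == ["amdgpu"] for line in modules_text.splitlines())


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    try:
        with open("/proc/modules") as modules_file:
            loaded = amdgpu_module_listed(modules_file.read())
    except OSError:
        loaded = False  # /proc/modules unavailable; cannot tell from this check alone
    print(f"page size: {page_size // KB}K, exposed page size: {is_exposed_page_size(page_size)}")
    print(f"amdgpu module listed: {loaded}")
```

This check says nothing about whether the running kernel already carries the fix; it only narrows down which hosts are worth inspecting.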

Detection can involve monitoring for kernel crash logs containing messages like "Kernel NULL pointer dereference on read at 0x00000002" or "Oops: Kernel access of bad area" related to the amdgpu driver.

Running `rocminfo` or the rccl unit tests on an affected system may trigger the crash and thereby confirm the issue. Because the result is a kernel oops, do this only on a test machine.

Checking kernel logs with commands such as `dmesg | grep -i amdgpu` or `journalctl -k | grep -i amdgpu` may help identify related crash messages.
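
A plain `grep -i amdgpu` can return a lot of unrelated driver output, so it may help to search for the specific crash signatures instead. The following is a minimal sketch: the signature strings and the trace function name are copied from the oops in the description, while `find_crash_evidence` is a hypothetical helper and the sample text is a shortened excerpt of that oops.

```python
# Strings copied from the kernel oops quoted in the CVE description.
SIGNATURES = (
    "Kernel NULL pointer dereference on read at 0x00000002",
    "Oops: Kernel access of bad area",
)
TRACE_MARKER = "kfd_process_device_init_cwsr_dgpu"


def find_crash_evidence(log_text):
    """Return (matching signature lines, whether the CWSR init function appears in the log)."""
    lines = log_text.splitlines()
    hits = [line.strip() for line in lines if any(sig in line for sig in SIGNATURES)]
    in_cwsr_path = any(TRACE_MARKER in line for line in lines)
    return hits, in_cwsr_path


# Shortened excerpt of the oops from the description, used here as sample input.
sample = """\
BUG: Kernel NULL pointer dereference on read at 0x00000002
Oops: Kernel access of bad area, sig: 11 [#1]
kfd_process_device_init_cwsr_dgpu+0xa4/0x1a0 [amdgpu]
"""

hits, in_cwsr_path = find_crash_evidence(sample)
print(f"signature lines: {len(hits)}, CWSR init in trace: {in_cwsr_path}")
```

On a live system the same function could be fed the text output of `dmesg` or `journalctl -k`. A signature hit without the trace marker is weak evidence, since the oops messages themselves are generic.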


What immediate steps should I take to mitigate this vulnerability?

The vulnerability is resolved by updating the Linux kernel to include the patch that changes AMDGPU_VA_RESERVED_TRAP_SIZE to 64 KB and adjusts KFD_CWSR_TBA_TMA_SIZE to match the AMD GPU page size.

Immediate mitigation involves applying the kernel update or patch that addresses this issue to prevent kernel crashes caused by the mismatch in reserved trap size and allocation size.

Until the patch is applied, avoid running rocminfo or rccl unit tests on systems with 64K page sizes to reduce the risk of triggering the kernel crash.
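
On shared test machines, the advice above could be enforced with a small launcher that refuses to start the triggering tools on an unpatched host. This is only a sketch under stated assumptions: `safe_to_run` and `run_guarded` are hypothetical names, and `kernel_patched` is a flag the operator sets manually, because the description does not name a fixed kernel version that could be checked automatically.

```python
import os
import subprocess
import sys

KB = 1024
OLD_RESERVED_TRAP_SIZE = 8 * KB  # pre-fix reservation, per the description


def safe_to_run(page_size, kernel_patched):
    """Allow the tool on a patched kernel, or when 2 * PAGE_SIZE fits the old reservation."""
    return kernel_patched or 2 * page_size <= OLD_RESERVED_TRAP_SIZE


def run_guarded(command, kernel_patched=False):
    """Run `command` (a list of arguments) only if the host is considered safe."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    if not safe_to_run(page_size, kernel_patched):
        print(
            f"refusing to run {command[0]}: {page_size // KB}K pages on an unpatched kernel",
            file=sys.stderr,
        )
        return 1
    return subprocess.run(command).returncode
```

A call such as `run_guarded(["rocminfo"])` would then be blocked on a 64K-page host until the operator passes `kernel_patched=True` after updating.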

