CVE-2026-31765
Buffer Overflow in AMDGPU Kernel Driver
Publication date: 2026-05-01
Last updated on: 2026-05-01
Assigner: kernel.org
Description
CVSS Scores
EPSS Scores
Meta Information
Affected Vendors & Products
| Vendor | Product | Version / Range |
|---|---|---|
| amd | amdgpu | From 6.19.0-rc4 (inc) |
| amd | amdgpu | From 64KB (inc) |
| amd | amdgpu | to 64KB (inc) |
Helpful Resources
Exploitability
| CWE ID | Description |
|---|---|
| CWE-UNKNOWN | |
AI Powered Q&A
Can you explain this vulnerability to me?
This vulnerability is in the Linux kernel's AMDGPU driver. The reserved trap size, AMDGPU_VA_RESERVED_TRAP_SIZE, was hardcoded to 8KB, while the related allocation size, KFD_CWSR_TBA_TMA_SIZE, depended on the system's page size. On systems with 64KB pages the two values diverge, and the kernel crashes when running GPU workloads such as rocminfo or the rccl unit tests.
The root cause is that on 64KB page-size systems KFD_CWSR_TBA_TMA_SIZE becomes 128KB while the reserved trap area stays at 8KB. The allocation is therefore larger than the area reserved for it, and the resulting invalid memory access crashes the kernel.
The fix changes AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB and ties KFD_CWSR_TBA_TMA_SIZE to the AMD GPU page size rather than the system page size, so the allocation size never exceeds the reserved area.
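The size mismatch can be illustrated with a few lines of arithmetic. This is a minimal sketch, not kernel code: the 8KB reserved size comes from the description above, while the "two system pages" formula for KFD_CWSR_TBA_TMA_SIZE is an assumption inferred from the 64KB-page to 128KB figure. The function names are hypothetical.

```python
KIB = 1024

# Quoted in the description: AMDGPU_VA_RESERVED_TRAP_SIZE was hardcoded to 8KB.
RESERVED_TRAP_SIZE_BEFORE_FIX = 8 * KIB


def cwsr_size_before_fix(system_page_size: int) -> int:
    """ASSUMPTION: KFD_CWSR_TBA_TMA_SIZE was two system pages before the fix.

    Inferred from the 64KB-page -> 128KB figure; not quoted from kernel source.
    """
    return 2 * system_page_size


def fits_in_reserved_area(allocation_size: int, reserved_size: int) -> bool:
    """True when the allocation stays inside the reserved trap area."""
    return allocation_size <= reserved_size


# Compare a 4KB-page system with a 64KB-page system.
for page_size in (4 * KIB, 64 * KIB):
    allocation = cwsr_size_before_fix(page_size)
    ok = fits_in_reserved_area(allocation, RESERVED_TRAP_SIZE_BEFORE_FIX)
    print(f"{page_size // KIB}KB pages: allocation {allocation // KIB}KB, "
          f"reserved {RESERVED_TRAP_SIZE_BEFORE_FIX // KIB}KB, fits={ok}")
```

Under that assumption the check passes for 4KB pages and fails for 64KB pages, which matches the behaviour described: only 64KB page-size systems crash.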
How can this vulnerability impact me?
This vulnerability can cause the Linux kernel to crash on systems using AMD GPUs with 64KB page sizes when running GPU-related operations or tests such as rocminfo or rccl unit tests.
Kernel crashes can lead to system instability, potential data loss, and disruption of services relying on GPU computations.
The crash is a kernel NULL pointer dereference, so anyone able to run the triggering GPU workloads on an affected system may be able to cause a denial of service.
How can this vulnerability be detected on my network or system? Can you suggest some commands?
This vulnerability manifests as a kernel crash when running AMD GPU workloads such as rocminfo or the rccl unit tests on systems with 64KB pages.
To detect it, look in the kernel logs for messages such as "Kernel NULL pointer dereference on read at 0x00000002" or "Oops: Kernel access of bad area" in crashes involving the amdgpu driver.
Running `rocminfo` or the rccl unit tests on an affected system may trigger the crash and so confirm the vulnerability, at the cost of crashing the system under test.
Commands such as `dmesg | grep -i amdgpu` or `journalctl -k | grep -i amdgpu` can help locate related crash messages.
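The log check can also be scripted. The sketch below is one possible approach, assuming the kernel log has already been captured as text. The two signatures are the messages quoted above; the function names are hypothetical.

```python
from typing import List

# Crash signatures quoted in the text above; matched case-insensitively.
CRASH_SIGNATURES = (
    "kernel null pointer dereference",
    "oops: kernel access of bad area",
)


def find_crash_signatures(log_text: str) -> List[str]:
    """Return the log lines that contain any of the quoted crash signatures."""
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if any(signature in lowered for signature in CRASH_SIGNATURES):
            hits.append(line.strip())
    return hits


def mentions_amdgpu(log_text: str) -> bool:
    """True when the log mentions the amdgpu driver anywhere."""
    return "amdgpu" in log_text.lower()
```

One way to use it is to capture the output of `journalctl -k` or `dmesg` (for example with `subprocess.run(..., capture_output=True, text=True).stdout`) and pass that string to both functions. A hit from `find_crash_signatures` in a log that also mentions amdgpu is worth a closer look, but it is not proof that this specific issue occurred.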
What immediate steps should I take to mitigate this vulnerability?
The vulnerability is resolved by updating the Linux kernel to a version that includes the patch changing AMDGPU_VA_RESERVED_TRAP_SIZE to 64KB and adjusting KFD_CWSR_TBA_TMA_SIZE to match the AMD GPU page size.
Immediate mitigation is to apply that kernel update or patch, which removes the mismatch between the reserved trap size and the allocation size.
Until the patch is applied, avoid running rocminfo or the rccl unit tests on systems with 64KB pages to reduce the risk of triggering the kernel crash.
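The interim workaround could be enforced with a small guard in test tooling. This is a hedged sketch: `should_skip_gpu_tests` and `current_page_size` are hypothetical helpers, and the script does not detect whether the kernel is patched, so the caller must supply that.

```python
import os

AFFECTED_PAGE_SIZE = 64 * 1024  # 64KB system pages, per the advisory text


def current_page_size() -> int:
    """System page size in bytes, as reported by the OS."""
    return os.sysconf("SC_PAGE_SIZE")


def should_skip_gpu_tests(page_size: int, kernel_is_patched: bool) -> bool:
    """Skip rocminfo / rccl tests on unpatched systems with 64KB pages.

    kernel_is_patched is supplied by the caller; this sketch does not detect it.
    """
    return page_size == AFFECTED_PAGE_SIZE and not kernel_is_patched
```

A test runner might call `should_skip_gpu_tests(current_page_size(), kernel_is_patched=False)` before launching rocminfo and skip the run when it returns `True`.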