CVE-2026-46223
Awaiting Analysis Awaiting Analysis - Queue
Use-After-Free in Linux Kernel cgroup Subsystem

Publication date: 2026-05-28

Last updated on: 2026-05-28

Assigner: kernel.org

Description
In the Linux kernel, the following vulnerability has been resolved: cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated A chain of commits going back to v7.0 reworked rmdir to satisfy the controller invariant that a subsystem's ->css_offline() must not run while tasks are still doing kernel-side work in the cgroup. [1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out") [2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") [3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir") [4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition") [5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context") [1] moved task cset unlink from do_exit() to finish_task_switch() so a task's cset link drops only after the task has fully stopped scheduling. That made tasks past exit_signals() linger on cset->tasks until their final context switch, which led to a series of problems as what userspace expected to see after rmdir diverged from what the kernel needs to wait for. [2]-[5] tried to bridge that divergence: [2] filtered the exiting tasks from cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4] fixed the wait's condition; [5] made nr_dying_subsys_* visible synchronously. The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g. host PID 1 systemd reaping orphan pids that were re-parented to it during the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those pids to free, the pids can't free because PID 1 is the reaper and it's stuck in rmdir, and the system A-A deadlocks. No internal lock ordering breaks this; the wait itself is the bug. The css killing side that drove the original reorder, however, can be made cleanly asynchronous: ->css_offline() is already async, run from css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to make that chain start only after all tasks have left the cgroup. rmdir's user-visible side then returns as soon as cgroup.procs and friends are empty, while ->css_offline() still runs only after the cgroup is fully drained. Verified by the original reproducer (pidns teardown + zombie reaper, runs under vng) which hangs vanilla and succeeds here, and by per-commit deterministic repros for [2], [3], [4], [5] with a boot parameter that widens the post-exit_signals() window so each state is reliably reachable. Some stress tests on top of that. cgroup_apply_control_disable() has the same shape of pre-existing race: when a controller is disabled via subtree_control, kill_css() ran synchronously while tasks past exit_signals() could still be linked to the cgroup's csets, and ->css_offline() could fire before they drained. This patch preserves the existing synchronous behavior at that call site (kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch will defer kill_css_finish() there using a per-css trigger. This seems like the right approach and I don't see problems with it. The changes are somewhat invasive but not excessively so, so backporting to -stable should be okay. If something does turn out to be wrong, the fallback is to revert the entire chain ([1]-[5]) and rework in the development branch instead. v2: Pin cgrp across the deferred destroy work with explicit cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1 wasn't actually broken (ordered cgroup_offline_wq + queue_work order in cgroup_task_dead() saved it) but the explicit ref removes the dependency on those non-obvious invariants. Also note the pre-existing cgroup_apply_control_disable() race in the description; a follow-up will defer kill_css_finish() there.
CVSS Scores
EPSS Scores
Probability:
Percentile:
Meta Information
Published
2026-05-28
Last Modified
2026-05-28
Generated
2026-05-28
AI Q&A
2026-05-28
EPSS Evaluated
N/A
NVD
EUVD
Affected Vendors & Products
Showing 1 associated CPE
Vendor Product Version / Range
linux_kernel linux_kernel *
Helpful Resources
Exploitability
CWE
CWE Icon
KEV
KEV Icon
CWE ID Description
CWE-UNKNOWN
Attack-Flow Graph
AI Powered Q&A
Can you explain this vulnerability to me?

This vulnerability involves the Linux kernel's cgroup subsystem, specifically how the kernel handles the removal of cgroups (rmdir). A series of changes were made to defer the killing of per-CPU references (css percpu_ref) until the cgroup is fully depopulated, meaning all tasks have left the cgroup.

Previously, the kernel tried to synchronously kill cgroup references while tasks might still be linked to the cgroup, which could cause deadlocks. For example, if the process removing the cgroup was also the reaper of zombie processes, it could block indefinitely waiting for those processes to free, causing a system deadlock.

The fix defers the asynchronous cleanup (css_offline) until after all tasks have left the cgroup, allowing rmdir to return as soon as the cgroup appears empty to userspace, while the actual cleanup happens asynchronously afterward. This avoids the deadlock scenario and aligns kernel behavior with userspace expectations.


How can this vulnerability impact me? :

This vulnerability can cause a system deadlock in Linux environments using cgroups. Specifically, if a process responsible for cleaning up cgroups is also the reaper of zombie processes, the system can hang indefinitely during cgroup removal.

Such a deadlock can lead to system instability, unresponsiveness, or crashes, impacting availability and reliability of services running on the affected Linux system.


How does this vulnerability affect compliance with common standards and regulations (like GDPR, HIPAA)?:

The provided CVE description does not include any information regarding the impact of this vulnerability on compliance with common standards and regulations such as GDPR or HIPAA.


What immediate steps should I take to mitigate this vulnerability?

The vulnerability has been resolved by deferring the css percpu_ref kill on rmdir until the cgroup is fully depopulated, preventing deadlocks caused by waiting on tasks that cannot exit.

Immediate mitigation steps include updating the Linux kernel to a version that includes the fixes described in the chain of commits [1]-[5], which rework the rmdir behavior and defer the css_offline() execution until all tasks have left the cgroup.

Backporting these fixes to stable kernel versions is considered safe and recommended to avoid the deadlock scenario.


Ask Our AI Assistant
Need more information? Ask your question to get an AI reply (Powered by our expertise)
0/70
EPSS Chart