Linux perf Hang Fix CVE-2025-37747 Restores Safe Teardown

Linux kernel maintainers closed a subtle but impactful race in the perf subsystem that could cause hosts to hang while freeing a sigtrap event, a denial‑of‑service condition tracked as CVE‑2025‑37747 and now fixed upstream and in many vendor kernels.

Background

The Linux perf subsystem is the kernel’s primary performance‑monitoring and sampling infrastructure. It wires together interrupt handlers, per‑task bookkeeping, and userland file descriptors created by perf_event_open so users and tools can record profiling samples, hardware counters, stack traces, and signal‑driven events (sigtrap). Because perf crosses interrupt, signal, and file‑descriptor lifecycles, it must carefully coordinate references, task work callbacks, and teardown sequences to avoid races that affect availability.
The specific problem resolved in CVE‑2025‑37747 arises when a deferred signal related to a sigtrap event has not yet been delivered before the file backing the event is closed. Under particular interleavings of IRQs, signal delivery, and file release callbacks, the kernel could block in rcuwait_wait_event() waiting for a deferred callback that can no longer be cancelled, producing a hang rather than a clean teardown. This behavior was documented and triaged by upstream and by distribution security teams.

What happened (technical overview)

At a high level, the hang is a lifecycle and ordering bug rather than a memory‑corruption vulnerability. The observed sequence of events in affected kernels looks like this:
  • perf_event_overflow queues a deferred callback (task work) to handle a pending sigtrap for a target task.
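  • The file descriptor that represents the perf event is closed; fput() queues ____fput() as a further task work item on the same task, and that item later runs the perf release path (perf_release).
  • When task_work_run() starts on the current task, it detaches the task's entire list of pending callbacks from the task_struct. From that point on, task_work_cancel() can no longer find and remove items that are queued but not yet started, including the pending sigtrap callback.
  • The release path (perf_release → perf_event_release_kernel → _free_event → perf_pending_task_sync) calls task_work_cancel() on the sigtrap callback, which fails, and then waits in rcuwait_wait_event() for that callback to finish. The callback is still queued on the detached list after the release work, and that list is processed by the very task that is now waiting, so the callback never runs and the wait blocks indefinitely: the hang.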
The canonical stack traces and the explanatory notes that went into the CVE description reproduce this choke point and show how inverted dependencies between the event owner and the deferred callbacks lead to the deadlock.
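The sequence above can be reproduced in a toy Python model. It is not kernel code: Task, scenario, and the string work names are hypothetical stand-ins, and the model assumes (our reading of current kernels) that task_work_add() pushes new items to the head of the list and that task_work_run() detaches the whole list before running it. It contrasts a release that runs while the sigtrap callback is still attached, where cancellation succeeds, with one that runs from inside task_work_run(), where cancellation fails and the pre-fix design would wait forever.

```python
class Task:
    """Toy stand-in for one task_struct's task_works list (not kernel code)."""

    def __init__(self):
        self.task_works = []                     # attached list, head first

    def task_work_add(self, name, func):
        self.task_works.insert(0, (name, func))  # newest item runs first

    def task_work_cancel(self, name):
        # Only items still attached to the task can be found and removed.
        for i, (queued, _) in enumerate(self.task_works):
            if queued == name:
                del self.task_works[i]
                return True
        return False

    def task_work_run(self):
        # Detach the whole list before running it; cancel can no longer see it.
        detached, self.task_works = self.task_works, []
        for _, func in detached:
            func()


def scenario(release_inside_task_work_run):
    task = Task()
    state = {"pending_work": False, "outcome": None}

    def perf_pending_task():                     # deferred sigtrap delivery
        state["pending_work"] = False

    def perf_pending_task_sync():                # simplified pre-fix logic
        if not state["pending_work"]:
            return "nothing pending"
        if task.task_work_cancel("perf_pending_task"):
            state["pending_work"] = False
            return "cancelled"
        return "hang"                            # rcuwait_wait_event() never wakes

    def release():                               # perf_release() path, simplified
        state["outcome"] = perf_pending_task_sync()

    # perf_event_overflow(): queue the deferred sigtrap callback.
    state["pending_work"] = True
    task.task_work_add("perf_pending_task", perf_pending_task)

    if release_inside_task_work_run:
        task.task_work_add("____fput", release)  # close(fd) defers the release
        task.task_work_run()
    else:
        release()                                # list is still attached here
    return state["outcome"]


print(scenario(release_inside_task_work_run=False))  # cancelled
print(scenario(release_inside_task_work_run=True))   # hang
```

The model only tracks the one flag the wait depends on; it deliberately omits RCU, IRQ context, and the real callback_head structures.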

Why this is availability‑first (not code execution)

Public trackers and vendor advisories classify CVE‑2025‑37747 as an availability issue. The bug produces a wait/hang in kernel context; it does not depend on memory corruption or control‑flow hijacking. Advisories and vulnerability databases therefore treat it as a denial‑of‑service (DoS) hang that can be triggered by a local actor, or by automated tooling that exercises perf sigtrap events. There is no public evidence that this defect by itself leads to remote code execution or privilege escalation.

Upstream root cause and the fix

The upstream rationale and the pragmatic fix are instructive for kernel engineers and operators alike.

Root cause details

The deadlock is rooted in a mismatch between how task work lists are managed and how perf release paths assume they can cancel pending callbacks. Specifically:
  • When task_work_run() detaches the pending callback list from the task_struct to execute it, task_work_cancel() can no longer find and remove pending work items.
  • Some perf release code paths assumed they could cancel the outstanding task work and then synchronously wait for completion; those assumptions fail when the cancellation window is missed and the synchronizing wait blocks indefinitely.
  • The dependency inversion becomes visible when two tasks or IRQ sequences interleave in the precise order needed to remove the cancellation handle while another path expects it to be removable.
The upstream description concludes that the only option left is the old strategy: take a reference on the event when the perf task work is queued and drop it from the task work itself, which guarantees the event stays alive until the deferred callback completes. That is how the code worked before the earlier fix "perf: Fix event leak upon exec and file release", so the maintainers reworked the lifecycle to restore that reference without reintroducing the leaks the earlier fix had closed.
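The inverted dependency is easiest to see as a wait-for graph. The sketch below is a hypothetical illustration, not upstream code: each key is a task whose release path waits for a callback that only the mapped task can run. It follows the two-task case described upstream, where T1 and T2 each open an event targeting the other and then close it, plus the single-task case described above. A cycle means the wait-based teardown cannot make progress; the refcount-based strategy removes the edges because no release path waits.

```python
def has_cycle(waits_for):
    """True if the wait-for graph contains a cycle.

    waits_for maps a task to the single task whose pending callback it
    is waiting on (each task waits on at most one other in this model).
    """
    for start in waits_for:
        seen, node = set(), start
        while node in waits_for:
            if node in seen:
                return True
            seen.add(node)
            node = waits_for[node]
    return False


# Old design: T1's release waits on the callback queued on T2, and vice versa.
old_design = {"T1": "T2", "T2": "T1"}
# Single-task case: the release waits on a callback its own task must run.
self_wait = {"T1": "T1"}
# Refcount design: release drops a reference and returns; nobody waits.
new_design = {}

print(has_cycle(old_design), has_cycle(self_wait), has_cycle(new_design))  # True True False
```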

What the fix does (in plain terms)

  • The patched code ensures the perf event object retains a reference count from the moment the task work is queued until the task work runs and explicitly releases that reference.
  • The change reorders or extends reference semantics so a pending deferred callback can always safely access the event and then release the reference once its work is done.
  • Additional care is taken in parent/child event cases so that a child does not try to access a parent that has already been released; the fix sequences parent release to occur last when needed.
This is a small, surgical change in the event lifecycle logic and follows a common kernel pattern: if asynchronous work may outlive the path that created it, the object must carry a reference that the worker drops when the asynchronous work completes.
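The reference-counting pattern the fix restores can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the kernel's perf_event implementation: Event, get, put, and run_fixed_flow are hypothetical names, a single integer stands in for the event's reference count, and a child event is assumed to pin its parent. The deferred callback owns one reference, so the release path can drop the file's reference and return without waiting, and the parent is freed only after the child.

```python
class Event:
    """Toy perf event with a single reference count (not kernel code)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.refcount = 1                        # reference owned by the open fd
        self.freed = False
        if parent is not None:
            parent.get()                         # a child pins its parent

    def get(self):
        assert not self.freed, f"use after free of {self.name}"
        self.refcount += 1

    def put(self, log):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True
            log.append(f"free {self.name}")
            if self.parent is not None:
                self.parent.put(log)             # parent is released last


def run_fixed_flow():
    log, queued = [], []
    parent = Event("parent")
    child = Event("child", parent=parent)

    # perf_event_overflow(): take a reference before queueing the callback.
    child.get()
    queued.append(child)

    # close() on both fds drops the fd references and returns without waiting.
    child.put(log)
    parent.put(log)
    log.append("release returned")

    # Later the deferred callback runs, uses the event, then drops its reference.
    for event in queued:
        assert not event.freed
        log.append(f"sigtrap for {event.name}")
        event.put(log)
    return log


print(run_fixed_flow())
# ['release returned', 'sigtrap for child', 'free child', 'free parent']
```

The order in the output is the point of the design: the release returns immediately, the callback still sees a live event, and the child is freed before the parent it may dereference.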

Who is affected and how vendors responded

Not all Linux builds are equally exposed. The issue is limited to kernels that include the affected perf codepaths and that had the problematic ordering in the release/teardown logic. Major distributions and vendor trackers assessed the risk and rolled fixes into their kernel packages.
  • SUSE marked the issue as Resolved and assigned a CVSS base score consistent with an availability impact; SUSE advisories list multiple kernel packages that include the fix.
  • Amazon’s ALAS tracker lists the CVE and maps it to affected Amazon Linux kernels; some ALAS entries show fixed package versions for Amazon Linux 2023 and newer kernels.
  • Red Hat and other enterprise vendors included the fix in their kernel updates or documented package mappings in their advisories and errata feeds (various vendor advisories were synchronized with the upstream stable commits). Public vulnerability feeds such as OSV and distributions’ trackers aggregate those vendor mappings.
Important operational caveat: embedded devices, vendor kernel forks, and long‑life appliance images may lag upstream and thus remain vulnerable longer than mainstream distributions. Operators of appliances, routers, or appliance VMs should verify vendor‑supplied images directly.

Practical remediation: what operators must do now

The definitive remediation is to run a kernel that contains the upstream fix. Because the fix changes reference handling in running kernel code, it takes effect only after you boot into the patched kernel, unless your vendor ships a livepatch that covers it.
Follow this prioritized runbook:
  • Inventory
      • Use uname -r and distribution package queries (apt, rpm, zypper, etc.) to identify the running kernel version and the installed kernel packages.
      • Map kernel versions to vendor advisories to determine whether your installed package contains the fix. Vendor changelogs usually reference the upstream commit or the CVE identifier.
  • Patch path
      • Install the vendor kernel package that contains the CVE‑2025‑37747 backport, or upgrade to a kernel release that includes the upstream stable commit.
      • Where vendors provide livepatches that include the fix, evaluate them for your environment and apply them after appropriate testing. If no livepatch is available, schedule a reboot into the patched kernel.
  • Test and validate
      • Pilot the patched kernel in a representative test group. Exercise perf workloads, sample-based profiling, and any in‑house agents that use perf or sigtrap events.
      • Verify there are no regressions in performance tooling and confirm that the kernel booted with the intended package version.
  • Deploy
      • Roll the patch out in staged waves with monitoring (logs, service health checks).
      • Reboot hosts and confirm the new kernel is active.
  • Post‑deployment checks
      • Monitor kernel logs (dmesg / journalctl -k) for residual perf subsystem warnings or unexpected oops traces.
      • On hosts that cannot be rebooted immediately (embedded appliances), contact vendors for backports or plan image upgrades.
For cloud fleets and multi‑tenant infrastructure, treat these updates as high priority. Hosts that accept untrusted code or that run containerized workloads where unprivileged actors can influence perf operations are higher risk.
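For the inventory step, a small helper can pull the upstream base version out of a `uname -r` string. This is a hypothetical sketch (upstream_base and at_least are illustrative names): it is only meaningful for plain upstream or stable kernels, because distribution kernels backport fixes without changing their base version, so for vendor packages the advisory or changelog check described in the runbook remains the authority. The minimum version passed to at_least() is a placeholder you must take from the relevant advisory; the sketch does not know which releases carry the fix.

```python
import re


def upstream_base(release):
    """Return the leading (major, minor, patch) numbers of a `uname -r` string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch component (e.g. "6.6") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def at_least(release, minimum):
    """Compare an upstream/stable release against a minimum (major, minor, patch).

    Not meaningful for distribution kernels, which backport fixes without
    raising the base version; check the vendor advisory for those.
    """
    return upstream_base(release) >= tuple(minimum)


print(upstream_base("6.8.0-45-generic"))  # (6, 8, 0)
```

On a host, `at_least(platform.release(), minimum)` compares the running kernel, where `minimum` is a `(major, minor, patch)` tuple copied from the advisory.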

Temporary mitigations if you cannot patch immediately

If a patch or reboot cannot be scheduled quickly, administrators can reduce exposure using conservative controls:
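  • Restrict perf capabilities:
      • Limit the ability to create perf events to trusted users. Unprivileged access to perf_event_open is governed by the kernel.perf_event_paranoid sysctl; privileged access comes from CAP_PERFMON (Linux 5.8 and later) or CAP_SYS_ADMIN. At the upstream default level of 2, unprivileged users can still open user-space events on tasks they are allowed to trace, so tightening this setting narrows exposure rather than removing it. Some distribution kernels define an additional level 3 that blocks unprivileged perf_event_open entirely.
  • Harden container runtimes:
      • Ensure container runtimes do not grant CAP_SYS_ADMIN, CAP_PERFMON, CAP_BPF, or other capabilities that enable perf-like interactions to untrusted workloads, and keep seccomp profiles that block perf_event_open for those workloads.
  • Isolate high‑risk systems:
      • Move development/test hosts or CI runners that allow frequent or untrusted perf activity to isolated networks, or out of production pools, until patched.
  • Monitor and alert:
      • Add SIEM rules for hung-task reports ("blocked for more than N seconds") whose stack traces include perf teardown functions such as perf_pending_task_sync, and raise alerts on repeated perf subsystem warnings.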
These mitigations reduce attack surface but do not replace the kernel update. The underlying logic bug remains and only a patched kernel eliminates the hang vector.
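To audit the perf_event_paranoid restriction across hosts, a helper can read the sysctl and describe what unprivileged users may still do. The level descriptions follow the kernel's sysctl documentation as we understand it; the handling of level 3 assumes a distribution patch and should be checked against your kernel. read_paranoid and describe_paranoid are hypothetical names.

```python
from pathlib import Path

PARANOID = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid(path=PARANOID):
    """Read the current kernel.perf_event_paranoid value."""
    return int(path.read_text().strip())


def describe_paranoid(level):
    """Summarise what users without CAP_PERFMON/CAP_SYS_ADMIN may still do."""
    if level >= 3:
        return ("level 3+: blocks unprivileged perf_event_open only on kernels "
                "carrying a distribution patch; upstream treats it like 2")
    if level == 2:
        return "user-space events on tasks the caller is allowed to trace"
    if level == 1:
        return "adds kernel-space events on those tasks"
    if level == 0:
        return "adds CPU-wide events"
    return "almost all events, for all users"


if __name__ == "__main__":
    try:
        current = read_paranoid()
    except (OSError, ValueError):
        print("perf_event_paranoid is not readable on this system")
    else:
        print(current, "->", describe_paranoid(current))
```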

Detection, hunting, and incident response

Hunting for attempts to trigger CVE‑2025‑37747, or detecting a hang in progress, is practical with kernel log telemetry and host health checks.
  • Detection signals to hunt
      • Hung-task reports and call traces that reference perf_release, perf_event_release_kernel, _free_event, or perf_pending_task_sync. A failed task_work_cancel is not logged, and rcuwait_wait_event is a macro, so the wait normally appears inside perf_pending_task_sync (or its caller, if that function is inlined) rather than as a frame of its own.
      • Processes stuck in uninterruptible sleep (D state) shortly after closing perf file descriptors.
      • Monitoring of systemd‑service hangs, stuck file descriptors, or elevated context‑switch/wait rates correlated with profiling workloads.
  • Incident playbook (short)
      • Capture kernel logs (dmesg and journalctl -k) and preserve the exact stack traces.
      • Identify recent perf-related activity on the host (audit logs, process accounting).
      • Temporarily isolate the host if it provides multi‑tenant services or if the hang impacts other workloads.
      • Reproduce in a lab if it is safe to do so, but do not run reproducers on production or other critical systems, or on any system without backups.
      • Apply the patched kernel and validate that the hang does not recur.
Detection is greatly aided by good telemetry and by correlating perf‑tooling deployment with kernel package versions. Preserve forensic artifacts if you suspect misuse or deliberate triggering.
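A first-pass hunting script can scan kernel log text for hung-task reports whose call traces pass through the perf teardown path. This is a heuristic sketch, not a vendor detection rule: it assumes the upstream hung-task message format ("INFO: task <comm>:<pid> blocked for more than N seconds."), treats a `</TASK>` marker as the end of a trace (older kernels may not print it), and uses a hypothetical function name, find_perf_hangs. Matches are leads for triage, not proof that CVE‑2025‑37747 was triggered.

```python
import re

HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
PERF_TEARDOWN_FRAMES = ("perf_pending_task_sync", "_free_event",
                        "perf_event_release_kernel", "perf_release")


def find_perf_hangs(lines):
    """Return (comm, pid) for hung-task reports whose trace hits perf teardown."""
    hits, current = [], None
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            # The greedy \S+ backtracks, so comms containing ':' still parse.
            current = (match.group(1), int(match.group(2)))
        elif "</TASK>" in line:
            current = None                       # end of this report's call trace
        elif current and any(frame in line for frame in PERF_TEARDOWN_FRAMES):
            hits.append(current)
            current = None                       # count each report once
    return hits
```

Feed it the output of `journalctl -k --no-pager` (for example via `subprocess.run(..., capture_output=True, text=True).stdout.splitlines()`) or a saved dmesg capture.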

Broader technical context and critical analysis

This CVE highlights recurring engineering and operational patterns that are useful to call out for maintainers and operators.

Strengths in the remediation approach

  • The upstream fix is small and surgical: it preserves the existing semantics of perf while restoring a safe reference lifecycle for asynchronous callbacks. Small, targeted patches reduce regression risk and are easier for vendors to backport to stable trees.
  • Multiple vendors and distribution trackers quickly incorporated the fix into stable packages, giving administrators a clear operational path: update and reboot.

Residual risks and vendor lag

  • Vendor backport timelines vary. Embedded vendors, appliance maintainers, and custom kernel forks typically lag upstream. Those images can remain vulnerable for longer periods and often require direct vendor engagement.
  • Complex subsystems produce brittle assumptions. Perf touches interrupts, signals, file descriptors, and cross‑task work queues. Small changes in initialization order or list management can surface new inverted dependency classes in unexpected places; the perf sigtrap hang is a good example of such brittle interdependencies.

On exploitability and escalation

  • The recorded evidence and vendor advisories frame this CVE as an availability defect. There is no authoritative public proof‑of‑concept demonstrating that the hang can be chained to privilege escalation or code execution.
  • That said, local DoS primitives can be weaponized in multi‑stage attacks if an attacker already controls code on a host, or if they can repeatedly cause a hang to force operator action. Treat the absence of a PoC as provisional; maintain a conservative posture for critical, multi‑tenant infra.

Developer and kernel‑engineering takeaways

For kernel and subsystem authors, the incident reinforces engineering patterns that reduce future regressions:
  • Prefer explicit reference counting for objects handed to asynchronous work items; the worker should own and release references for any object it expects to access.
  • Avoid wide, implicit assumptions about the cancellability of work lists; explicitly synchronize cancellation and execution paths.
  • Add concurrency unit tests that emulate interleaved IRQ and task_work scenarios — such tests are more complex than typical unit tests but catch lifecycle races earlier.
  • When changing perf or signal handling, increase pilot testing on representative multi‑tenant workloads where the race windows are more likely to surface.
These are not new lessons, but CVE‑2025‑37747 is a concrete case where failing to follow them produced visible operational pain.
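As a taste of the interleaving tests suggested above, the sketch below enumerates every ordering of three steps on one toy task (the sigtrap IRQ queueing its callback, close() queueing the release, and a task_work_run() pass, always followed by a final pass) and reports which orderings reach the hang under the pre-fix, wait-based teardown. It is a simplified Python model, not kernel code: it assumes newest-first task work execution and whole-list detachment, and run_schedule and hanging_schedules are hypothetical names. A real test would exercise kernel code; the value here is the exhaustive enumeration pattern.

```python
from itertools import permutations


def run_schedule(schedule):
    """Replay one ordering of steps on a toy task and report the outcome."""
    works = []                                   # attached task_work list, head first
    state = {"pending": False, "released": False, "outcome": "ok"}

    def pending_cb():                            # deferred sigtrap delivery
        state["pending"] = False

    def release_cb():                            # perf release via ____fput()
        state["released"] = True
        if state["pending"]:
            if pending_cb in works:              # still attached: cancel succeeds
                works.remove(pending_cb)
                state["pending"] = False
            else:                                # detached with us: wait forever
                state["outcome"] = "hang"

    for step in schedule:
        if step == "irq" and not state["released"]:
            state["pending"] = True              # perf_event_overflow()
            works.insert(0, pending_cb)
        elif step == "close":
            works.insert(0, release_cb)          # fput() defers the release
        elif step == "run":                      # task_work_run()
            detached = works[:]
            works.clear()
            for cb in detached:
                cb()
    return state["outcome"]


def hanging_schedules():
    """Try every ordering of irq/close/run, each followed by a final run."""
    hangs = []
    for order in permutations(("irq", "close", "run")):
        schedule = order + ("run",)
        if run_schedule(schedule) == "hang":
            hangs.append(schedule)
    return sorted(hangs)


for schedule in hanging_schedules():
    print(" -> ".join(schedule))
# irq -> close -> run -> run
# run -> irq -> close -> run
```

Only the orderings where the IRQ's callback is queued before the close, with no task_work_run() pass in between, hang; this matches the "precise order" the root-cause discussion describes.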

Conclusion and recommendations

CVE‑2025‑37747 is a reliability‑centric kernel bug in the perf subsystem that can cause hosts to hang during sigtrap event teardown. It is not an exploitation primitive for remote code execution, but it is operationally significant, especially for multi‑tenant, cloud, and CI infrastructures where untrusted workloads or automated profiling jobs are common. Upstream provided a small, correct fix that restores safe lifecycle handling for pending task work; vendors have incorporated those commits into stable kernel packages and advisories.
Action checklist (concise):
  • Inventory kernels and perf usage today.
  • Install vendor kernel updates that reference CVE‑2025‑37747 or the upstream commits and reboot into the patched kernel.
  • If you cannot patch immediately, restrict perf capabilities and container privileges, and isolate high‑risk hosts.
  • Add monitoring for perf release and task_work related kernel traces and preserve logs for any observed incidents.
Operators and maintainers who follow this playbook will remove the hang vector and restore predictable perf behavior across their fleets. The incident is a reminder that asynchronous callbacks, signals, and teardown code must be composed with explicit lifetime and cancellation guarantees, and that small, surgical fixes coupled with disciplined backporting remain the most reliable way to protect large, diverse Linux deployments.

Source: MSRC Security Update Guide - Microsoft Security Response Center