Linux ATM Driver Race Fixed: Hold atm_dev_mutex During Procfs Cleanup

A subtle race in the Linux kernel’s ATM driver that left a small window where procfs entries could be double-registered has been cataloged as CVE-2025-38245 and fixed upstream with a small but important ordering change: make sure the atm_dev_mutex is held until procfs/sysfs entries are fully removed during device deregistration.

Background / Overview

The problem reported as CVE-2025-38245 lives in the kernel’s ATM (Asynchronous Transfer Mode) subsystem. A syzbot-driven trace revealed that a sequence of operations in atm device registration/deregistration could race, producing a kernel warning like “proc_dir_entry 'atm/atmtcp:0' already registered” and, in some contexts, triggering kernel instability. The root cause was an ordering/locking omission: the code that removes an ATM device released atm_dev_mutex too early — before procfs and sysfs entries associated with that device were torn down — leaving a transient state where the device is no longer visible to the device-list lookup but its procfs artifacts still exist. Holding the mutex until procfs/sysfs cleanup completes eliminates the race window. Multiple independent trackers and kernel-stable maintainers recorded the change and its intent; the patch was merged into several stable trees (for example into 6.6-, 6.12-, 6.1- and 5.15-stable queues), and vendors published advisories mapping the upstream commit into distribution kernel packages.

The technical anatomy: what actually went wrong​

The workflow that exposed the race​

  • On device creation, the ATM driver holds atm_dev_mutex while it:
      • Looks up whether a device with the same identity already exists (via __atm_dev_lookup).
      • Creates device state and registers procfs/sysfs entries for that device.
  • On device teardown, atm_dev_deregister removed the device from the internal device list and then (before the fix) released atm_dev_mutex immediately, ahead of any procfs/sysfs cleanup.
  • Because procfs/sysfs teardown ran only after the mutex was released, there was a small window between the removal from the device list and the removal of procfs/sysfs entries in which:
      • A concurrent atm_dev_register could not find the device in the list (it had just been removed), so it proceeded to create a new device and tried to register procfs/sysfs entries that still existed from the previous device.
      • The procfs registration code then detected the existing entry, producing the double-registration warning and, in some cases, an oops.
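The interleaving described above can be replayed step by step in a small userspace model. This is only a sketch, not kernel code: the Python lock, the two sets, and the split of teardown into deregister_unlink and deregister_cleanup are illustrative assumptions that mirror the pre-fix ordering, and the assumption that registration fails once the duplicate is detected is part of the model.

```python
import threading

atm_dev_mutex = threading.Lock()   # stands in for the kernel's atm_dev_mutex
atm_devs = set()                   # stands in for the internal device list
proc_entries = set()               # stands in for the per-device procfs entries
warnings = []                      # collects the modelled kernel warnings


def atm_dev_register(name):
    """Register under the lock: list lookup first, then the procfs entry."""
    with atm_dev_mutex:
        if name in atm_devs:        # models the __atm_dev_lookup check
            return False
        if name in proc_entries:    # stale entry left by an unfinished teardown
            warnings.append(f"proc_dir_entry 'atm/{name}' already registered")
            return False
        proc_entries.add(name)
        atm_devs.add(name)
        return True


def deregister_unlink(name):
    """Hypothetical first half of the pre-fix teardown: unlink, then unlock."""
    with atm_dev_mutex:
        atm_devs.discard(name)


def deregister_cleanup(name):
    """Hypothetical second half: procfs removal, performed outside the lock."""
    proc_entries.discard(name)


atm_dev_register("atmtcp:0")
deregister_unlink("atmtcp:0")          # thread A: device is gone from the list
raced = atm_dev_register("atmtcp:0")   # thread B: lands inside the window
deregister_cleanup("atmtcp:0")         # thread A: only now removes the entry
print(raced, warnings)
```

In this model the second registration passes the list lookup but trips over the leftover entry, which is the same shape as the warning syzbot reported.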

The concrete symptom and trace evidence

syzbot produced a KASAN/log trace showing the immediate symptom: a proc_register failure and a kernel stack trace pointing to atm_proc_dev_register → atm_dev_register and on into the drivers/atm code. The error text reproduced in multiple trackers includes the exact procfs duplicate-registration complaint and the stack that led to the detection; that reproducible trace is the smoking gun maintainers used to reason about the fix.

The patch and why it’s safe​

The upstream fix is intentionally small and surgical: keep atm_dev_mutex held until procfs/sysfs entries have been removed during atm_dev_deregister. Doing so restores the invariant the registration path relies on — namely, that list lookups and the existence/absence of procfs/sysfs entries are observed under the same lock — and thereby removes the narrow time-of-check/time-of-use (TOCTOU) window that allowed the duplicate registration. The change is low-regression by design: it doesn’t alter high-level semantics of the ATM subsystem, only the ordering of cleanup and lock release. The patch was accepted into stable kernel queues and backports were made to multiple maintained branches.
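A userspace sketch of the corrected ordering may help make the invariant concrete. It is a model under stated assumptions, not the kernel patch: the Python lock, the sets, the in_teardown event, and the sleep that widens the would-be window are all illustrative.

```python
import threading
import time

atm_dev_mutex = threading.Lock()   # stands in for the kernel's atm_dev_mutex
atm_devs = {"atmtcp:0"}            # device list, with one device registered
proc_entries = {"atmtcp:0"}        # matching procfs entry
warnings = []
results = []
in_teardown = threading.Event()    # test scaffolding, not part of the kernel


def atm_dev_register(name):
    """Same check order as the registration path: list first, then procfs."""
    with atm_dev_mutex:
        if name in atm_devs:
            return False
        if name in proc_entries:
            warnings.append(f"proc_dir_entry 'atm/{name}' already registered")
            return False
        proc_entries.add(name)
        atm_devs.add(name)
        return True


def atm_dev_deregister(name):
    """Fixed ordering: the lock is held until procfs cleanup has finished."""
    with atm_dev_mutex:
        atm_devs.discard(name)
        in_teardown.set()           # release the competing thread right now
        time.sleep(0.1)             # widen what used to be the race window
        proc_entries.discard(name)  # cleanup still happens under the lock


def competing_register():
    in_teardown.wait()              # start exactly when the list entry is gone
    results.append(atm_dev_register("atmtcp:0"))


worker = threading.Thread(target=competing_register)
worker.start()
atm_dev_deregister("atmtcp:0")
worker.join()
print(results, warnings)
```

Because the competing thread blocks on the lock until cleanup is complete, it sees a consistent state (no list entry and no procfs entry) and registers cleanly, with no warning recorded.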

Scope, severity, and exploitability​

Affected component and typical environments​

  • Affected component: in-tree Linux kernel ATM subsystem — code paths responsible for ATM device registration and procfs/sysfs management.
  • Typical exposure: local. The attack vector requires processes capable of invoking ATM device registration/deregistration operations or triggering the atmtcp/atm device lifecycle (for example, via ioctl paths or local device nodes). On normal desktop installs this area is rarely exercised; on specialized systems, embedded images, or multi-tenant hosts that expose device nodes to less trusted code, the practical exposure is greater.

Impact model​

  • Primary impact: availability — kernel warnings, oopses, and possible panics leading to service interruption or reboots.
  • Secondary impact: integrity — while the publicly documented symptom is double-registration and warnings, the underlying class of bug (lifetime/race conditions in kernel space) can theoretically be escalated into memory-corruption primitives in skilled, targeted hands. At disclosure there was no widely published proof-of-concept that converts this particular race into a privilege-escalation exploit. Nevertheless, kernel UAFs and race defects have historically been significant building blocks for complex exploit chains.

CVSS and vendor ratings — a note on divergence​

Different trackers reported different severity numbers. Some vendor trackers and advisory systems classify the issue as Medium with a CVSS v3 base score of 5.5, while some mirrors reported scores as high as 7.8. These discrepancies reflect different scoring contexts, in particular whether the scorer counted confidentiality and integrity impacts or treated the issue as availability-only (which yields the lower score). Operators should treat their vendor’s advisory, mapped onto their own exposure model, as authoritative for prioritization. The variance is worth noting because it shows how a single technical fix can be assessed differently depending on operational context.
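To see how the same bug can land on both numbers, here is a minimal sketch of the CVSS v3.1 base-score arithmetic, restricted to Scope: Unchanged. The two vectors are assumptions chosen for illustration (a local, low-privilege vector scored availability-only versus full C/I/A impact); they are not quoted from any particular advisory.

```python
import math

# CVSS v3.1 metric weights (Scope: Unchanged values only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as the CVSS v3.1 specification defines."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a Scope: Unchanged vector string."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: table[metrics[name]] for name, table in WEIGHTS.items()}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Assumed vectors: identical exploitability, different impact judgments.
availability_only = "AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H"
full_impact = "AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"
print(base_score(availability_only), base_score(full_impact))
```

Under these assumed vectors the exploitability term is the same in both cases; only the confidentiality and integrity judgments move the result from 5.5 to 7.8.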

Timeline and upstream response​

  • syzbot (kernel fuzzing infrastructure) flagged the procfs duplicate registration and produced a clear warning/trace that pointed to atm_proc_dev_register/atm_dev_register as the path producing the issue.
  • An upstream patch was authored to hold atm_dev_mutex longer in atm_dev_deregister, preventing the race window. The upstream commit referenced in the stable-queue notes is a433791aeaea6e84df709e0b9584b9bbe040cd1c (and related stable-queue commit IDs appear in the stable trees).
  • The patch was added to multiple stable maintenance trees (6.6, 6.12, 6.1, 5.15, etc.) so distributions could pick up the fix for different kernel series. The stable-queue announcements and patch batches document these merges and backport placements.
  • Public CVE entries and vendor advisories (Ubuntu, Oracle, AWS ALAS, Red Hat trackers and other mirrors) recorded CVE-2025-38245 and mapped it to upstream stable commits and distribution packages. These advisories provide the vendor-specific mapping administrators need to plan updates.

How to detect if your systems were (or are) vulnerable​

  • Kernel logs: search dmesg and journalctl -k for the exact procfs registration warning:
      • Example string to grep for: proc_dir_entry 'atm/atmtcp:0' already registered
      • Recurrent messages around ATM driver functions (atm_proc_dev_register, atm_dev_register, drivers/atm/*) are a clear sign the condition occurred.
  • Crash traces: kernel oops logs that reference proc_register and call paths into fs/proc/generic.c alongside ATM stack frames indicate the race manifested on the host. Preserve vmcore images (kdump) where possible for offline analysis.
  • Package mapping: consult your distribution’s kernel changelog or security advisory for references to CVE-2025-38245 or to the upstream commit IDs noted in the stable-queue entries. Because vendors often backport the single fix into their kernels rather than bumping the kernel version, the package changelog is the authoritative place to verify whether your kernel includes the fix.
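The log check above can be scripted. The following is a minimal sketch that scans kernel log text for the duplicate-registration warning and for the two ATM function names mentioned in this article; the function name scan_kernel_log and the sample lines are hypothetical, and the sample is synthetic rather than a real trace.

```python
import re

# Matches the warning text quoted in this article and captures the entry name.
DUPLICATE_ENTRY = re.compile(r"proc_dir_entry '(atm/[^']+)' already registered")

# ATM registration functions named in the published traces.
ATM_FRAMES = ("atm_proc_dev_register", "atm_dev_register")


def scan_kernel_log(text):
    """Return duplicate procfs entries and a count of lines naming ATM frames."""
    duplicate_entries = []
    atm_frames = 0
    for line in text.splitlines():
        match = DUPLICATE_ENTRY.search(line)
        if match:
            duplicate_entries.append(match.group(1))
        if any(frame in line for frame in ATM_FRAMES):
            atm_frames += 1
    return {"duplicate_entries": duplicate_entries, "atm_frames": atm_frames}


# Synthetic sample for illustration only; real traces carry more detail.
sample_log = """\
[  100.000001] proc_dir_entry 'atm/atmtcp:0' already registered
[  100.000002] Call Trace:
[  100.000003]  atm_proc_dev_register
[  100.000004]  atm_dev_register
[  100.000005] unrelated message
"""

report = scan_kernel_log(sample_log)
print(report)
```

On a live host the text would typically come from the output of dmesg or journalctl -k, captured however your tooling prefers. An empty result only means the warning is absent from the retained log, not that the kernel is patched.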

Recommended mitigation and remediation steps​

  • Definitive remediation: install vendor-supplied kernel packages that include the upstream fix and reboot into the patched kernel. Check your distribution security advisory to determine which kernel package version contains the backport for your release. Ubuntu and many major distributions published advisories mapping the fix to package updates.
  • If a vendor patch is not yet available and an immediate mitigation is required:
      • Consider unloading ATM-related kernel modules or preventing untrusted users from interacting with ATM device nodes where practical. Removing module access reduces the surface that can trigger this registration flow.
      • Restrict access to device nodes and tighten container isolation so that unprivileged tenants cannot trigger device registration/deregistration routines. This is especially relevant on multi-tenant hosts or CI runners.
  • For environments that build and maintain custom kernels:
      • Review the upstream commit (a433791aeaea6e84df709e0b9584b9bbe040cd1c, as referenced in the stable-queue summaries) and apply the same ordering change in your kernel tree, then rebuild and deploy. The patch is minimal and designed to be backportable.
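Before deciding whether unloading modules is a practical mitigation, it helps to know which ATM modules are actually loaded. The sketch below parses text in the /proc/modules format; the module names in ATM_MODULES are an assumption (a minimal starting set, not an exhaustive list), the helper name is hypothetical, and the sample text is synthetic.

```python
# Assumed starting set of ATM-related module names; extend for your kernel.
ATM_MODULES = {"atm", "atmtcp"}


def loaded_atm_modules(proc_modules_text):
    """Return (name, use_count) pairs for loaded modules in ATM_MODULES.

    Each /proc/modules line has the fields:
    name, size, use count, dependents, state, address.
    """
    loaded = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] in ATM_MODULES:
            loaded.append((fields[0], int(fields[2])))
    return loaded


# Synthetic sample in /proc/modules format, for illustration only.
sample_modules = """\
atmtcp 16384 0 - Live 0x0000000000000000
atm 61440 1 atmtcp, Live 0x0000000000000000
ext4 1000000 2 - Live 0x0000000000000000
"""

found = loaded_atm_modules(sample_modules)
print(found)
```

On a real host the input would be the contents of /proc/modules. A module with a non-zero use count cannot simply be removed, and ATM support compiled directly into the kernel will not appear in this file at all, so an empty result does not prove the code is absent.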

Operational recommendations for enterprise admins and cloud operators​

  • Prioritize hosts where untrusted local code runs: CI/CD runners, developer workstations that accept untrusted code, container hosts that expose device nodes to containers, edge/embedded devices with long patch lifecycles. These environments present the highest practical risk because the attack vector is local.
  • Treat kernel stability defects as high-priority for multi-tenant infrastructure: even if the vulnerability is classified as “availability-only,” a single host crash in a hypervisor or shared node can cascade and affect many tenants and orchestrated services. Ensure you schedule kernel updates and reboots in maintenance windows.
  • Detection posture: add log-hunting rules that alert on the procfs duplicate-registration message and on ATM driver oops traces. Correlate suspicious ioctl or device-creation activity around the time of traces to help triage root cause.

Why the fix is the right trade-off​

  • The corrective action is strictly ordering/locking: hold the existing mutex longer so that two related operations (list lookup and procfs teardown) are observed atomically with respect to each other. This follows a conservative kernel-maintainers’ principle: prefer minimal, local changes that restore invariants rather than broad redesigns.
  • Low-risk: the patch does not alter public interfaces or long-running semantics; it only changes lock hold time during teardown, which is simple to reason about and easy to backport to stable trees — hence its inclusion in multiple stable branches soon after upstream acceptance.

Risks, caveats, and what’s not certain​

  • No public PoC (proof-of-concept) for privilege escalation at disclosure: public advisories and tracker descriptions focus on warnings, KASAN traces, oopses and availability impact. There is no authoritative public report that this specific defect resulted in privilege escalation or a stable exploit chain at the time of public disclosure. Nevertheless, kernel race and lifetime bugs are high-value primitives that, when combined with allocator- or timing-control techniques, can be shaped into more serious primitives, so absence of a PoC is not the same as absence of risk. Treat the escalation possibility as theoretical but plausible for targeted attackers with local footholds.
  • Score variance between vendors: you will see different CVSS or priority numbers across trackers; some vendors placed the base score in the 5.x range, while mirrors that incorporate broader potential impacts reported higher scores. Use your organization’s risk model (assets impacted, multi-tenancy, guest isolation, uptime requirements) to prioritize remediation rather than relying on a single numeric grade.
  • Mapping upstream commits to vendor packages requires diligence: because kernel maintainers and distribution packagers often backport fixes, the kernel version string alone is not definitive proof of presence or absence of a fix; check package changelogs for the referenced stable-queue commit or explicit CVE mention.
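The changelog check can be reduced to a simple text search. This is a sketch under assumptions: the helper name, the 12-character commit prefix, and both sample changelogs are hypothetical, and real vendor changelogs may cite a stable-tree commit ID instead of the upstream one.

```python
CVE_ID = "CVE-2025-38245"
# Upstream commit ID as quoted in this article's stable-queue references.
UPSTREAM_COMMIT = "a433791aeaea6e84df709e0b9584b9bbe040cd1c"


def changelog_mentions_fix(changelog_text, min_prefix=12):
    """Return True if the changelog names the CVE or the upstream commit.

    Changelogs often abbreviate commit IDs, so only the first
    `min_prefix` hex characters are required to match.
    """
    text = changelog_text.lower()
    return CVE_ID.lower() in text or UPSTREAM_COMMIT[:min_prefix] in text


# Synthetic changelog snippets, for illustration only.
fixed_changelog = "* kernel update\n- atm: locking fix in device teardown (CVE-2025-38245)\n"
other_changelog = "* kernel update\n- unrelated networking fixes\n"

print(changelog_mentions_fix(fixed_changelog), changelog_mentions_fix(other_changelog))
```

A match is reasonable evidence that the package carries the backport. A miss is weaker: the vendor may have referenced a different commit ID or folded the fix into a larger rebase, so fall back to the vendor advisory in that case.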

Practical checklist for admins (quick-action)​

  • Check your running kernel: uname -r.
  • Search kernel logs for procfs duplicate-registration warning and related ATM stack traces.
  • Consult your distribution’s security advisory for CVE-2025-38245 and identify the fixed kernel package for your release (Ubuntu, Oracle Linux, Red Hat, Amazon Linux / ALAS entries have public advisories).
  • Schedule patch deployment and a controlled reboot for impacted nodes.
  • If no vendor patch is available and the host accepts untrusted local code, consider restricting access to ATM interfaces or unloading ATM modules where practical until patched.

Conclusion​

CVE-2025-38245 is a representative example of how small ordering or locking errors in kernel code can create narrow but real races with outsized operational impact. The good news is that the fix is small, pragmatic, and has been integrated into stable kernel trees and vendor advisories — making remediation straightforward for administrators who apply vendor kernel updates and follow package changelogs. The event is also a reminder: automated fuzzing tools such as syzbot remain essential for revealing subtle concurrency bugs that evade typical testing. Organizations running multi-tenant or device-exposed workloads should prioritize these kernel updates and add simple log-detection rules for the procfs duplicate-registration warning to their monitoring playbooks.
Source: MSRC Security Update Guide - Microsoft Security Response Center
 
