PMIx TOCTOU Race CVE-2023-41915: Upgrade and Harden HPC Clusters

A subtle race condition in the OpenPMIx library can allow a local attacker to take ownership of arbitrary files when privileged PMIx code runs as UID 0. The vulnerability, tracked as CVE-2023-41915, was fixed in PMIx 4.2.6 and 5.0.1, but it continues to demand urgent attention from administrators of HPC clusters and of any environment that integrates PMIx (for example, Slurm).

Background​

PMIx (Process Management Interface for Exascale) is a lightweight library widely embedded in modern job schedulers, MPI runtimes, and resource managers to coordinate process launch, control, and information exchange in high-performance computing environments. PMIx’s role is largely invisible to end users: it runs inside system daemons and plugins, frequently with elevated privileges, to manage job lifecycle and resource control. Because PMIx often executes in contexts running with root privileges (or under privileged daemons), mistakes in how it manipulates filesystem objects can have outsized consequences for cluster security and stability.
The vulnerability CVE-2023-41915 was disclosed in September 2023. It is a classic time-of-check/time-of-use (TOCTOU) race condition: under certain conditions, PMIx performs filesystem operations that can be influenced by an attacker, allowing that attacker to cause privileged code to change the ownership of an arbitrary file on the filesystem. When those operations are performed while PMIx code is executing with UID 0, the change of ownership can escalate privileges or break critical protections on system files. The issue was reported by François Diakhate (CEA) and addressed in the PMIx 4.2.6 and 5.0.1 releases.

What exactly went wrong: the mechanics of the race​

A TOCTOU in privileged library code​

At its core, CVE-2023-41915 is a concurrency and filesystem-handling bug. The PMIx library, in some code paths, checks or prepares a file or directory and then performs an ownership change (chown). If an attacker can race between those steps — for example by swapping in a symlink to a target file — the privileged code ends up applying chown to the attacker's chosen target rather than the intended object. That means a privileged process can be tricked into assigning ownership of arbitrary files to an attacker-controlled user or group.
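The check-then-use window can be illustrated with a short, unprivileged Python sketch. It does not reproduce PMIx code and performs no chown at all; it relies on `os.stat` (which follows symlinks, as `chown(2)` does) to show which file a symlink-following operation would land on. The file names and the `race` callback that stands in for the attacker are hypothetical.

```python
import os
import tempfile


def resolve_after_check(path, race=None):
    """Check `path`, let a simulated attacker act, then report the inode a
    symlink-following call such as chown(2) would operate on."""
    # Time of check: the path is not a symlink, so it looks safe.
    if os.path.islink(path):
        raise ValueError("refusing to operate on a symlink")
    # Race window: anything can happen between the check and the use.
    if race is not None:
        race()
    # Time of use: os.stat follows symlinks, so it resolves to the target.
    return os.stat(path).st_ino


def demo():
    """Return True if the 'use' step landed on the victim file."""
    with tempfile.TemporaryDirectory() as tmp:
        victim = os.path.join(tmp, "victim")    # stand-in for a sensitive file
        scratch = os.path.join(tmp, "scratch")  # file the privileged code expects
        for name in (victim, scratch):
            open(name, "w").close()

        def swap():
            # Attacker replaces the expected file with a link to the victim.
            os.remove(scratch)
            os.symlink(victim, scratch)

        return resolve_after_check(scratch, swap) == os.stat(victim).st_ino
```

In a real attack the swap has to win a timing race rather than being called at a fixed point, but the outcome is the same: the check passes on one object and the privileged operation applies to another.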

Why symlinks and chown matter​

The vulnerability revolves around following filesystem links while changing metadata. Historically, many privileged programs used chown (which follows symlinks) where lchown (which operates on the symlink itself) would be safer. Avoiding symlink-following semantics, adding proper checks, or performing operations on a securely opened file descriptor are common hardening techniques to prevent symlink races. The upstream fix in PMIx reflects exactly this pattern: do not follow links when doing chown.
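As a minimal sketch of the two hardening patterns named above (this is not the upstream PMIx patch, and the function names are hypothetical), the Python below shows an lchown-style call and a descriptor-based variant. It assumes Linux, where opening a symlink with `O_NOFOLLOW` fails with `ELOOP`.

```python
import errno
import os
import stat


def chown_no_follow(path, uid, gid):
    """lchown-style change: acts on the link itself, never on its target."""
    os.chown(path, uid, gid, follow_symlinks=False)


def chown_via_fd(path, uid, gid):
    """Open without following a final symlink, verify the type, then fchown."""
    # On Linux this raises OSError(ELOOP) if the last path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        mode = os.fstat(fd).st_mode
        if not (stat.S_ISREG(mode) or stat.S_ISDIR(mode)):
            raise OSError(errno.EINVAL, "unexpected file type", path)
        # fchown applies to the object that was opened, even if the path
        # is swapped afterwards.
        os.fchown(fd, uid, gid)
    finally:
        os.close(fd)
```

Note that `O_NOFOLLOW` only guards the final path component. If an attacker controls a parent directory, the path still has to be walked with directory descriptors (openat-style) to be race-free.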

Not remotely exploitable, but a critical local risk

Although the vulnerability requires a local attacker capable of creating or manipulating filesystem entries (i.e., it is not a remote network exploit in itself), the environments where PMIx runs often allow unprivileged users to schedule jobs or write to shared filesystems. In many clusters, job submissions, temporary directories, and shared mounts create realistic attack surfaces for local time-of-check/time-of-use exploitation. When that local attacker can influence an object watched or handled by privileged PMIx code, the risk escalates to full ownership manipulation. Multiple vendor advisories and distributions classified the issue as high-severity because the consequences include privilege escalation and arbitrary file ownership changes.

Who and what is affected​

  • Affected PMIx versions: all releases prior to 4.2.6 in the v4.x series and versions in the 5.0.x series prior to 5.0.1 are considered vulnerable. The upstream release notes explicitly call out 4.2.6 and 5.0.1 as the fixed releases.
  • Common downstream integrators: Slurm and other workload managers frequently link against PMIx or load PMIx plugins; where Slurm or a runtime was built with PMIx support and the cluster runs components as root, the default configuration often presents an exploitable surface. Administrators must therefore check whether their job scheduler was compiled with PMIx and which PMIx ABI version is present. Advisories highlight Slurm-built-with-PMIx as a particularly important use case to verify.
  • Distributions and packages: enterprise and community distributions published patches or advisories following the disclosure (for example, Ubuntu’s security page lists the CVE and fixed package statuses, and Amazon Linux published ALAS advisories). The CVE appears in major vulnerability databases such as NVD and distribution advisories.
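The version boundaries above can be expressed as a small check. This is a sketch only: the function names are hypothetical, releases older than the 4.x series are treated as vulnerable as a conservative assumption, and a distribution package may carry a backported fix under an older version string, so the vendor advisory remains the authority.

```python
import re

FIXED_4X = (4, 2, 6)  # first fixed release in the 4.x series
FIXED_5X = (5, 0, 1)  # first fixed release in the 5.0.x series


def parse_version(text):
    """Extract (major, minor, patch) from strings such as '4.2.5' or 'v5.0.0'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized PMIx version: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_vulnerable(version_text):
    """Return True if the upstream version predates the fixed releases."""
    version = parse_version(version_text)
    if version[0] >= 5:
        return version < FIXED_5X
    # Assumption: anything below 4.2.6, including pre-4.x, counts as vulnerable.
    return version < FIXED_4X
```

For example, `is_vulnerable("5.0.0")` is true even though 5.0.0 is numerically newer than 4.2.6, which is why a single "greater than or equal to" comparison is not enough.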

Potential impacts: confidentiality, integrity, and availability​

CVE-2023-41915 is primarily an integrity and privilege-escalation risk: by gaining ownership of arbitrary files, an attacker can alter configuration, plant persistence artifacts, or make previously restricted files writable and thereby facilitate further compromise. In a tightly shared HPC environment, ownership changes to scheduler-related files, credential stores, or job-control artifacts could permit lateral movement, unauthorized job control, or persistent access escalation.
Additionally, the vulnerability has availability implications. An attacker who can seize ownership of files used by daemons or jobs can disrupt scheduling, prevent services from correctly starting, or persistently lock out legitimate operations. This kind of manipulation can produce a sustained denial-of-service for compute resources — either while the attacker actively exploits the race or in a persistent state after the change has been made. The CVE’s impact profile and vendor advisories both emphasize the potential for those availability consequences.
Real-world analogies help frame the effect: past race-condition bugs in virtualization and runtime components (for example, symlink/ownership issues in container or hypervisor integrations) have been used to break multi-tenant isolation by allowing an unprivileged actor to reassign permissions on host-level files. Those incidents illustrate how seemingly small TOCTOU bugs can translate into full compromise of shared infrastructure.

Exploitation scenarios and threat models​

  • Privileged-daemon interaction with per-job files: Slurm or another scheduler runs a privileged helper that calls PMIx APIs to perform job control or logging on behalf of root-owned daemons. If job-local temporary directories or socket files are world-writable or controlled by job users, those users could race to replace expected objects with symlinks pointing at sensitive host files, then trigger the PMIx path that performs chown. The privileged chown ends up affecting the attacker's chosen target.
  • Shared filesystem attacks on multi-user clusters: Clusters often mount shared storage accessible to multiple users. If PMIx code interacts with shared storage while running as root (for instance, to manage logs, sockets, or plugin state), an attacker with write privileges anywhere in the shared namespace can craft a TOCTOU to move ownership of a critical file. This can be used to sabotage job launch, alter scheduled scripts, or prepare for a later privilege escalation step.
  • Chaining to code-execution or persistence: Ownership is often the first step. After taking ownership of a config file or script, an attacker may place malicious content that will be executed by a privileged process or used to elevate privileges further. The change in file metadata can also be used to evade detection or cause administrator-side tools to misbehave. The attack chain is straightforward when combined with existing misconfigurations.
Threat actors who would most likely benefit include:
  • Malicious cluster users seeking to break isolation
  • Supply-chain or insider threats seeking persistent access
  • Attackers targeting research clusters to exfiltrate or sabotage compute jobs
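All of these scenarios depend on an unprivileged user being able to modify a directory that privileged code later walks through. A rough way to audit a given path is sketched below; the function name is hypothetical, and the rule it applies (flag any ancestor directory that is not owned by root or is writable by group or others) is a deliberately strict assumption that also flags sticky directories such as /tmp.

```python
import os
import stat


def risky_ancestors(path):
    """Return ancestor directories of `path` that a non-root user may be able
    to modify, and could therefore use to plant or swap a symlink."""
    risky = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        info = os.lstat(current)
        writable_by_others = info.st_mode & (stat.S_IWGRP | stat.S_IWOTH)
        if info.st_uid != 0 or writable_by_others:
            risky.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return risky
```

An empty result does not prove a path is safe (ACLs and network filesystem semantics are not considered), but a non-empty one identifies where a symlink swap would be possible.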

Vendor response and fixes​

The upstream OpenPMIx project responded by shipping releases that deliberately change the file handling semantics that produced the race. The PMIx 4.2.6 and 5.0.1 release notes cite the specific fix: do not follow links when doing chown, among additional commits that address the unsafe behavior. This indicates a move from a follow-symlink chown to safer semantics (for example, using lchown or operating on opened file descriptors), along with additional hardening elsewhere in the codebase.
Distributions and vendors followed with advisories and patched packages. Amazon Linux released ALAS entries and package updates for affected pmix packages; Ubuntu and other distributions documented the CVE and provided fixes for supported releases. The NVD and vendor advisories list the CVE and link to the upstream fixes and package-level patches. Administrators should prioritize coordinated updates that include both PMIx and any dependent packages (notably Slurm builds that bundle or link to PMIx) because ABI mismatches or rebuild requirements can complicate remediation.

Recommended remediation and mitigations​

Effective remediation requires both upgrading and validating your runtime environment. Follow these prioritized steps:
  • Inventory: Confirm whether your scheduler and compute stack were built with PMIx support. For Slurm, run the runtime command to list MPI/PMIx plugin support (for example, the srun --mpi=list check suggested in advisories) and document which PMIx plugin and version are present. This tells you whether your nodes are susceptible.
  • Upgrade PMIx to a fixed release: Upgrade to OpenPMIx v4.2.6 or v5.0.1 (or later) as published by the project. If your distribution already shipped a patched pmix package, prefer the vendor package to avoid rebuilds unless you manage your own custom RPMs/DEBs. Note any ABI changes: some upgrades may require rebuilding or relinking client components or Slurm itself.
  • If you cannot upgrade immediately, apply mitigations: Disable PMIx support in your scheduler or remove the PMIx plugin libraries from compute nodes. Adjust configuration (e.g., MpiDefault in Slurm) to avoid loading pmix plugins until you can upgrade. OSG and other advisories explicitly describe removing mpi_pmix*.so from compute nodes as a temporary containment. As a source-level mitigation, several advisories and community posts recommended replacing usages of chown with lchown or otherwise avoiding following links during privileged metadata changes, rebuilding packages with that replacement if you maintain your own builds. This requires developer involvement and careful testing.
  • Rebuild and restart services as needed: If you upgrade a shared PMIx library, restart slurmd and any privileged daemons so they load the patched library. If PMIx was statically linked into a scheduler package or custom plugin, rebuild that component against the patched PMIx and redeploy. Failure to restart services or rebuild linked components leaves the cluster exposed even after packages are upgraded.
  • Harden filesystem permissions and job isolation: Review mount options, temporary directory usage, and per-job directories. Prefer mounting shared filesystems with options that reduce symlink manipulation risk where possible, and ensure jobs run in per-user directories with restrictive permissions. Consider configuring stricter enforcement of job isolation in the scheduler.
  • Detection and forensics: Search for unexpected ownership changes and for indicators specific to PMIx-related paths or the scheduler’s temporary files. Audit logs for restarts, unusual file writes, or modifications to privileged configurations. If you suspect compromise, follow incident response playbooks that include credential rotation and forensic preservation of impacted nodes.
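The inventory step can be scripted across nodes. The sketch below wraps the `srun --mpi=list` check and extracts any PMIx plugin names from its output. The helper names are hypothetical, and the exact layout of the listing varies between Slurm versions, so the parser only looks for `pmix` or `pmix_vN` tokens rather than assuming a fixed format.

```python
import re
import subprocess


def pmix_plugins(listing):
    """Return the sorted PMIx plugin names mentioned in an MPI plugin listing."""
    return sorted(set(re.findall(r"\bpmix(?:_v\d+)?\b", listing)))


def query_srun():
    """Run `srun --mpi=list` and return its combined output ('' if srun is absent)."""
    try:
        result = subprocess.run(
            ["srun", "--mpi=list"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    # Both streams are captured because the listing may be written to either.
    return result.stdout + result.stderr


if __name__ == "__main__":
    found = pmix_plugins(query_srun())
    print("PMIx plugins:", ", ".join(found) if found else "none detected")
```

A non-empty result only shows that Slurm can load a PMIx plugin; the installed PMIx library version still has to be checked separately against the fixed releases.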

Practical considerations and operational friction​

  • ABI breakage and rebuilds: Some PMIx fixes introduce ABI-level changes between point releases in the 4.x series. Administrators must be prepared to rebuild Slurm or other components that link with PMIx if they upgrade the library to a release with breaking changes — this can add operational overhead in large clusters. The OSG advisory explicitly warns about ABI-breaking changes between 4.2.3 and 4.2.6 and the consequent need to rebuild Slurm packages. Plan maintenance windows accordingly.
  • Trade-offs of temporary mitigations: Removing PMIx plugins or disabling support reduces attack surface but also removes functionality. Clusters that rely on PMIx for job coordination, MPI integration, or advanced job control will see degraded functionality until a proper upgrade is applied. Administrators must weigh the availability vs. security trade-offs when applying short-term mitigations.
  • Local vs. remote exploitation: While the flaw requires local write or symlink capabilities, many shared compute environments permit such local write access by design (users can upload job scripts and temporary files). The practical exploitability is therefore environment-dependent but often realistic in multi-user HPC contexts. Use of shared ephemeral storage, scrubbing of job directories, and strict per-user isolation reduce risk.

Detection guidance and indicators of compromise​

  • File ownership changes: Monitor for unexpected chown events or ownership changes on sensitive files (configuration files, scheduler control sockets, privileged scripts). The pattern of a privileged process changing ownership on files residing outside normal job paths is suspicious.
  • Unusual restarts or errors in job control: Privileged daemons that interact with PMIx may crash or log unusual errors if an attacker manipulates linked resources. Inspect systemd or daemon logs around slurmd or pmix-related components.
  • Tampered scheduler files or job scripts: Search for unauthorized modifications to scheduler-level scripts, prolog/epilog scripts, or shared scheduler libraries. An ownership change may be followed by content modifications aimed at persistence or backdoors.
  • Forensics: Preserve filesystem images and logs before remediation when compromise is suspected. Because the vulnerability enables ownership changes, it can be used to cover tracks; fast containment and careful evidence capture are crucial.
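One low-cost way to watch for the ownership indicator is to record the owner of sensitive files and compare snapshots over time. The sketch below is an assumption about how such a baseline might look, not a replacement for kernel-level auditing; the function names are hypothetical and the caller supplies the list of paths to watch.

```python
import os


def snapshot(paths):
    """Map each path to its (uid, gid), or None if the path does not exist.
    lstat is used so that a planted symlink is recorded, not followed."""
    owners = {}
    for path in paths:
        try:
            info = os.lstat(path)
        except FileNotFoundError:
            owners[path] = None
            continue
        owners[path] = (info.st_uid, info.st_gid)
    return owners


def changed(before, after):
    """Return the paths whose recorded ownership differs between snapshots."""
    return sorted(path for path in before if after.get(path) != before[path])
```

Comparing a snapshot taken at deployment time with a periodic one highlights files whose owner has changed or that have disappeared. It cannot show which process made the change, which is where audit logs come in.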

What this tells us about supply chain and shared-component risk​

PMIx is a foundational library that appears across multiple components in the HPC stack. This CVE is a useful case study in the systemic risk introduced by shared libraries that run privileged operations: a single insecure library function can ripple through schedulers, plugins, and runtimes to expose entire clusters.
Two practical lessons emerge:
  • Treat infrastructure libraries as first-class security assets. They deserve the same patch-testing and inventory management as operating-system packages.
  • Assume local, low-privilege users are an adversary model in shared compute environments. Designs and builds should assume users will attempt to manipulate filesystem artifacts in creative ways; secure APIs and careful use of lchown/openat semantics reduce these attack surfaces.
Comparisons with other TOCTOU bugs in virtualization and system utilities show a recurring pattern: when privileged code interacts with user-writable namespaces without atomic, race-free operations, attackers can rapidly escalate from local footholds to host compromise. Maintaining strict separation and adopting hardened file-operation primitives are long-term mitigations that pay dividends across platforms.

Timeline and disclosure notes​

  • Discovery and disclosure: The issue was reported in September 2023 and fixed in PMIx 4.2.6 and 5.0.1 upstream releases. Distribution vendors and cloud OS teams rolled out patches and advisories in the weeks that followed. NVD and vendor advisories catalog the CVE and provide fixed-package information.
  • Response completeness: Upstream changes addressed the root cause by avoiding following links in privileged chown operations and by tightening related code paths. Distributions have issued patches for affected releases, and vendors recommended rebuilding components or removing the PMIx plugin where an immediate upgrade was infeasible. Administrators should verify both the PMIx package and any consumers (Slurm, MPI stacks) that might include or link the vulnerable code.

Checklist for administrators (actionable, prioritized)​

  • Inventory PMIx usage in your stack (check Slurm and MPI plugins).
  • If PMIx is present, schedule immediate upgrades to PMIx 4.2.6 or later in the 4.x series, or to 5.0.1 or later, testing for ABI compatibility first.
  • If upgrade cannot be done immediately: remove or disable PMIx plugins (e.g., remove mpi_pmix*.so) and adjust scheduler config as a stopgap.
  • Rebuild any statically linked components against patched PMIx and restart privileged daemons.
  • Harden job directories and shared mounts; enforce stricter permissions and mount options where feasible.
  • Monitor for ownership changes, anomalous chown calls, and suspicious scheduler logs; capture forensics if compromise is suspected.
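For the restart item in the checklist, one way to confirm that a daemon is no longer running the old library is to inspect its memory map. On Linux, a shared object that was replaced on disk while still mapped typically appears in /proc/<pid>/maps with a " (deleted)" suffix. The sketch below parses that text; the function name is hypothetical, the match on "libpmix" assumes the usual shared library naming, and reading another process's map normally requires root.

```python
def stale_pmix_mappings(maps_text):
    """Return libpmix paths that a process still maps from a replaced file."""
    suffix = " (deleted)"
    stale = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if "libpmix" in path and path.endswith(suffix):
            stale.add(path[: -len(suffix)])
    return sorted(stale)
```

A typical use would be to read `/proc/<pid>/maps` for each slurmd process after the package upgrade and restart any daemon for which the function returns a non-empty list. Statically linked copies of PMIx will not show up this way and still need a rebuild.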

Final analysis: strengths and residual risks​

The upstream fix is precise and effective: changing chown semantics to avoid following links directly targets the root cause and is a standard, correct hardening step. Vendors and distributions have issued patches and mitigations, and the public advisory ecosystem provides clear upgrade guidance. These are notable strengths: rapid upstream remediation and clear distribution advisories reduce long-term exposure.
However, residual risks remain:
  • Environments that cannot immediately upgrade — whether due to ABI compatibility, custom builds, or operational constraints — face a nontrivial exposure window. Temporary mitigations can be effective, but they come with function loss or operational complexity.
  • The vulnerability’s exploitability in the wild depends on local configuration and user privileges, but realistic cluster configurations frequently provide sufficient local control to mount the attack. That means many production HPC environments are at meaningful risk until remediation is complete.
  • Detection is imperfect: ownership changes can be subtle and may be used to prepare longer attack chains that evade cursory scans. Proactive monitoring and incident readiness remain important even after patches are applied.

Conclusion​

CVE-2023-41915 is a textbook example of how small filesystem-handling mistakes in widely used, privileged libraries can produce outsized risks. For operators of shared compute resources, the practical combination of PMIx running within privileged daemons and the presence of user-writable namespaces makes this vulnerability particularly relevant. The good news is that upstream fixes and distribution updates exist: PMIx 4.2.6 and 5.0.1 close the TOCTOU. The hard work now lies with site operators, who must inventory, upgrade, rebuild where necessary, and apply mitigations until full remediation is confirmed. Prioritize the inventory and patch steps in the checklist above, strengthen monitoring, and take a cautious approach to temporary workarounds so that cluster integrity is preserved while you patch. Doing so will reduce the risk of privilege escalation, service disruption, and the potentially persistent availability issues that follow an ownership-based compromise.

Source: MSRC Security Update Guide - Microsoft Security Response Center