A critical local privilege‑escalation bug in Ceph’s crash‑handling service — tracked as CVE‑2022‑3650 — lets an attacker with low privileges escalate to root by abusing the cluster crash‑dump path, and operators should treat it as a high‑impact operational risk until patched. Multiple downstream vendors and Linux distributions have catalogued the issue, upstream Ceph issued fixes and backports, and major distributions published security updates and mitigation guidance.
Background / Overview
Ceph is a widely deployed open‑source distributed storage system that provides object, block and file interfaces across large clusters. To aid diagnostics, Ceph includes a crash‑collection facility: the ceph‑crash.service unit and the accompanying ceph‑crash script watch a crash directory (typically /var/lib/ceph/crash) and post crash reports back into Ceph’s telemetry via the ceph crash post workflow.
CVE‑2022‑3650 stems from how that crash ingestion daemon operates with filesystem objects that are, in many deployments, writable by an unprivileged local account (the ceph user). When the privileged service processes attacker‑controlled crash artifacts, it can be coerced into performing privileged operations on arbitrary files or into leaking privileged data. Ceph’s own vulnerability list and distribution advisories summarize the issue as a local privilege escalation through ceph‑crash.service.
What the vulnerability actually is
The root cause, in plain terms
At its core, the defect is a privilege‑separation and file‑ownership design gap: a component running with elevated privileges periodically scans a directory that can be written by a lower‑privileged user. That combination creates a classic local privilege‑escalation and information‑disclosure opportunity.
- The ceph‑crash program runs with system privileges and inspects the crash directory.
- That directory (or contents inside it) can be created or influenced by the unprivileged ceph account or other local processes.
- By crafting files or symlink structures, a local adversary can manipulate what the privileged process reads, writes, or posts — effectively causing privileged actions to operate on attacker‑chosen targets.
- The resulting effects include escalation to root, exfiltration of privileged files, and denial‑of‑service by weaponizing crash posts or resource exhaustion.
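The unsafe pattern in the bullets above can be demonstrated generically. The sketch below is illustrative only (throwaway temp paths, not the actual ceph‑crash code) and shows why a privileged scanner must reject symlinks planted in a spool directory it does not own:

```shell
set -eu

spool=$(mktemp -d)           # stands in for a user-writable crash directory
secret=$(mktemp)             # stands in for a file only the scanner may read
echo "privileged-data" > "$secret"

# The attacker plants a symlink where a real crash report should be.
ln -s "$secret" "$spool/meta"

# Naive scanner: reads every entry, follows the symlink, leaks the file.
naive=$(cat "$spool/meta")

# Hardened scanner: refuses anything that is not a plain, non-symlink file.
if [ -L "$spool/meta" ] || [ ! -f "$spool/meta" ]; then
    safe="rejected"
else
    safe=$(cat "$spool/meta")
fi

echo "naive scanner read: $naive"
echo "hardened scanner:   $safe"
rm -rf "$spool" "$secret"
```

This mirrors the direction of the upstream fix discussed later: keep crash collection, but constrain the privileged component so it does not act on untrusted paths.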
Vendor advisories class the issue under CWE‑842 (Placement of User into Incorrect Group), and multiple sources assign a CVSS v3.1 base score of about 7.8 (High). Some vendors score the scope differently and therefore arrive at slightly different numbers, but the consensus is that this is a high‑priority local elevation of privilege.
How realistic is exploitation?
Exploitation requires a local foothold: an attacker must be able to write to the Ceph crash directory or influence files that ceph‑crash will process. That may sound like a high bar, but in real deployments there are multiple vectors that lower this barrier:
- Multi‑tenant storage nodes that run customer workloads on the same host can expose the ceph user and crash directory to tenants.
- Misconfigured containers, poorly isolated system agents, or automated jobs that create files under /var/lib/ceph/crash can provide an attacker with the required write access.
- Supply‑chain or post‑compromise scenarios (where a non‑privileged account has been compromised) make the path trivial.
Because of these real‑world deployment patterns, several distributions and vendors have treated the bug as urgent and pushed fixes and advisories.
Timeline and vendor response
- The issue was publicly disclosed in late 2022 and assigned CVE‑2022‑3650; vendors began publishing advisories and patches in early 2023. Upstream Ceph documented the CVE in its historical CVE list and implemented fixes and backports in the relevant branches.
- Major Linux distributions (Ubuntu, Debian, SUSE) produced security notices with fixed package versions; Ubuntu’s security record lists the fix included in Ceph 17.2.6 (and earlier backports for some series). Debian and other distros applied backports or security updates for the affected releases.
- Enterprise vendors that bundle Ceph — including IBM Storage Ceph — published vendor‑specific bulletins and recommended upgrading to fixed appliance builds or supported releases.
The practical effect: upstream did deliver code changes and backports, and downstream maintainers produced packages and advisories — but the window between disclosure and wide patching is where risk concentrates, especially for appliance or vendor‑managed installations that update on a slower cadence.
Technical analysis of the fixes
Upstream changes reduce the attack surface by tightening how the crash ingestion path handles files and by adjusting the privileges and ownership semantics of the crash‑directory workflow.
- The upstream Ceph commits and pull requests referenced by distribution trackers show targeted fixes to the crash‑watcher logic and careful ownership/permission handling for files the daemon processes. Distributions pointed to these upstream commits when rolling out the patches.
- Fixed releases include explicit backports for older Ceph branches (Pacific, Quincy, and release 17.x lines) used in production. This means operators should prioritize the vendor or distro release that contains the backport appropriate for their installed Ceph branch.
Two practical observations about the code changes:
- They do not attempt to remove crash collection functionality — that would break diagnostics — but instead constrain the privileged component to avoid acting on untrusted paths or attacker‑controlled files.
- Backports are available for supported branches; appliances or managed solutions often required vendor releases to consume the fixes rather than raw upstream packages.
Impact: confidentiality, integrity, availability
CVE‑2022‑3650 is not merely an academic escalation; it carries real operational consequences across the security triad:
- Confidentiality: An attacker who achieves privilege escalation can read system‑level secrets and Ceph internal state; the vulnerable path specifically permits “dumping privileged information” via crafted crash artifacts.
- Integrity: With root privileges, an attacker can modify Ceph state, configuration, or stored objects and potentially manipulate cluster behavior. The crash posting mechanism could be abused to insert crafted objects that poison diagnostics.
- Availability: Vendors and advisories explicitly call out availability impact. An attacker can post large or malformed crash payloads, or pressure the crash‑ingestion path into resource exhaustion or crash loops, producing denial‑of‑service conditions that persist while the attack continues — or longer, if the service fails to recover. That operational availability impact makes the vulnerability relevant to service‑level and continuity planning.
In short: local exploitation leads to root; once root is obtained, all three security properties are at severe risk — and availability impacts are not hypothetical.
Who is affected (deployments and product scope)
- Any cluster running Ceph releases that include the ceph‑crash service and that have not applied the upstream or distro patches is potentially affected. Distributions have provided lists of fixed package versions and backports for several release lines. Debian, Ubuntu, SUSE and other distributions list both vulnerable and fixed package versions in their trackers.
- Appliance and vendor bundles that incorporate Ceph (for example, IBM Storage Ceph builds) may be affected and typically require a vendor release or firmware update to safely remediate; vendor bulletins list the exact appliance versions and recommended upgrades. Upgrading the vendor‑packaged appliance is the supported remediation path for many enterprise customers.
- Environments that are multi‑tenant or that run untrusted workloads on the same nodes as Ceph daemons (for example, certain hosted or self‑managed cloud storage configurations) are particularly exposed because the required local write access to the crash directory is easier to obtain in those settings.
Immediate mitigation and remediation guidance
If you manage Ceph clusters or appliances, follow a prioritized plan: inventory, mitigate, patch, and verify.
1. Inventory — find your exposure
- Identify all hosts that run ceph‑crash.service and list package versions for Ceph on those hosts. Check distro security trackers for your release to find the fixed package versions.
- Search for the existence and permissions of the crash directory (typically /var/lib/ceph/crash) and confirm who can write to it. If unprivileged accounts or containers can write there, treat the host as high‑risk.
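A minimal permission triage for the inventory step can be scripted. The helper below is a sketch (GNU coreutils stat assumed; /var/lib/ceph/crash is the common default path from above), and note that ownership matters as much as mode: a ceph‑owned 750 directory is still writable by anyone who can run code as the ceph user.

```shell
# Flag a crash directory whose mode lets non-owners write into it.
check_crash_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "missing"; return; }
    mode=$(stat -c '%a' "$dir")          # GNU coreutils stat
    case $mode in
        *[2367])      echo "high-risk (world-writable, mode $mode)" ;;
        *[2367][0-7]) echo "high-risk (group-writable, mode $mode)" ;;
        *)            echo "ok (mode $mode)" ;;
    esac
}

check_crash_dir /var/lib/ceph/crash
```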
2. Short‑term mitigations (stop the bleeding)
- If you cannot immediately apply a vendor or upstream patch, temporarily disable the ceph‑crash.service to remove the attack surface. For most Linux systems:
- sudo systemctl stop ceph-crash.service
- sudo systemctl disable ceph-crash.service
- Note: disabling crash collection reduces diagnostic telemetry; document the change and re‑enable after patching. Several advisories list disabling the service as a pragmatic stopgap.
3. Patch — apply the vendor/distribution update
- Consult your vendor or distribution advisory and identify the fixed package(s) for your Ceph version. Distributions have marked fixed versions; for example Ubuntu references fixes in Ceph 17.2.6 and backports for older series.
- Schedule a maintenance window for installing the updated Ceph package or vendor firmware. For vendor appliances, follow the vendor’s published upgrade procedure.
- After applying the fix, re‑enable ceph‑crash.service and verify crash posting behavior in a controlled manner.
4. Post‑patch verification
- Inspect the systemd journal for unusual ceph‑crash invocations, and validate that the upgrade removed the insecure behavior by attempting benign, controlled crash posts in a test environment. Check that ownership and permission semantics for the crash directory are now strict and that the service no longer acts on attacker‑controlled paths.
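To make the journal review concrete, a small filter can count crash‑post events over a window. The grep pattern and the journalctl invocation in the comment are assumptions; adapt them to the actual log format of your release:

```shell
# Count crash-post events in a stream of log lines; a sudden spike after
# patching warrants closer inspection.
# Typical use:  journalctl -u ceph-crash.service --since "-1 day" | count_crash_posts
count_crash_posts() {
    grep -c 'crash post' || true    # grep exits 1 on zero matches; keep going
}
```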
Detection and forensics recommendations
If you suspect exploitation, act quickly and follow an evidence‑preserving process:
- Audit the list of files under the crash directory for unexpected timestamps, owner changes, or unusual filenames that correspond to the time an account may have been active.
- Check the systemd journal and Ceph logs for ceph crash post invocations or sudden spikes of crash‑posting activity.
- Search for newly created setuid binaries, modified system configuration, or unexpected root shells — all classical post‑EoP indicators.
- Preserve logs and disk images; because the attacker requires local write access, there may be traces in user home directories, container overlayfs layers, or CI/CD job workspaces.
- If your organization uses centralized logging or endpoint detection, create or refine rules to alert on suspicious writes into /var/lib/ceph/crash and on invocations of ceph‑crash outside of scheduled maintenance windows.
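If auditd is available, the write‑monitoring suggested above can be expressed as a persistent audit rule. The rule‑file path and the key name below are arbitrary illustrative choices, and the watched path is the common default:

```
# /etc/audit/rules.d/ceph-crash.rules (illustrative)
# Record writes and attribute changes under the crash spool; review with:
#   ausearch -k ceph_crash_writes
-w /var/lib/ceph/crash -p wa -k ceph_crash_writes
```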
Hardening and long‑term risk reduction
The vulnerability highlights wider operational hardening steps operators should adopt to reduce the blast radius of similar design mistakes:
- Enforce least privilege: run diagnostic collectors (like ceph‑crash) with the minimum privileges necessary. If possible, run them under an unprivileged account and use privilege‑separation wrappers or helper processes rather than running everything as root.
- Harden filesystem permissions for diagnostic directories: ensure only explicit, trusted processes can write to crash directories. Where possible, use separate, per‑daemon directories with restrictive permissions.
- Use mandatory access controls (AppArmor, SELinux) to constrain what privileged processes can access even if they run as root. Confinement policies make exploitation harder or impossible in many real‑world attack chains.
- Inventory and isolate workloads: avoid running tenant or untrusted workloads on the same nodes as critical storage daemons. If isolation isn’t feasible, ensure container runtimes and orchestrators enforce strong filesystem and capability sandboxing.
- Keep an up‑to‑date mapping of which product images and appliance builds contain which Ceph versions; vendor appliances may require an appliance‑level upgrade rather than a package upgrade.
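As one concrete confinement example, a systemd drop‑in can narrow what ceph‑crash may touch. The directives below are standard systemd sandboxing options, but whether this exact set is compatible with ceph‑crash on your release is an assumption to verify in a test cluster first:

```
# /etc/systemd/system/ceph-crash.service.d/hardening.conf (illustrative)
[Service]
NoNewPrivileges=yes                  # block setuid-based privilege gains
ProtectHome=yes                      # hide /home, /root, /run/user
ProtectSystem=strict                 # mount the OS read-only for this unit
ReadWritePaths=/var/lib/ceph/crash   # the one path the unit must write
```

After adding the drop‑in, run systemctl daemon-reload, restart the unit, and confirm that crash posting still works before relying on it in production.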
Critical appraisal — strengths and residual risks
What vendors did right
- Upstream remediation and backports: Ceph upstream identified the root cause, implemented targeted fixes, and prepared backports for maintained branches so operators could update without unnecessary major‑version jumps.
- Distribution and vendor advisories: Multiple distributions and vendors pushed coordinated security notices and patches, and provided fixed package versions and recommended remediation paths. That coordination helped reduce the window of exposure for many users.
Remaining concerns and operational risk
- Time to patch in appliance ecosystems: Vendor‑packaged appliances and managed storage offerings may lag behind upstream/distro patches; customers running those appliances must rely on vendor timelines. IBM and other vendors published separate advisories reflecting this complexity.
- Local‑access requirement can be deceptive: Although the vulnerability is local (not remotely exploitable by default), many cloud and multi‑tenant architectures make "local" accessible in ways operators underestimate. Containers, shared runners, or compromised low‑privilege users can bridge that gap. This makes the vulnerability more dangerous in modern deployment models.
- Operational tradeoffs for temporary mitigations: Disabling crash collection is an effective stopgap but degrades post‑incident diagnostics. Some operators may delay disabling the service to retain observability; that decision should be weighed against the real threat model for each environment.
Practical checklist for Ceph administrators (urgent actions)
- Inventory all Ceph hosts and record Ceph package versions and whether ceph‑crash.service is active.
- If you are unable to patch immediately, stop and disable ceph‑crash.service, document the change, and plan a rapid re‑enable after patching.
- Apply the distro or vendor security update that contains the fixed Ceph packages (for many distributions the fix is upstreamed into Ceph 17.2.6 or backported packages).
- Verify directory permissions and ownership for the crash directory; restrict writes to trusted processes only.
- Review host isolation and container bind mounts to ensure untrusted workloads cannot write into Ceph data or diagnostic locations.
Conclusion
CVE‑2022‑3650 is a textbook example of how diagnostic or helper services, when run with elevated privileges and coupled with writable data paths, can create outsized security risks in distributed systems. Patching is the correct long‑term fix and vendors and distributions have published updates and backports; however, the operational realities of multi‑tenant environments and vendor‑bundled appliances mean many organizations should treat this as an urgent remediation item.
Do the basics first: inventory, mitigate by disabling the service if necessary, patch to the fixed Ceph or vendor release, and harden the crash‑collection path with strict permissions and system confinement controls. Those steps will close the immediate attack vector and reduce the chance that a local compromise becomes a full cluster compromise.
For operators: do not assume "local only" equals "low risk" — modern architectures make local privilege paths meaningful. Prioritize this patch in any environment where Ceph runs on multi‑tenant hosts or where untrusted code might have write access to system diagnostic directories.