Ceph RGW DoS via Empty Copy Source in CopyObject (CVE-2024-47866)

A newly disclosed high-severity vulnerability in Ceph’s RADOS Gateway (RGW) lets an unauthenticated attacker crash the RGW daemon by issuing an S3 object-copy operation that includes an empty x-amz-copy-source value. The result is a reliable denial‑of‑service (DoS) that can make S3-compatible object storage unavailable on affected releases.

Background

Ceph’s RGW implements S3-compatible object operations for distributed object storage deployments used by cloud providers, enterprises, and on-prem clusters. On 11–12 November 2025, a Ceph security advisory and coordinated disclosures documented a parsing/validation defect: when a client issues a PUT (CopyObject) request with the x-amz-copy-source header present but set to an empty string, RGW’s copy-path parsing code fails to validate the empty value and the daemon crashes. The issue was publicly assigned CVE‑2024‑47866 and scored as High (CVSS v3.1 base score 7.5).

This is a classic improper input validation vulnerability (CWE‑20), in which malformed but syntactically plausible input bypasses expected sanity checks and triggers a fatal condition in the service. According to the advisory, the bug affects Ceph releases up to and including v19.2.3. At initial publication no widely released product packages carried a final, user-ready patch, although an upstream pull request adding the empty‑value check had been opened.

Why this matters: availability impact and blast radius​

  • Availability is the primary casualty. The vulnerability causes the RGW process to crash, which in many deployments means immediate loss of S3-compatible access to objects and buckets. Because RGW typically fronts multi-tenant object storage or tenant namespaces, a single exploited RGW node can affect many workloads.
  • No authentication required. The issue can be triggered by unauthenticated S3 sessions; credentials are only necessary to create the original source object in some repro scenarios, but the DoS can be induced without credentials in many configurations. That lowers the bar for attackers and increases the realistic exploitability.
  • Low complexity, high reliability. Constructing a CopyObject request where the copy-source header value is empty is straightforward. The advisory includes a small reproducer using a common S3 SDK (boto3) and an attached s3-request.zip demonstrating the issue. The simple exploit primitive and reliable crash behavior give this vulnerability a high practical severity.
  • Wider ecosystem risk. Because object storage is often integrated into cloud stacks, backup systems, CI/CD pipelines, and virtualization platforms, availability loss at the RGW layer may cascade into application outages, failed backups, and interrupted operations across many services. This amplifies the business impact beyond the single process crash.

Technical deep dive: what fails and why​

The trigger: CopyObject with an empty copy source​

S3-compatible object copy operations can be performed by specifying a copy source header (x-amz-copy-source) pointing at an existing object. In the vulnerable code path, RGW parses the copy source value to locate the bucket and object to copy. The upstream advisory explains that the parser did not explicitly reject or validate an empty copy source value, and that path led to an invalid internal state and ultimately a crash when downstream logic assumed a valid object identifier. The defective routine is identified as RGWCopyObj::parse_copy_location in the RGW codebase.

Root cause class: improper input validation / unguarded assumptions​

The core engineering error is a missing defensive check: code that consumes an externally supplied string must validate that the string is not empty, is correctly delimited, and contains the expected components (bucket and key). When such checks are absent, subsequent operations perform indexing, parsing, or lookups on invalid inputs and can dereference nulls, index past buffers, or otherwise reach an unrecoverable state. The Ceph advisory classifies the bug as CWE‑20 (Improper Input Validation).
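The checks listed above (non-empty, correctly delimited, bucket and key both present) can be sketched in a few lines of Python. This is an illustration of the validation idea only, not RGW's actual C++ parser and not the upstream patch; the names `parse_copy_source` and `InvalidCopySource` are hypothetical.

```python
from urllib.parse import unquote


class InvalidCopySource(ValueError):
    """Raised when a copy-source value cannot name a bucket and a key."""


def parse_copy_source(raw):
    """Split an x-amz-copy-source value into (bucket, key), or raise.

    Hypothetical helper: it rejects bad input up front so that later
    code never sees an empty or half-formed object identifier.
    """
    # Check 1: the value must exist and must not be blank.
    if raw is None or not raw.strip():
        raise InvalidCopySource("copy source is empty")

    # Drop an optional "?versionId=..." suffix, then URL-decode the path.
    path = raw.strip().split("?", 1)[0]
    path = unquote(path).lstrip("/")

    # Checks 2 and 3: a "/" delimiter with a non-empty part on each side.
    bucket, sep, key = path.partition("/")
    if not sep or not bucket or not key:
        raise InvalidCopySource("copy source must look like bucket/key")
    return bucket, key
```

Called with `"/src-bucket/path/to/obj"`, the helper returns the bucket and key as a pair; called with `""`, `"/"`, or a bucket name without a key, it raises instead of handing an unusable value to later code.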

Where the failure occurs in the stack​

The RGW copy flow involves HTTP header parsing, S3-API layer handling, and object metadata lookup. The missing check belongs early in that flow, in the copy-location parsing logic, so the malformed input reaches code that assumes a valid object identifier before any later validation or sanitization layer can intercept it. Because the error manifests as a process-level crash, host-level process managers or orchestrators might restart the daemon, but repeated or parallel exploitation can keep the service impaired.
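One way to notice the restart loop described above is a sliding-window counter over restart timestamps. The sketch below is a generic illustration, not part of Ceph or any particular supervisor; the class name and the default thresholds (more than 3 restarts in 300 seconds) are assumptions to tune per environment.

```python
from collections import deque


class RestartRateAlarm:
    """Flag when too many restarts fall inside a sliding time window."""

    def __init__(self, max_restarts=3, window_seconds=300.0):
        # Hypothetical defaults: alert on more than 3 restarts in 5 minutes.
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._events = deque()

    def record_restart(self, timestamp):
        """Record a restart at `timestamp` (seconds); return True to alert."""
        self._events.append(timestamp)
        # Forget restarts that are older than the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return len(self._events) > self.max_restarts
```

Feeding it the timestamps at which a supervisor reports a radosgw restart separates an isolated crash from sustained crashing, which is the pattern repeated exploitation would produce.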

Evidence and corroboration​

Multiple independent trackers and security lists recorded the issue and the assigned CVE shortly after public disclosure. The authoritative upstream advisory appears in the Ceph GitHub security advisories (GHSA) with a short reproducer and the technical description summarized above; the upstream pull request (PR #65159) implements a defensive check to reject empty copy-source header values. The OSS security mailing list, Debian, NVD, Tenable, and other vulnerability feeds also recorded CVE‑2024‑47866 and reflected the same technical summary and scoring data. This cross-referencing confirms the core facts (impact, affected versions, CVSS score, and the proposed upstream fix).

Replication and proof-of-concept (what researchers published)​

The GitHub advisory includes a small repro script packaged as s3-request.zip and references a boto3 example showing how a CopyObject request with an empty x-amz-copy-source triggers the crash. The advisory also contains a sample of the HTTP headers observed during reproduction. Together with the defensive check in the upstream PR, the repro gives operators a deterministic way to confirm, in a test environment, whether a specific RGW instance remains vulnerable. Treat any public PoC code with caution: running it against production storage will intentionally cause outages.

Patch status and vendor responses​

  • Upstream Ceph: An upstream pull request (PR #65159) that adds an explicit check for empty HTTP_X_AMZ_COPY_SOURCE values was opened and coordinated with the advisory; the PR content and commit references are available in the Ceph repository. Operators are advised to watch official Ceph releases or backports for the finalized patch.
  • Distribution packages and downstream vendors: Several distributions (Debian, Red Hat derivatives) and vendor trackers have marked their Ceph packages as vulnerable and are tracking fixes or backports. Some downstream vendor advisories indicate backport plans or scheduled releases where the fix will be included. Operators should consult their vendor's security advisories for exact fixed package versions and CVE mappings.
  • NVD / public CVE listings: The NVD entry mirrors the advisory text and notes NVD enrichment is pending; NVD and other databases record the base facts and link back to the GHSA advisory.
Caution: patch availability can vary across vendors and package repositories. At the time of initial disclosure some mainstream packaged distributions had not yet shipped fixed packages. Before applying any upgrade, verify the exact package version and review vendor release notes; if a distribution claims to ship a backport, confirm the changelog entry or the exact commit hash it contains.

Immediate mitigations and compensating controls​

When a full upstream or vendor patch is not yet available, the following actions reduce the attack surface or lower exploitation likelihood. Implement these in order of practicality and impact.
  • Inventory and exposure assessment
      • Identify every host, container, or appliance running radosgw or ceph-radosgw packages.
      • Determine whether RGW endpoints are reachable from untrusted networks (internet-facing) and which front-line load balancers or proxies terminate client connections.
      • Note clusters that use multi-instance RGW frontends; outages may affect entire tenant pools.
  • Network-level protections and filtering
      • Block or restrict access to RGW endpoints from untrusted networks using firewall rules, security groups, and WAF policies.
      • If a proxy or load balancer sits in front of RGW, configure it to reject or sanitize requests where x-amz-copy-source is present but has an empty value. Many load balancers allow header-based rules to drop or return 4xx for malformed headers.
      • Deploy rate-limiting and connection limits to reduce the ability of a single remote actor to repeatedly crash a node or keep it in recovery.
  • Application-layer request validation
      • Where possible, add checks in fronting proxies or API gateways to ensure that headers such as x-amz-copy-source contain non-empty, valid S3-style paths before requests are forwarded to backend RGW nodes.
  • Operational isolation and restart policies
      • Prefer orchestrators and process managers that isolate crashes: run RGW under a supervisor that restarts it with back-off, and that holds it down for investigation after repeated failures, so that repeated crashes don't cause noisy thrashing or mask root-cause details.
      • Consider temporarily removing vulnerable RGW nodes from load-balancing pools until patched if user-facing availability is critical.
  • Monitoring and alerting
      • Add alerts for RGW process crashes, high restart rates, and sudden 5xx responses on object copy endpoints.
      • Inspect access logs for suspicious sequences of CopyObject requests, especially those with unexpected header values or empty copy-source fields.
These mitigations are practical stopgaps but do not replace code-level fixes. Network filtering that drops malformed requests is the most direct short-term defense for internet-exposed endpoints, while restricting RGW exposure remains the strongest operational control.
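The header rule described above (refuse requests where x-amz-copy-source is present but empty) can be illustrated as Python WSGI middleware. This is a sketch of the logic only: a real deployment would express the same rule in its load balancer's or WAF's own configuration language, and the function name `reject_empty_copy_source` is hypothetical. Per the WSGI convention, the header arrives in the request environment under the key `HTTP_X_AMZ_COPY_SOURCE`.

```python
def reject_empty_copy_source(app):
    """Wrap a WSGI app; answer 400 when x-amz-copy-source is present but blank."""

    def middleware(environ, start_response):
        # WSGI exposes the x-amz-copy-source request header under this key.
        value = environ.get("HTTP_X_AMZ_COPY_SOURCE")

        # Present but empty, whitespace-only, or only "/" delimiters: refuse.
        if value is not None and not value.strip().strip("/"):
            body = b"empty x-amz-copy-source\n"
            start_response(
                "400 Bad Request",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]

        # Header absent or non-empty: pass the request through unchanged.
        return app(environ, start_response)

    return middleware
```

Requests without the header pass through untouched, so ordinary PUT uploads are unaffected; only the malformed CopyObject shape is answered with a 4xx before it reaches the backend.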

Detection: indicators to hunt for​

  • HTTP(s) logs or proxy logs showing PUT (CopyObject) requests with x-amz-copy-source headers that are empty or contain only delimiters/whitespace.
  • Sudden RGW process deaths, core dumps, or OOM events correlated with object-copy traffic.
  • Repeated restarts of radosgw workers or anomalous load spikes on nodes serving S3 traffic.
  • IDS/IPS signatures: create rules that flag CopyObject patterns with empty copy-source or other malformed header payloads consistent with the repro.
When investigating potential exploitation, preserve logs and core artifacts for forensic analysis. If evidence of exploitation exists, treat it as an availability incident and follow normal incident response playbooks for critical storage infrastructure.
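As a starting point for the first indicator above, the following sketch scans log records for PUT requests whose copy-source header is empty or contains only delimiters and whitespace. The log format is an assumption: a hypothetical JSON-lines proxy log in which each record has a "method" field and a "headers" object. RGW and most proxies log differently by default, so the field names would need adapting.

```python
import json


def suspicious_copy_requests(lines):
    """Yield log records for PUTs whose copy-source header is blank.

    Assumes a hypothetical JSON-lines log with "method" and "headers"
    fields; adapt the field names to the logs actually collected.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON records
        if not isinstance(record, dict):
            continue
        if str(record.get("method", "")).upper() != "PUT":
            continue

        headers = record.get("headers")
        if not isinstance(headers, dict):
            continue
        # Header names are case-insensitive, so normalize before lookup.
        lowered = {str(name).lower(): value for name, value in headers.items()}
        if "x-amz-copy-source" not in lowered:
            continue

        value = str(lowered["x-amz-copy-source"] or "")
        if not value.strip().strip("/"):
            yield record
```

Passing an open log file (for example `suspicious_copy_requests(open(path))`, where `path` is a placeholder) yields the matching records, which can then be correlated with RGW crash times.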

How the fix looks (upstream PR summary)​

The upstream pull request adds an explicit check that the HTTP_X_AMZ_COPY_SOURCE (and equivalent) header values are not empty before passing them into the copy-location parsing routines. The change is minimal and defensive: reject or return an error when the header is present but lacks a valid bucket/object specification. That short, targeted fix prevents the parser from entering the invalid internal state that leads to the crash. Operators should verify that vendor backports or release notes mention this exact check or reference the upstream commit used.

Broader context: why HTTP/object-parsing bugs remain dangerous​

This RGW issue is part of a broader pattern across ecosystems: subtle parser or protocol-handling omissions lead to high-impact availability failures. Recent examples in other widely used stacks show similar root causes (HTTP chunk parsing, redirect handling, header parsing) where differences in strictness or missing checks allowed resource exhaustion or credential leakage. Those incidents demonstrate that robust defensive coding — explicit input validation, size/format checks, and early rejection of malformed content — is essential for network-facing components.

Recommended remediation checklist (prioritized for administrators)​

  • Inventory
      • List all Ceph RGW instances and record versions (identify any <= v19.2.3). Note how RGW is exposed (public internet vs internal networks).
  • Patch planning
      • Monitor the Ceph project and vendor advisories for the official fixed releases that include the upstream commit referenced in PR #65159; plan testing and staged rollout once packages appear.
  • Apply compensations now
      • Implement proxy-level header validation (drop requests with empty copy-source).
      • Restrict RGW access to trusted networks; use jump hosts or API gateways where possible.
  • Test and validate
      • Verify the fix in a lab environment using the provided repro (do not run against production) and confirm patched packages return safe errors rather than crashing.
  • Hardening
      • Add rate-limiting and request-size-limiting rules on edge appliances.
      • Instrument RGW error and process metrics; ensure on-call is alerted for repeated restarts.
  • Post-patch actions
      • After applying vendor patches, rebuild and redeploy nodes, confirm the fix using the repro in a test cluster, and monitor for abnormal behavior for several days.
  • Incident readiness
      • Prepare runbooks for rapid RGW isolation, node replacement, and data-recovery operations to reduce mean time to remediate during any future DoS activity.
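The version screen in the inventory step (flag anything at or below v19.2.3) can be sketched as a small comparison helper. The function names are hypothetical, and the helper only compares the upstream version number it finds in a string. A distribution package may carry a backported fix without changing that number, so a positive result means "check the package changelog", not "confirmed vulnerable".

```python
import re

# The advisory lists releases "up to and including v19.2.3" as affected.
LAST_AFFECTED = (19, 2, 3)


def parse_ceph_version(text):
    """Extract (major, minor, patch) from a version string, or return None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def in_affected_range(text):
    """True if the version is <= 19.2.3, False if newer, None if unparseable."""
    version = parse_ceph_version(text)
    if version is None:
        return None
    # Tuples compare element by element, so (19, 2, 3) < (19, 10, 0).
    return version <= LAST_AFFECTED
```

Comparing integer tuples rather than raw strings avoids the usual pitfall where "19.10.0" sorts before "19.2.3" lexically.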

Risk assessment and operational guidance​

  • For internet-facing RGWs: treat this vulnerability as urgent. Exploitation complexity is low and the impact is direct availability loss, so network-level mitigation and patching (as soon as a fix is available) should be the top priority.
  • For internal-only RGWs behind strict access controls: prioritize patching according to change windows, but still apply compensating controls such as restricting access and enabling application-layer validation on internal proxies.
  • For managed Ceph services and vendor-managed appliances: contact the vendor for status on the patch and whether their distribution contains the PR backport; insist on a timetable for fixes and request mitigation guidance if the vendor has not yet shipped a patch.

Caveats and unverifiable items​

  • Public exploit vs. observed exploitation: at disclosure time there was no confirmed public evidence of widespread exploitation in the wild, but the exploit primitive is trivial to implement, so the absence of reported attacks is not a guarantee of safety. Treat any production evidence of repeated malformed CopyObject attempts as potentially malicious and respond accordingly.
  • Patch timelines vary by vendor: while the upstream PR is available, the timeframe for backports into enterprise-grade packages (e.g., vendor appliances, distro packages) differs; always verify the exact package changelog and commit hash when a vendor claims a patch.

Final analysis — strengths, weaknesses, and practical risk​

  • Strengths of the disclosure:
      • Clear, narrow root cause: the advisory identifies the exact parser routine and supplies a reproducible test case and an upstream PR that directly addresses the missing check. That clarity shortens remediation cycles and enables reliable fixes.
      • Easy verification: because the trigger is trivial to reproduce with automated tooling, defenders can test and confirm patch efficacy quickly.
  • Weaknesses and residual risks:
      • Patch distribution lag: many operators rely on packaged builds maintained by vendors or distribution maintainers; backports must be coordinated and tested, and delays can leave production clusters exposed.
      • Operational exposure: RGW commonly handles critical workloads; even short outages can have outsized operational or business consequences. Defensive measures that are safe to apply vary across environments and must be validated to prevent accidental service disruptions.
  • Practical risk rating for WindowsForum readers and administrators:
      • If running Ceph RGW instances that are reachable from untrusted networks, treat this as urgent. Apply network-layer blocking and proxy validation until patches are installed and validated.
      • For internal-only RGW use, schedule patching promptly and apply compensating controls where feasible.

Closing recommendations​

  • Immediately inventory RGW deployments and their exposure posture. Verify package versions and track vendor advisories for the fixed builds that incorporate the upstream PR commit.
  • Implement edge/proxy header validation rules to refuse CopyObject requests with empty x-amz-copy-source values; this is a low-impact, high-value mitigation until official patches arrive.
  • Monitor RGW process health closely and add alerts for crashes and high restart rates. Prepare an operational runbook for rapid RGW replacement/rollback if a production node becomes unstable due to suspected exploitation.
  • After patching, validate the fix in a staging environment using the supplied repro (do not run the repro in production) and confirm RGW no longer crashes on malformed copy-source inputs.
This vulnerability reinforces the simple but critical lesson: never trust input from the network. Network-facing storage infrastructure must perform explicit validation at API boundaries; when those checks are missing, the result is often immediate, high-impact outages. The Ceph community’s advisory and the upstream PR provide a clear remediation path — the operational challenge now is timely, carefully tested deployment of those fixes across the diverse, distributed systems that depend on RGW.
Source: MSRC Security Update Guide - Microsoft Security Response Center
 
