CVE-2024-0450: Patch Stops Quoted Overlap Zip Bombs in Python ZipFile

The discovery and coordinated patching of CVE-2024-0450 closes a subtle but consequential gap in CPython’s zipfile module: quoted‑overlap zip‑bombs that can weaponize compliant ZIP metadata to force excessive, asymmetric resource consumption during extraction. The Python Security Team, upstream maintainers, and multiple Linux distributors have confirmed the issue and delivered fixes that cause zipfile to reject archives with overlapping entries; still, the vulnerability’s characteristics—local attack vector, high availability impact, and prevalence of Python in analysis pipelines—mean the risk map extends beyond desktop Python interpreters to servers, CI pipelines, and appliances that embed CPython. (github.com)

Background / Overview

Zip‑bombs are an old, well‑understood class of denial‑of‑service tool: small archives that expand into enormous amounts of data when decompressed, consuming disk, memory, or CPU and crippling the host that attempts to process them. Historically, defenders focused on naive compression‑ratio checks and detection of trivially overlapping entries, but attackers have iterated on archive constructions that bypass those heuristics. CVE‑2024‑0450 surfaced one such technique, called quoted‑overlap (described in research as a more sophisticated form of entry overlap), that evades some of the zipfile module’s prior defenses. (github.com)
In simple terms, a quoted‑overlap zip‑bomb leverages the ZIP file metadata—central directory records, local file headers, offsets and lengths—to create entries whose described payloads overlap the same compressed bytes in the archive. If a decompressor blindly trusts the central directory or local headers and writes out multiple large logical file sizes that reference the same compressed data or overlapping streams, the result can be enormous extracted output from a small on‑disk artifact. CPython’s zipfile prior to the patch could accept such carefully crafted archives and proceed with extraction, enabling sustained or persistent availability loss on the host that processed them. (github.com)

What precisely is CVE‑2024‑0450?​

  • Vulnerability name: Quoted zip‑bomb protection for zipfile (CVE‑2024‑0450).
  • Affected component: CPython standard library — the zipfile module.
  • Root cause: zipfile accepted and attempted to extract archives containing overlapping entries (quoted‑overlap), which could be constructed to massively inflate extracted size and exhaust system resources.
  • Impact: Asymmetric resource consumption (CWE‑405) with high availability impact (extraction can cause service or host denial of service). Confidentiality and integrity are not impacted by this flaw.
The NVD and distribution advisories consistently score the issue as Medium under CVSS v3 (commonly 6.2) with the attack vector classified as local (AV:L) and availability impact rated High. In plain language: an application or user that opens or extracts a malicious ZIP file using an unpatched zipfile implementation can be made to run out of resources; however, the exploit requires the vulnerable code to process the crafted archive, so remote, unauthenticated attacks are only possible if an attacker can get the victim to process the archive (e.g., via user upload, email attachment, or an automated ingestion pipeline).

Timeline and fixes​

The issue was raised upstream through a GitHub issue (GH‑109858) and addressed by a focused pull request (GH‑110016) that was cherry‑picked into the supported CPython patch releases. The fix implements explicit checks such that when an entry in the archive would read or overlap compressed bytes belonging to another entry (or the central directory), zipfile raises BadZipFile rather than proceeding with extraction. The change was applied across supported maintenance branches and packaged by Linux distributors and vendors. (github.com)
Fixed CPython patch releases reported by vendors and PSF announcements include (examples; confirm for your platform before upgrading):
  • 3.12.x: 3.12.2 (and subsequent micro releases that include the patch)
  • 3.11.x: 3.11.8 and newer
  • 3.10.x: 3.10.14 and newer
  • 3.9.x: 3.9.19 and newer
  • 3.8.x: 3.8.19 and newer (where applicable)
Distributors (Ubuntu, Amazon Linux / ALAS, SUSE, Debian, etc.) issued their own advisories stating the same root cause and listing the patched package versions that incorporate the upstream CPython fix. Because packaging timelines differ, always confirm the exact patched package name and version for the platform you manage.
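As a rough first pass, the upstream version numbers listed above can be compared against the running interpreter. The sketch below is only an illustration: the function name and table are hypothetical, it assumes branches newer than 3.12 already contain the fix, and it cannot see distribution backports, so a False result on a distro-packaged Python may be a false alarm that the distributor advisory should settle.

```python
import sys

# First upstream micro release per branch that carries the fix, as listed above.
FIRST_FIXED_MICRO = {(3, 8): 19, (3, 9): 19, (3, 10): 14, (3, 11): 8, (3, 12): 2}


def upstream_has_fix(version=None):
    """Return True or False for a listed branch, None when the branch is not listed."""
    major, minor, micro = tuple(version or sys.version_info)[:3]
    if (major, minor) >= (3, 13):
        return True  # assumption: branches created after the fix landed include it
    fixed_micro = FIRST_FIXED_MICRO.get((major, minor))
    if fixed_micro is None:
        return None  # older or unlisted branch: the version number alone cannot tell
    return micro >= fixed_micro


# Report the running interpreter's version and the verdict for it.
print(sys.version.split()[0], upstream_has_fix())
```

A None result means the branch is outside the list above and needs a manual check against vendor advisories.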

Technical deep dive: how the attack works and why zipfile accepted it​

ZIP internals refresher​

A ZIP file stores per‑entry metadata twice: once as a local file header immediately before the compressed data, and again in the central directory at the end of the file. Each record contains offsets, compressed sizes, and uncompressed sizes. Correct implementations reconcile both records and validate that offsets and sizes are sane before extracting the data. Attackers can abuse inconsistencies and edge cases in that metadata to craft archives whose logical contents are much larger than their on‑disk size. (github.com)
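To make the two copies of the metadata concrete, the following sketch reads the fixed 30-byte part of each local file header by hand and sets it beside what zipfile parsed from the central directory. The helper name compare_records is hypothetical, and the archive is held in memory purely for illustration. Archives written with data descriptors or ZIP64 extensions legitimately carry placeholder sizes in the local header, so a mismatch is a prompt for closer inspection rather than proof of tampering.

```python
import io
import struct
import zipfile

# Fixed part of a local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length (30 bytes, little-endian).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def compare_records(raw):
    """Return (name, signature, central_sizes, local_sizes) for each entry."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for info in zf.infolist():
            start = info.header_offset
            fields = LOCAL_HEADER.unpack(raw[start:start + LOCAL_HEADER.size])
            central = (info.compress_size, info.file_size)
            local = (fields[7], fields[8])  # compressed size, uncompressed size
            rows.append((info.filename, fields[0], central, local))
    return rows


# Build a small, well-formed archive in memory and inspect it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"A" * 1000)
    zf.writestr("b.txt", b"B" * 10)

records = compare_records(buf.getvalue())
for name, signature, central, local in records:
    print(name, signature, "central:", central, "local:", local)
```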

The quoted‑overlap trick​

  • The attacker creates two (or more) ZIP entries that claim file sizes much larger than the compressed data in the archive but point, in their headers, to overlapping or identical ranges of compressed bytes.
  • When a naive extractor reads the local header or central directory and trusts the file_size/compressed_size and offset values without cross‑validation, it will decompress the same compressed data region multiple times or interpret overlapping reads as separate inflated outputs.
  • The result: small archive → repeated or multiplied decompressed output → exhaustion of disk or memory when multiple entries are extracted. This mechanism can produce arbitrarily large effective output ratios (small input, huge unpacked payload). (github.com)
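The arithmetic behind that ratio can be sketched with a deliberately simplified model: one shared compressed region, plus a small fixed metadata cost per entry that points at it. All numbers and the function name below are hypothetical and chosen only to show the shape of the growth. Real quoted-overlap archives are constructed more carefully than this model suggests.

```python
def amplification(n_entries, kernel_compressed, kernel_uncompressed, per_entry_overhead=100):
    """Simplified model: every entry resolves to the same compressed region."""
    archive_bytes = kernel_compressed + n_entries * per_entry_overhead
    output_bytes = n_entries * kernel_uncompressed
    return archive_bytes, output_bytes, output_bytes / archive_bytes


# Hypothetical figures: 1000 entries sharing a 10 kB region that inflates to 10 MB.
archive_bytes, output_bytes, ratio = amplification(
    n_entries=1000,
    kernel_compressed=10_000,
    kernel_uncompressed=10_000_000,
)
print(f"archive: {archive_bytes} bytes, output: {output_bytes} bytes, ratio: {ratio:.0f}:1")
```

In this model the archive grows by a small constant per entry while the output grows by the full uncompressed size per entry, which is why per-entry ratio checks alone can miss the aggregate effect.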

Why Python’s zipfile was vulnerable​

The zipfile module previously had checks that caught some overlap patterns (e.g., simple "full overlap" cases). The quoted‑overlap variant deliberately crafts offsets and lengths that slip through gaps in how zipfile reconciled central directory and local header information and computed absolute offsets. The upstream fix hardens the validation logic: when zipfile reads an entry, it now verifies whether the entry’s compressed region overlaps other entries or the central directory, and raises BadZipFile on any overlap. The change also adds tests to guard against regressions. (github.com)
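The idea behind boundary validation can be illustrated outside the library. The sketch below is not the upstream patch: find_overlaps is a hypothetical helper that works only from central directory fields (header_offset and compress_size) and uses a conservative lower bound for where each entry ends, so it should not flag well-formed archives but may miss subtle overlaps. It reports at least one pair whenever any two entries overlap under that bound, not necessarily every pair.

```python
import io
import zipfile
from types import SimpleNamespace

LOCAL_HEADER_MIN = 30  # fixed part of a local file header, before name and extra field


def find_overlaps(infos, central_dir_start=None):
    """Return pairs of names whose byte ranges overlap, judged by a lower bound."""
    ordered = sorted(infos, key=lambda i: i.header_offset)
    overlaps = []
    for cur, nxt in zip(ordered, ordered[1:]):
        # Earliest possible end of `cur`: header plus compressed data, ignoring the name.
        min_end = cur.header_offset + LOCAL_HEADER_MIN + cur.compress_size
        if min_end > nxt.header_offset:
            overlaps.append((cur.filename, nxt.filename))
    if central_dir_start is not None and ordered:
        last = ordered[-1]
        if last.header_offset + LOCAL_HEADER_MIN + last.compress_size > central_dir_start:
            overlaps.append((last.filename, "<central directory>"))
    return overlaps


# A well-formed archive produces no findings.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"A" * 100)
    zf.writestr("b.txt", b"B" * 100)
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    clean = find_overlaps(zf.infolist())

# Hand-made metadata standing in for a crafted archive (illustrative values only).
crafted = [
    SimpleNamespace(filename="x", header_offset=0, compress_size=500),
    SimpleNamespace(filename="y", header_offset=40, compress_size=500),
]
flagged = find_overlaps(crafted)
print(clean, flagged)
```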

Real‑world impact scenarios: where this matters​

  • Automated ingestion pipelines: Email gateways, content scanners, malware labs, or CI systems that automatically open or scan ZIP artifacts are at high risk. An untrusted or attacker‑controlled archive processed automatically can starve resources and take the service offline.
  • User‑facing applications: Desktop archive utilities, web apps that extract user uploads server‑side, and file viewers that use CPython to inspect archives are vulnerable if they call ZipFile.extract(), ZipFile.extractall(), or similar APIs on untrusted data.
  • Embedded systems and appliances: Vendor appliances that embed CPython as part of management or telemetry stacks may be susceptible if they extract user‑controlled ZIPs (for example, firmware update processors or log ingestion agents).
  • Third‑party products: Any product bundling CPython (or linking to system libpython) without the patch could carry the vulnerability into downstream ecosystems. Vendors of products such as storage appliances that build on upstream distributions should verify that their binary images contain the fixed Python packages.
The availability consequences range from trivial (a single extraction attempt tries to allocate extra disk space and fails gracefully) to severe (a background service repeatedly processes attacker feeds and remains persistently out of service until operator intervention). That matches the CVE narrative about sustained or persistent loss of availability.

Who needs to care — affected components and packaging nuance​

  • Direct CPython users: If you run CPython from an official or distro package and use standard library zipfile to extract data, confirm your interpreter includes the fix.
  • Python‑embedded apps: Applications that bundle a specific CPython minor release (for example, proprietary apps shipping Python 3.9.x) must update their embedded interpreter or apply vendor patches.
  • Distributors and appliance vendors: Because CPython is widely redistributed, vendors and distros backported the fix into their package trees; check your platform security advisories for the exact package update. Many vendors published advisories and CVE entries in conjunction with the PSF announcement.
Note: CVE metadata and fixed version numbers can differ slightly in wording across advisories (some list the last vulnerable micro‑version; others highlight the first fixed micro‑release). Treat distributor advisories as authoritative for those packaged binaries.

Mitigations and recommended actions​

If you maintain systems that process ZIP files with Python, apply the following checklist immediately:
  • Inventory: Identify systems and services that use CPython’s standard library zipfile (including embedded interpreters). Look for applications that call zipfile.ZipFile, ZipFile.extractall(), or related APIs on untrusted input.
  • Patch: Upgrade to the patched CPython maintenance release for your branch, or apply your distribution’s security update packages. Confirm the applied package includes the upstream GH‑110016 fix. (github.com)
  • Hardening (short term): Avoid auto‑extracting untrusted ZIPs. Where possible, open archives in a sandboxed process with strict resource limits (disk quotas, ulimit for CPU and memory). If you must process untrusted archives, run extraction in a container or temporary filesystem sized to the expected maximum, and enforce timeouts and file‑count limits.
  • Input validation: Prefer API calls that read entries without auto‑extracting them, inspect reported compressed/uncompressed sizes, and refuse archives with suspiciously high compression ratios or inconsistent headers. For scanning pipelines, decompress into a fixed‑size or pre‑allocated space that prevents uncontrolled growth.
  • Detection: Add logging and monitoring around archive‑processing subsystems: track sudden spikes in disk I/O, unexpected file counts, rapid growth of extracted data, and repeated exceptions like BadZipFile.
  • Vendor coordination: If you use third‑party products that bundle Python (appliances, SDKs), confirm with vendors that their images have been updated. If a vendor has not provided a patch, apply compensating controls (sandboxing, ingestion quarantine).
These steps reduce the attack surface dramatically and protect systems while you coordinate patching across supply chains.
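One way to approximate the ulimit-style hardening from the checklist is to run the extraction step in a child process with kernel-enforced caps. The sketch below is POSIX-only and illustrative: make_limiter and _cap are hypothetical helpers, the limit values are arbitrary, and RLIMIT_FSIZE caps the size of each individual file rather than the total written, so it complements a disk quota or size-limited temporary filesystem rather than replacing one.

```python
import os
import resource  # POSIX only; not available on Windows
import subprocess
import sys
import tempfile


def _cap(which, value):
    """Lower soft and hard limits to `value`, never above the current hard limit."""
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def make_limiter(max_file_bytes=None, max_cpu_seconds=None, max_address_space=None):
    """Build a preexec_fn that applies the given caps inside the child process."""
    def apply_limits():
        if max_file_bytes is not None:
            _cap(resource.RLIMIT_FSIZE, max_file_bytes)   # largest single file
        if max_cpu_seconds is not None:
            _cap(resource.RLIMIT_CPU, max_cpu_seconds)    # CPU time, in seconds
        if max_address_space is not None:
            _cap(resource.RLIMIT_AS, max_address_space)   # virtual memory, in bytes
    return apply_limits


# Demo: a child that tries to write 1 MiB while capped at 64 KiB per file.
child_code = "open('out.bin', 'wb').write(b'x' * (1 << 20))"
with tempfile.TemporaryDirectory() as workdir:
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        cwd=workdir,
        preexec_fn=make_limiter(max_file_bytes=64 * 1024, max_cpu_seconds=10),
        capture_output=True,
        timeout=60,
    )
    out_path = os.path.join(workdir, "out.bin")
    written = os.path.getsize(out_path) if os.path.exists(out_path) else 0

print("child exit status:", result.returncode, "bytes on disk:", written)
```

In a real pipeline the child would be the extraction worker itself. Note that preexec_fn is documented as unsafe in multithreaded parents, so a dedicated launcher process or container limits may be a better fit there.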

Detection, forensics, and incident response​

  • Indicators of exploitation are primarily operational: excessive disk usage post‑extraction, numerous large files appearing after a small archive is processed, or processes consuming large memory and hitting resource limits soon after handling a ZIP.
  • Forensic traces:
  • Application logs recording ZipFile.extract() or read() calls along with timestamps.
  • Exceptions: in patched Python, attempts to process overlapping entries raise BadZipFile; unhandled exceptions or crashes may be visible in systemd logs, container logs, or process crash dumps.
  • File system traces showing many files created rapidly in the same directory. Also check for false negatives where extraction succeeded but produced excessive output.
  • Triage steps:
  • Isolate the host or process to prevent further resource consumption.
  • Capture the offending archive (do not open it further on analysis systems without protections).
  • Gather logs showing sequence of extraction calls and resource usage.
  • Patch the environment and re‑process the archive in a controlled sandbox for evidence collection. (github.com)
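When capturing an offending archive for triage, it helps to record its hash and declared metadata without decompressing any entry. The following sketch assumes the sample fits in memory. describe_archive is a hypothetical helper, and it relies on the fact that opening a ZipFile parses only the end-of-archive records and central directory.

```python
import hashlib
import io
import zipfile


def describe_archive(raw):
    """Summarize a ZIP from its central directory only; no entry is decompressed."""
    summary = {"sha256": hashlib.sha256(raw).hexdigest(), "bytes": len(raw), "entries": []}
    try:
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            for info in zf.infolist():
                summary["entries"].append({
                    "name": info.filename,
                    "header_offset": info.header_offset,
                    "compress_size": info.compress_size,
                    "file_size": info.file_size,
                })
    except zipfile.BadZipFile as exc:
        summary["error"] = str(exc)  # keep the parser's complaint as evidence
    summary["declared_total"] = sum(e["file_size"] for e in summary["entries"])
    return summary


# Demo on a benign in-memory sample and on bytes that are not a ZIP at all.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"A" * 1000)
    zf.writestr("b.txt", b"B" * 10)
sample = buf.getvalue()

report = describe_archive(sample)
broken = describe_archive(b"not a zip")
print(report["sha256"], report["declared_total"], broken.get("error"))
```

A declared total far larger than the archive's own byte count, or header offsets that crowd together, are reasonable cues to keep the sample quarantined until it can be examined in a sandbox.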

Why this matters beyond Python — supply chain and ecosystem risk​

Zip libraries and archive handling code are core plumbing in many ecosystems: security scanners, email gateways, CI systems, malware analysis sandboxes, backup systems, and even firmware update mechanisms. A vulnerability in a widely redeployed library like CPython’s zipfile module can propagate into a heterogeneous set of products that trust the upstream implementation.
Historical context shows this pattern: similar ZIP‑format quirks and gzip/zip library bugs in other runtimes (Go’s archive/zip issues and gzip recursion bugs, for example) have repeatedly resulted in DoS vectors until patched. CVE‑2024‑0450 is a reminder that format parsing is often where attackers can force asymmetrical costs onto defenders.
Key supply‑chain lessons:
  • Patch upstream libraries quickly and verify vendor snapshots embed the fixes.
  • Use defense‑in‑depth: sandboxing, quotas, and robust validation should be defaults when processing any untrusted binary formats.
  • Communicate with vendors: many appliance vendors depend on distro packages and will issue their own advisories; track those advisories and apply vendor instructions promptly.

Strengths of the fix — what upstream did right​

  • The upstream patch is narrowly focused and defensive: rather than trying to predict every malicious compression ratio, it hardens zipfile to validate entry boundaries and fail fast when overlap is detected.
  • Tests were added to the standard library test suite to prevent regressions and to ensure future changes don’t reintroduce the flaw.
  • The fix was cherry‑picked into multiple maintenance branches and rolled into vendor packages, showing good PSF→distro coordination. (github.com)

Residual risks and caveats​

  • Not all vendors and appliances patch at the same cadence. Systems that bundle an older CPython release or a frozen interpreter may remain vulnerable until vendor updates are applied.
  • The fix addresses overlapping regions in zip metadata; it does not substitute for broader protections against zip‑bombs that rely on extremely high compression ratios without metadata overlap. Attackers may adapt; defenders should keep layered protections in place.
  • Detection is imperfect: some legitimate archives have large compression ratios or unusual metadata, so naive blocking could break benign workflows. Implement safe‑fail behaviors (fail extraction and log) rather than silent data loss.
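For the ratio-based bombs that the overlap fix does not cover, a pre-flight screen on declared sizes is a common first filter. The sketch below is illustrative only: screen_declared_sizes and its thresholds are hypothetical and would need tuning per workload, and because the sizes come from attacker-controlled metadata, the screen should be paired with a hard cap on bytes actually written during extraction. It returns reasons rather than raising, so the caller can log and fail safely.

```python
import io
import zipfile


def screen_declared_sizes(zf, max_entries=1000, max_total_bytes=100 * 1024 * 1024, max_ratio=100):
    """Return reasons to refuse an archive, judged from central directory metadata."""
    reasons = []
    infos = zf.infolist()
    if len(infos) > max_entries:
        reasons.append(f"too many entries: {len(infos)}")
    total = sum(info.file_size for info in infos)
    if total > max_total_bytes:
        reasons.append(f"declared total too large: {total}")
    for info in infos:
        ratio = info.file_size / max(info.compress_size, 1)  # avoid dividing by zero
        if ratio > max_ratio:
            reasons.append(f"suspicious ratio for {info.filename!r}: {ratio:.0f}:1")
    return reasons


def _zip_bytes(name, payload, method):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


# A small stored file passes; a megabyte of zeros deflated to about a kilobyte does not.
with zipfile.ZipFile(io.BytesIO(_zip_bytes("note.txt", b"plain text", zipfile.ZIP_STORED))) as zf:
    benign_reasons = screen_declared_sizes(zf)
with zipfile.ZipFile(io.BytesIO(_zip_bytes("big.bin", b"\0" * 1_000_000, zipfile.ZIP_DEFLATED))) as zf:
    bomb_reasons = screen_declared_sizes(zf)
print(benign_reasons, bomb_reasons)
```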

Practical checklist for administrators and developers​

  • For administrators:
  • Immediately inventory endpoints and servers that run Python or ship appliances with embedded Python interpreters.
  • Apply vendor patches or upgrade the system Python to the fixed maintenance version for your branch. Confirm via package manager that the package contains the GH‑110016 commit or corresponding changelog entry. (github.com)
  • If you cannot patch immediately, add process or container resource limits and quarantine incoming ZIPs pending analysis.
  • For developers:
  • Avoid calling extractall() on untrusted archives. Instead, inspect the namelist and entry headers, validate sizes, and extract to a controlled temporary directory under quota.
  • Add robust exception handling for BadZipFile and instrument logging to capture the archive path and metadata.
  • Write tests that exercise edge cases (overlap, extreme sizes) and run them in CI to prevent regressions. (github.com)
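The developer guidance above can be sketched as a streaming extractor that counts bytes as they are actually produced, instead of trusting declared sizes, and converts BadZipFile into a logged rejection. Everything named here (extract_capped, ArchiveRejected, the archive_intake logger, the budget values) is hypothetical. The sketch leaves a partially written file behind when it aborts, so it assumes the destination is a disposable temporary directory.

```python
import io
import logging
import os
import tempfile
import zipfile

log = logging.getLogger("archive_intake")


class ArchiveRejected(Exception):
    """Raised when an archive is refused instead of being extracted."""


def extract_capped(raw, dest_dir, max_total_bytes, chunk_size=64 * 1024):
    """Extract entry by entry, aborting once the real output exceeds the budget."""
    dest_root = os.path.realpath(dest_dir)
    written = 0
    try:
        with zipfile.ZipFile(io.BytesIO(raw)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                target = os.path.realpath(os.path.join(dest_root, info.filename))
                if os.path.commonpath([dest_root, target]) != dest_root:
                    raise ArchiveRejected(f"path escapes destination: {info.filename!r}")
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with zf.open(info) as src, open(target, "wb") as dst:
                    while chunk := src.read(chunk_size):
                        written += len(chunk)
                        if written > max_total_bytes:
                            raise ArchiveRejected("byte budget exceeded")
                        dst.write(chunk)
    except zipfile.BadZipFile as exc:
        # Patched interpreters raise BadZipFile for overlapping entries.
        log.warning("rejected archive (%d bytes): %s", len(raw), exc)
        raise ArchiveRejected(str(exc)) from exc
    return written


def _make_zip(name, payload):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


with tempfile.TemporaryDirectory() as scratch:
    ok_bytes = extract_capped(_make_zip("docs/a.txt", b"hello"), scratch, max_total_bytes=1024)
    try:
        extract_capped(_make_zip("big.bin", b"\0" * 1_000_000), scratch, max_total_bytes=10_000)
        rejected = False
    except ArchiveRejected:
        rejected = True
print(ok_bytes, rejected)
```

Counting output bytes during the read loop means the cap holds even when the archive's declared sizes are wrong, which is the case the size metadata alone cannot cover.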

Closing analysis and outlook​

CVE‑2024‑0450 is not a headline‑style remote code execution issue, but it is a textbook example of how format parsing mistakes create operational and availability risk. The vulnerability underscores three durable truths for defenders:
  • Format parsers must be defensive by default. Always validate offsets, lengths, and ranges.
  • Patching upstream libraries is necessary but not sufficient; runtime controls (sandboxing, quotas) are essential mitigation layers.
  • Supply chain coordination matters: an upstream fix only becomes protective when it’s embedded and deployed across vendors and distributions.
The CPython community responded responsibly: a tight, well‑tested fix, coordinated releases, and vendor packaging. That mitigates the immediate threat. However, because zip handling is ubiquitous and attackers constantly iterate on compression and metadata tricks, organizations that process untrusted archives must adopt the layered protections described above and treat archive parsing as a high‑risk operation that deserves the same hardened mindset we apply to other binary format parsers.
If you manage systems that accept ZIP files from untrusted sources, treat this as actionable: verify your Python packages, confirm vendor updates for appliances, and implement sandboxing and quotas for archive processing pipelines today. The fix exists; the remaining job is operational: find where untrusted ZIPs are processed and make sure they're handled safely.

Source: MSRC Security Update Guide - Microsoft Security Response Center
 
