Keras Tar Extraction CVE-2025-12638: Patch in 3.12.0

Keras’s popular helper function for downloading and unpacking model assets, keras.utils.get_file, contains a dangerous extraction shortcut: when asked to extract tar archives it relied on Python’s tarfile.extractall without the stronger filters introduced in recent Python releases. That omission — coupled with a separate bug in Python’s tarfile symlink/path resolution — allowed crafted tarballs to place files outside the intended cache directory (a classic path traversal / ZipSlip outcome). The problem has been assigned CVE‑2025‑12638 (Keras 3.11.3 affected) and has an official code-level fix in Keras 3.12.0; defenders should treat this as a high‑severity supply‑chain risk for any environment that programmatically downloads and extracts remote tar archives.

Background / overview

Keras provides convenience helpers to fetch remote files and extract them to a local cache. That convenience is invaluable for reproducible model downloads and for machine‑learning workflows where datasets or pretrained weights are fetched on first use. The same convenience, however, becomes an attack surface when code unpacks untrusted tar archives without robust path and link checks.
  • The Keras function at issue is keras.utils.get_file(..., extract=True) when extracting tar archives. The function historically invoked tarfile.extractall in a way that did not opt into Python’s defensive extraction filters.
  • Python’s tarfile module gained an extraction filter parameter in Python 3.12 (PEP 706), and the feature was backported to security releases of Python 3.8 through 3.11. Using filter="data" adds checks on member names, link targets and permissions during extraction. Keras’s extraction path did not consistently use this safer pathway in older releases, which enabled exploitation under specific conditions.
  • Separately, an upstream Python issue (CVE‑2025‑4517) showed that the extraction filters themselves could be bypassed by specially crafted symlink / pathname sequences if the tarfile implementation’s realpath/resolve logic hit path‑length and symlink resolution corner cases. The interplay between Keras’s pre‑extraction filtering and Python’s path resolution is the core reason a crafted archive could bypass Keras’s initial checks and escape the cache directory.
Multiple public trackers list the Keras CVE and describe the same high‑level mechanics: path traversal via tar extraction, arbitrary write outside the cache directory, and potential for code execution or persistent compromise if an attacker can control archive contents. These trackers assign high severity to the issue and map fixes to Keras 3.12.0.

How the vulnerability works: technical anatomy

The extraction chain and where checks fail

  • The application (or a library) calls keras.utils.get_file(url, extract=True). Keras downloads the archive and prepares to extract it into a local cache path.
  • Keras runs a filtering routine (filter_safe_paths or a similar pre‑extraction check) to drop or warn on filenames that look like "../" traversal or otherwise suspicious entries.
  • The archive is then passed to Python’s tarfile module for the actual extraction step. Historically, Keras invoked the tarfile extraction methods in a way that did not set the safe extraction filter argument (or relied on pre‑filtering only). That left the final extraction dependent on the tarfile module’s resolution of symlinks and long pathnames.
  • Due to a PATH_MAX / symlink resolution edge case in Python’s tarfile implementation (documented and tracked as CVE‑2025‑4517), symbolic links or long path constructs inside a crafted tarball could cause realpath resolution to stop or behave unexpectedly. The result: a path that the pre‑extraction filter thought was safe could be resolved during extraction to a location outside the intended root — letting files be written where they shouldn’t.
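To make the pre‑filter gap concrete, the sketch below builds a small archive in memory and runs a hypothetical name‑only check over it (`name_looks_safe` is a simplified stand‑in for the kind of check described above, not the actual Keras `filter_safe_paths`). Neither member name contains `..`, so both pass, yet the first member is a symlink whose target points outside the destination and the second would be written through it. This does not reproduce the CVE‑2025‑4517 PATH_MAX bypass; it only illustrates why checking names alone is insufficient. Nothing is extracted.

```python
import io
import tarfile


def name_looks_safe(name: str) -> bool:
    """Hypothetical name-only check: rejects absolute paths and '..' components."""
    return not name.startswith("/") and ".." not in name.split("/")


def build_link_archive() -> bytes:
    """Build an in-memory tar holding a symlink and a file 'behind' it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        link = tarfile.TarInfo("cache_link")
        link.type = tarfile.SYMTYPE
        link.linkname = "../../outside"  # the link target escapes, not the name
        tar.addfile(link)
        payload = b"not a model file"
        info = tarfile.TarInfo("cache_link/payload.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def prefilter_verdicts(archive: bytes) -> dict:
    """Map each member name to whether the name-only check accepts it."""
    with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
        return {m.name: name_looks_safe(m.name) for m in tar.getmembers()}


print(prefilter_verdicts(build_link_archive()))
# {'cache_link': True, 'cache_link/payload.txt': True}
```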

Why pre‑filtering alone is fragile

Pre‑extraction checks on names (rejecting entries containing "../") are necessary but not sufficient. Archives can include symlinks and hard links whose resolution combines with path concatenation and operating‑system‑level path limits to produce different final target paths. A robust extraction pattern enforces checks at extraction time — ideally inside the tarfile extraction routine itself — and treats symlinks and link targets specially rather than trusting pre‑filters alone. This is precisely the gap the Keras code had before the patch.

What was changed (the fix) and who fixed it

  • Keras maintainers added a shared extraction routine that uses the tarfile filter option (filter="data" when available), centralizes extraction logic across zip/tar code paths, and introduces additional runtime filtering to skip unsafe names and symlinks. The commit that implements this change is 47fcb397... and is part of the work that landed in Keras 3.12.0. The change explicitly replaces direct calls to archive.extractall(...) with a guarded extraction helper.
  • The GitHub Advisory and Keras security advisory map the fix to patched versions: 3.12.0 and later; affected versions include 3.0.0 through 3.11.3 depending on packaging. Rely on the Keras release notes and your packaging channel (pip, distro packages) to confirm the installed version.
  • The underlying Python CVE (CVE‑2025‑4517) has its own vendor patches and distribution updates (for CPython and downstream packages). Upgrading Keras is necessary but not sufficient on its own: if your environment runs a Python version that exposes the tarfile filter behavior, also make sure the runtime has received vendor fixes. Distribution vendors (Ubuntu, SUSE, Oracle Linux, Debian, Red Hat) published patches and updated python3.* packages after the tarfile issue was disclosed.

Affected versions and exposure assessment

  • Keras versions prior to 3.12.0 are potentially affected when using keras.utils.get_file(..., extract=True) on tar archives (including .tar.gz / .tgz) and when the archive contains crafted symlinks or path constructs. At least one public tracker’s CVE record names 3.11.3 specifically as the affected version.
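  • The underlying Python tarfile vulnerability (CVE‑2025‑4517) affects interpreters that ship the extraction filter feature: Python 3.12 and later, plus the 3.8 to 3.11 security releases that received the filter backport. Earlier releases have no filter to bypass, which is not a safer position, because they extract without any filtering. Some distributions built backports/patches and vendor package versions vary, so consult your distro advisory for the exact patched python package versions. Upgrading Python alone is not a full mitigation for Keras users; Keras itself must be upgraded where Keras calls extraction explicitly.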
  • The exposure model depends on whether your code, CI runners, data pipelines, or notebooks automatically call get_file with extract=True on files whose origin you do not fully control. Automated systems fetching remote archives (for example, dataset fetching in training jobs or lightweight model downloads in container bootstrap scripts) are the highest‑value targets. Systems that only install vetted packages from trusted registries and do not auto‑extract untrusted archives are at lower risk.
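For a quick per‑host exposure check, the sketch below reads the installed Keras version through `importlib.metadata` and reports whether the interpreter exposes tarfile extraction filters. The function names are hypothetical, and the version parser deliberately ignores pre‑release suffixes (a 3.12.0 release candidate counts as 3.12.0); use `packaging.version` if you need exact PEP 440 ordering. It only inspects the `keras` distribution, and the presence of `tarfile.data_filter` does not prove the runtime is patched for CVE‑2025‑4517; confirm that against your vendor’s advisory.

```python
import tarfile
from importlib import metadata

PATCHED_KERAS = (3, 12, 0)  # first fixed release per the advisories cited above


def parse_release(version: str) -> tuple:
    """Turn '3.11.3' or '3.12.0rc1' into a comparable (major, minor, patch) tuple.

    Pre-release and local suffixes are ignored on purpose (a simplification).
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def keras_needs_upgrade(version: str) -> bool:
    """True if the given Keras version predates the fixed release."""
    return parse_release(version) < PATCHED_KERAS


def exposure_report() -> dict:
    """Hypothetical per-host summary of the two layers discussed above."""
    try:
        keras_version = metadata.version("keras")
    except metadata.PackageNotFoundError:
        keras_version = None
    return {
        "keras_version": keras_version,
        "keras_needs_upgrade": keras_version is not None
        and keras_needs_upgrade(keras_version),
        # True only means the filter API exists, not that CVE-2025-4517 is patched.
        "python_has_tar_filters": hasattr(tarfile, "data_filter"),
    }
```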

Practical attack scenarios (what an attacker can do)

  • Arbitrary file writes: Place a file into locations that are processed on boot, or into configuration folders belonging to services. This can cause persistence or service manipulation when the system later reads those files. The impact ranges from local compromise to supply‑chain insertion if build systems extract poisoned archives.
  • Remote code execution: If the target service or a follow‑on process automatically executes scripts from the cache, or if the attacker writes executable content into a directory whose contents get executed (e.g., the Windows Startup folder, systemd unit directories on Linux, or scheduled tasks), arbitrary code execution is possible. Attackers can weaponize this to pivot or to implant a backdoor.
  • Denial of service / data destruction: Overwriting critical files, or filling disks with crafted payloads, can degrade or deny service. Some trackers score the CVE as high impact for confidentiality, integrity and availability because successful arbitrary writes can be destructive.

Detection, indicators and hunting tips

Detecting exploitation requires a combination of host and pipeline telemetry because the act of extraction is benign in normal workflows.
  • Hunt for unexpected writes outside the Keras cache directory (by default ~/.keras, or the path returned by get_file). Look for sudden new files in system directories (e.g., /etc, /usr/local/bin, /var/tmp) created by the account that runs your Python process.
  • Audit command histories and process creation logs on build/CI runners and developer workstations for python invocations that include get_file or dataset download scripts. Look for network downloads followed shortly by unexpected file modifications.
  • Log and alert on tar extraction failures or warnings that mention skipped invalid paths or symlink warnings — recent Keras patches emit warnings when they skip paths; pre‑patch systems won’t. Anomalous warnings about archive members should be treated as high priority.
  • Network indicators: If artifacts are fetched from attacker‑controlled hosts, correlate downloads to suspicious domains or IPs. While many downloads come from trusted registries, CI logs may show external URLs for training datasets or model weights.
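A simple filesystem sweep can support the first hunting step above. The sketch below (a hypothetical `recent_files` helper) walks the given directories and reports files changed after a chosen time, excluding expected locations such as the Keras cache. It looks at the inode change time as well as the modification time because tar extraction restores each member’s archived mtime, so an attacker‑supplied timestamp can make a freshly written file look old; on POSIX systems the ctime still reflects when the file was written.

```python
import os


def recent_files(roots, since_epoch, exclude_prefixes=()):
    """Yield (path, changed_time) for files under `roots` changed after `since_epoch`.

    Hypothetical hunting helper: `exclude_prefixes` lists expected locations
    (for example the Keras cache) whose contents should not be reported.
    """
    excluded = tuple(
        os.path.realpath(os.path.expanduser(p)) + os.sep for p in exclude_prefixes
    )
    for root in roots:
        # onerror swallows permission errors so one unreadable dir doesn't stop the sweep.
        for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if excluded and os.path.realpath(path).startswith(excluded):
                    continue
                try:
                    st = os.lstat(path)
                except OSError:
                    continue
                # tar extraction restores archived mtimes, so also consider ctime,
                # which the OS updates when the file is written or its times are set.
                changed = max(st.st_mtime, st.st_ctime)
                if changed > since_epoch:
                    yield path, changed
```

For example, `recent_files(["/etc", "/usr/local/bin", "/var/tmp"], time.time() - 3600, ["~/.keras"])` lists files changed in the last hour in the drop locations named above.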
Practical SIEM/search strings (examples)
  • File creation outside expected cache paths: FileWrite where process.name == "python" AND not path.startswith("~/.keras")
  • Process chain: ProcessCreate where parent.name in ("python", "jupyter") AND child writes to /etc or /usr/local
  • Archive warnings: ApplicationLog contains "Skipping invalid path during archive extraction" (patch‑era Keras emits such warnings) — these logs can help identify attempted exploits.
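Where application logs are kept as plain text, the warning search above can be approximated locally. This sketch assumes the quoted message text; the exact wording may differ between Keras versions, so treat `WARNING_PATTERNS` as a hypothetical starting list and adjust it against real log output. The broad `symlink` pattern will also produce benign matches.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical patterns: the first mirrors the warning text quoted above,
# the second is a deliberately broad catch-all for link-related messages.
WARNING_PATTERNS = [
    re.compile(r"Skipping invalid path during archive extraction", re.IGNORECASE),
    re.compile(r"\bsymlink", re.IGNORECASE),
]


def suspicious_log_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, text) for log lines matching any warning pattern."""
    for lineno, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in WARNING_PATTERNS):
            yield lineno, line.rstrip("\n")
```

An open log file can be passed directly, for example `suspicious_log_lines(open("train.log", encoding="utf-8", errors="replace"))`.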

Immediate mitigation and remediation (operational checklist)

  • Upgrade Keras
    • Upgrade to Keras 3.12.0 or later in all environments where Keras is installed and where get_file(..., extract=True) may be used. The Keras commit and GitHub advisory map this fix directly to v3.12.0.
  • Patch Python (where applicable)
    • Ensure your Python runtime packages are updated to vendor‑patched releases that address CVE‑2025‑4517. Many distributions published patched python3.12 / python3.13 packages; consult your distro advisory and apply the appropriate package updates. Upgrading Python alone does not remove the need to upgrade Keras.
  • Replace auto‑extract patterns
    • Audit code and CI pipelines for calls to get_file(..., extract=True) and replace them with an explicit, reviewable sequence:
      • download the archive,
      • scan it with malware/AV,
      • run a safe extraction routine that enforces absolute‑path checks and forbids symlinks, or extract in an isolated container with minimal privileges, and
      • perform integrity checks on unpacked files before moving them to sensitive paths.
    • Example safe_extract pattern (illustrative):
      Code:
        import os
        import tarfile

        def safe_extract(tar, target_dir):
            root = os.path.realpath(target_dir)
            members = tar.getmembers()
            for member in members:
                # Resolve where the member would land; it must stay inside root.
                member_path = os.path.realpath(os.path.join(root, member.name))
                if member_path != root and not member_path.startswith(root + os.sep):
                    raise RuntimeError("unsafe path in tarfile: " + member.name)
                # Symlinks and hard links can redirect later writes, so reject them.
                if member.issym() or member.islnk():
                    raise RuntimeError("links not allowed in archive: " + member.name)
            tar.extractall(path=root, members=members)
    • Use vetted extraction helpers from widely maintained libs or the patched Keras code path rather than rolling your own ad hoc logic.
  • Harden permissions and runtime context
    • Ensure processes that perform extraction run under accounts with the least privilege required and that the cache directory has restrictive ACLs.
    • Use containerized extraction in CI or ephemeral sandbox VMs with no access to host configuration directories.
  • Quarantine and vet incoming archives
    • Treat archives from third parties, shared datasets, or unvetted URLs as untrusted input. Run static checks, unzip/untar scanners, and policy enforcement before making their contents available to production workflows.
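For the integrity step in the checklist above, pinning the expected SHA‑256 of each archive and checking it before extraction blocks substituted files. The helper names below are hypothetical. Keras’s get_file also accepts a file_hash argument for the same purpose, but a matching hash only proves the archive is the one you pinned, not that its contents are safe, so safe extraction is still required.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large archives are hashed without loading them fully."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected_sha256: str) -> None:
    """Raise before extraction if the archive does not match the pinned digest."""
    actual = sha256_of(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"hash mismatch for {path}: expected {expected_sha256}, got {actual}")
```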

Longer‑term risk controls and remediation strategy

  • Inventory and dependency hygiene: Maintain an inventory of Python packages and library versions across your fleet. Automated SBOMs and dependency scanning will flag Keras versions < 3.12.0 and help prioritize upgrades.
  • CI/CD gating and artifact provenance: Avoid dynamic downloads of model artifacts during image builds or runtime bootstrapping. Prefer pre‑fetched, signed artifacts stored in internal artifact registries with recorded provenance.
  • Threat modeling for ML workloads: Consider the unique supply‑chain risks of ML systems: models, datasets, and weights are all remote artifacts that may be supplied by untrusted or compromised channels. Hardening extraction and sandboxing should become part of any ML threat model.
  • Monitor third‑party advisories and CVE feeds: The interaction between library-level fixes (Python’s tarfile) and application/library code (Keras) is a recurring pattern; timely cross‑correlation of advisories is critical. The Keras fix relied both on a code change and on the adoption of enhanced Python APIs; ensure your patching program covers both dependencies and dependent packages.

Critical analysis: strengths of the response and remaining risks

Notable strengths

  • Vendor response and patching were quick and surgical: Keras maintainers implemented a targeted extraction helper and centralized extraction logic so both tar and zip extraction paths benefit from consistent checks. The fix is included in an explicit release (3.12.0), which simplifies remediation across environments.
  • The community mapped the root cause to a known Python tarfile weakness (CVE‑2025‑4517), which clarifies the cross‑project failure mode and enables coordinated vendor fixes and distribution patches. This transparency improves remediation confidence and reduces the risk of a partial fix that leaves the underlying problem intact.

Remaining risks and caveats

  • Patch windows in production are often non‑trivial: many organizations run older images or pinned dependencies (including Keras embedded in vendor packages or OS distro packages). Those environments may not be upgraded quickly, leaving a prolonged exposure window. The Debian tracker and distribution packaging records show that some distro‑packaged Keras or Python versions lag behind upstream releases.
  • Upgrading Python without upgrading Keras (or vice versa) is insufficient: the vulnerability is a combination of library behavior and application usage patterns. Both the interpreter and the consumer library require attention. Security teams must coordinate package updates across layers.
  • Supply‑chain transitive exposures: downstream projects that vendor or freeze Keras (or copy its get_file helper) may continue to be vulnerable even after the upstream fix is released unless they integrate the patch. Security teams should scan for copy‑and‑paste usage of the vulnerable extraction patterns in internal repos and third‑party tools.
  • Exploitability in the wild: public PoCs for Keras‑specific exploitation were not immediately ubiquitous at time of disclosure, but the underlying pattern (ZipSlip / tar extraction traversal) is a well understood, easily weaponizable primitive. Assume active exploitation is possible where an attacker can supply archives to extraction workflows (build systems, CI, dataset ingestion, or developer machines). Treat the absence of PoCs as a reason for caution, not complacency.
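To find copy‑and‑paste uses of the risky patterns in internal repositories, a small `ast`‑based scan works as a first pass. The sketch below (a hypothetical `risky_calls` helper) flags `get_file(..., extract=True)` with a literal `True` and any `extractall()` call without a `filter=` keyword. It is deliberately simple: it misses `extract` passed positionally or through a variable, and it also flags `zipfile` `extractall()` calls, which have no `filter` parameter and need manual review.

```python
import ast


def risky_calls(source: str, filename: str = "<source>") -> list:
    """Return sorted (line, description) pairs for extraction calls worth reviewing."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handles both keras.utils.get_file(...) (Attribute) and get_file(...) (Name).
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        keywords = {kw.arg: kw.value for kw in node.keywords if kw.arg}
        if name == "get_file":
            extract = keywords.get("extract")
            if isinstance(extract, ast.Constant) and extract.value is True:
                findings.append((node.lineno, "get_file(..., extract=True)"))
        elif name == "extractall" and "filter" not in keywords:
            findings.append((node.lineno, "extractall() without filter="))
    return sorted(findings)
```

Run it over each `*.py` file in a checkout (for example via `pathlib.Path(root).rglob("*.py")`) and review every hit.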

Action plan: prioritized checklist for Windows‑centric teams

  • Inventory: Locate all hosts and containers that run Python + Keras. Pay special attention to developer workstations, model‑training CI runners, and build images used by data teams.
  • Patch Keras: Upgrade to Keras 3.12.0+ via pip or your package management system and validate test runs. Confirm the installed version with pip show keras or your deployment manifests.
  • Patch Python: Apply vendor patches for CVE‑2025‑4517 where relevant (consult distro advisories). If you cannot patch immediately, quarantine extraction workflows behind additional controls.
  • Remove risky automation: For the short term, disable automatic extract=True invocations on production runners. Convert automated archive handling into a human‑review or CI‑gate process that validates artifacts first.
  • Harden extraction: Use the Keras patched extraction path or implement robust safe_extract code. Run extractions in sandboxes or ephemeral containers that cannot write to host config paths.
  • Monitor: Create SIEM alerts for unexpected file writes from Python processes and for extraction warning messages emitted by the patched Keras code.

Conclusion

CVE‑2025‑12638 is a textbook example of how convenience APIs and a subtle interpreter‑level bug can combine into a high‑impact supply‑chain vulnerability. The Keras team issued a focused fix that centralizes and hardens archive extraction and the Python community corrected the tarfile filter behavior — but the practical remediation requires a coordinated, multi‑layer approach: upgrade Keras to 3.12.0+, ensure Python packages are vendor‑patched for CVE‑2025‑4517, and harden extraction workflows in CI, developer machines and production inference environments.
Treat all unvetted archives as hostile input. Where possible, avoid automatic extraction of remote artifacts, restrict extraction privileges, and adopt an explicit safe‑extract pattern or rely on the patched Keras helper. Inventory and patch quickly, but prioritize systems where archives are consumed automatically (CI runners, onboarding scripts, and automated training pipelines). The fix exists; the operational work is in verifying you run the fixed code paths and in closing the procedural gaps that allowed an archive to become an attack vector. (Security Update Guide - Microsoft Security Response Center)
 
