A subtle mistake in how container runtimes set Linux process capabilities quietly opened a path to privilege escalation in early 2022: containers launched by some versions of Podman and Moby (the open-source project behind Docker Engine) were started with non-empty inheritable capabilities, allowing certain binaries inside those containers to gain extra privileges during execve(2). The bug is tracked as CVE‑2022‑27649 for Podman (and closely related to CVE‑2022‑24769 for Moby/Docker), and it was fixed by runtime vendors — but its technical shape, the required threat model, and the correct operational responses deserve more than a headline. In this feature I explain what went wrong, who is affected, how operators can detect and remediate risk, and the longer-term lessons for container runtime security and capability hygiene.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background / Overview
Containers rely on the Linux kernel’s capability model to split the all‑powerful root privilege into finer-grained rights (capabilities) such as CAP_NET_BIND_SERVICE, CAP_SETUID, or CAP_SYS_ADMIN. Capabilities exist in several named sets (permitted, effective, inheritable, bounding and ambient), and the kernel applies clear rules during execve(2) to compute a new process’s capability sets. When the interaction between file capabilities (capabilities attached to an executable) and process capability sets is mishandled, otherwise unprivileged processes can unexpectedly gain elevated rights.
In March–April 2022 maintainers announced that both Podman and Moby had started containers with non‑empty inheritable capabilities by default, contrary to typical expectations. That mistake permitted a scenario where executable files with inheritable file capabilities could, on exec, cause their inheritable bits to be added into the process’s permitted set, effectively elevating privileges inside the container up to the bounding set. Podman’s problem is tracked as CVE‑2022‑27649 and was patched in Podman v4.0.3; Moby’s closely related issue is tracked as CVE‑2022‑24769 and fixed in Docker Engine / Moby 20.10.14.
Why this mattered in practice: many container images assume a standard Linux capability environment where the inheritable set is empty unless someone explicitly configures it. When runtimes shipped containers with a non‑empty inheritable set, the containment assumptions of images and applications were violated, giving a path for local privilege escalation inside affected containers if certain files inside the image had file capabilities set.
The technical failure in plain terms
How capabilities normally transform on execve(2)
During execve(2) the kernel calculates new capability sets from the process’s existing capability sets and the file’s capability metadata. The new permitted set is the intersection of the process inheritable set and the file inheritable set, ORed with the file permitted bits (masked by the bounding set) and with any ambient capabilities that survive the exec. The new effective set is taken from that permitted set when the file’s effective flag is set. Critically, a capability that is present in both the process inheritable set and the file inheritable set becomes permitted after exec. The canonical behaviour and formulas are documented in the Linux capabilities manual, capabilities(7).
What the runtimes did wrong
Both Podman (CVE‑2022‑27649) and Moby/Docker (CVE‑2022‑24769) were observed to start containers with a non‑empty inheritable set by default, rather than the empty inheritable set operators expect. That meant a process inside the container could already possess capabilities in its inheritable set at process creation time. If such a process later execve(2)s a file that also has file-level inheritable capabilities, the kernel’s exec logic could promote those inheritable bits into the process’s permitted set, enabling capability elevation that the container image author did not intend. Put another way: the runtime accidentally prepared a scenario where a benign exec sequence could yield unexpectedly privileged processes.
The concrete exploit preconditions
This is not a remote, unauthenticated exploit. The attacker needs local execution context inside a container (or the ability to place or modify certain files in an image) and the presence of an executable with inheritable file capabilities inside that container. Typical exploit prerequisites are:
- The host runs an affected runtime version (Podman earlier than 4.0.3 or Moby/Docker earlier than 20.10.14).
- The attacker has the ability to execute processes inside a container (e.g., via a shell in a running container or an exposed service that can execute helper binaries).
- The container image contains one or more binaries with file capabilities in their file xattr (setcap-defined capabilities).
- The runtime started that container with a non‑empty inheritable set (the bug), enabling the transfer from inheritable to permitted during exec.
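To make the preconditions concrete, here is a minimal Python sketch of the exec-time rules from capabilities(7). It assumes a non-root process and a binary without set-user-ID bits, and it ignores securebits. The names ProcCaps, FileCaps and caps_after_exec are illustrative only and do not come from any runtime.

```python
from dataclasses import dataclass

# Bit positions follow <linux/capability.h>.
CAP_NET_BIND_SERVICE = 1 << 10
CAP_NET_RAW = 1 << 13


@dataclass(frozen=True)
class ProcCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: int = 0
    bounding: int = 0
    ambient: int = 0


@dataclass(frozen=True)
class FileCaps:
    permitted: int = 0
    inheritable: int = 0
    effective: bool = False  # the single "effective" flag stored on the file


def caps_after_exec(p: ProcCaps, f: FileCaps) -> ProcCaps:
    """Apply the capabilities(7) exec rules (simplified: non-root, no setuid)."""
    # A file that carries capabilities is "privileged", which clears ambient.
    privileged_file = bool(f.permitted or f.inheritable or f.effective)
    ambient = 0 if privileged_file else p.ambient
    permitted = (p.inheritable & f.inheritable) | (f.permitted & p.bounding) | ambient
    effective = permitted if f.effective else ambient
    return ProcCaps(
        permitted=permitted,
        inheritable=p.inheritable,
        effective=effective,
        bounding=p.bounding,
        ambient=ambient,
    )


# An unprivileged process in a container started by an affected runtime
# (inheritable set non-empty) versus a fixed runtime (inheritable set empty).
defaults = CAP_NET_BIND_SERVICE | CAP_NET_RAW
buggy = ProcCaps(inheritable=defaults, bounding=defaults)
fixed = ProcCaps(bounding=defaults)

# A hypothetical helper binary marked with cap_net_raw+ei.
helper = FileCaps(inheritable=CAP_NET_RAW, effective=True)

print(hex(caps_after_exec(buggy, helper).permitted))  # 0x2000 (CAP_NET_RAW gained)
print(hex(caps_after_exec(fixed, helper).permitted))  # 0x0 (nothing gained)
```

The only difference between the two runs is the process inheritable set, which is exactly the state the affected runtimes initialized incorrectly.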
Who should care and which deployments are at risk
- Operators using Podman versions earlier than v4.0.3 should treat their installations as potentially affected and prioritize upgrades.
- Docker Engine / Moby users running Docker Engine versions prior to 20.10.14 need to update: Docker’s 20.10.14 release explicitly updated default inheritable capabilities to address CVE‑2022‑24769. Running containers must be recreated for capability state to reset.
- Images that include setcap’ed binaries or that expect to rely on file capabilities (network utilities, helper binaries compiled/setcap by packaging scripts, or legacy admin tools) are the most directly at risk. Containers that use Linux users and groups inside the container for privilege separation are particularly relevant targets.
- Multi‑tenant clouds, CI runners, build farms, and developer workstations where untrusted users can start containers or supply images deserve priority — local CI jobs or tenant workloads could exploit this bug as part of an escalation chain.
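The version thresholds above can be checked mechanically across a fleet. The following Python sketch parses the text printed by podman --version or docker --version and compares it with the fixed releases named in this article. The helper names are hypothetical, and a plain version comparison cannot see distribution backports, so treat a positive result as a prompt to read the package changelog rather than as proof.

```python
import re

# First fixed upstream releases, as stated in the advisories discussed here.
FIXED_IN = {"podman": (4, 0, 3), "docker": (20, 10, 14)}


def parse_version(text: str) -> tuple:
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(runtime: str, version_output: str) -> bool:
    """Return True when the reported version predates the fixed release."""
    return parse_version(version_output) < FIXED_IN[runtime]


print(is_affected("podman", "podman version 4.0.2"))                    # True
print(is_affected("docker", "Docker version 20.10.14, build a224086"))  # False
```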
How to verify exposure (detection & triage)
Start with inventory: which runtimes and images are in use, and are any hosts running affected versions?
- Runtime checks (quick):
  - Podman: run podman --version or check package metadata; ensure the version is 4.0.3 or later.
  - Docker / Moby: run docker --version or consult the Engine release notes; ensure Docker Engine is 20.10.14 or later.
- Image / file capability checks (identify risky images):
  - Inspect images and layers for files with file capabilities (the extended attribute security.capability). You can use getcap or getfattr to search inside an unpacked image or a running container filesystem. Example, assuming the image ships sh and getcap (from the libcap tools):
    - docker run --rm --entrypoint sh IMAGE -c "getcap -r / 2>/dev/null"
  - Or use getcap on known binaries: getcap /path/to/binary. If getcap lists non‑empty capabilities (especially inheritable bits, shown by an i flag such as +ei), that binary is relevant.
- Process capability checks (live detection inside containers):
  - Read /proc/<pid>/status and check the CapInh / CapPrm / CapEff fields; decode with capsh --decode or getpcaps. Example: grep Cap /proc/1/status and decode the values with capsh. These hex fields reveal the process’s inheritable and permitted sets and let you confirm whether an unexpected inheritable set exists at process start.
- Runtime configuration audit:
  - Look for any container start hooks, shim code, or orchestration layer that might explicitly set capability masks (cap-add/cap-drop flags) or modify process securebits. That code path could also be responsible for non‑empty inheritable sets if custom logic was applied.
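Where getcap is not available, the security.capability attribute can be read and decoded directly. The sketch below assumes Linux, Python 3, and the on-disk vfs_cap_data layout from linux/capability.h (a little-endian magic word followed by permitted/inheritable pairs). The function names parse_vfs_caps and scan_tree are hypothetical.

```python
import os
import struct

XATTR_NAME = "security.capability"
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
# Number of 32-bit (permitted, inheritable) pairs for each on-disk revision.
_PAIRS_BY_REVISION = {0x01000000: 1, 0x02000000: 2, 0x03000000: 2}


def parse_vfs_caps(raw: bytes) -> dict:
    """Decode a raw security.capability value into integer bit masks."""
    (magic,) = struct.unpack_from("<I", raw, 0)
    pairs = _PAIRS_BY_REVISION.get(magic & VFS_CAP_REVISION_MASK)
    if pairs is None:
        raise ValueError(f"unknown capability revision in magic {magic:#010x}")
    permitted = inheritable = 0
    for i in range(pairs):
        low_p, low_i = struct.unpack_from("<II", raw, 4 + 8 * i)
        permitted |= low_p << (32 * i)
        inheritable |= low_i << (32 * i)
    return {
        "permitted": permitted,
        "inheritable": inheritable,
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
    }


def scan_tree(root: str, read_xattr=None) -> list:
    """Return (path, decoded caps) for every file under root that has file caps."""
    read_xattr = read_xattr or os.getxattr  # os.getxattr exists on Linux only
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                raw = read_xattr(path, XATTR_NAME, follow_symlinks=False)
                findings.append((path, parse_vfs_caps(raw)))
            except (OSError, ValueError, struct.error):
                continue  # no attribute, unreadable file, or unknown format
    return findings
```

Calling scan_tree on an unpacked image root (unpacked with extended attributes preserved) lists candidate binaries; entries whose inheritable mask is non-zero are the ones this bug could act on.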
Short‑term mitigations and vendor fixes
The fixes are straightforward and already distributed by vendors, but operational actions matter.
- Podman: Upgrade to Podman v4.0.3 or later (the upstream patch explicitly clears inheritable capability arrays during container start). After upgrade, recreate containers launched by older Podman; existing container processes inherit the pre‑fix state and must be replaced.
- Moby / Docker Engine: Upgrade to Docker Engine / Moby 20.10.14 or later. Docker’s 20.10.14 release notes explicitly state an update of default inheritable capabilities and recommend stopping/deleting/recreating running containers so the corrected capability defaults are applied to new processes.
- If you cannot immediately patch, use runtime or entrypoint workarounds: modify container entrypoints to drop inheritable capabilities early in process startup (for example, using capsh or a small wrapper that clears inheritable bits). Docker and vendor advisories suggested capsh(1) as a practical workaround. Note this is a stopgap only — it does not change the runtime default and relies on image-level change.
- Recreate containers after patching: In both Moby and Podman cases, the corrected behaviour applies only to containers started after the fix. Running containers should be stopped, deleted, and recreated from fresh images after runtime upgrades to ensure capability sets are reset.
- Audit images: Remove unnecessary file capabilities from image binaries where possible. Prefer narrow capability use (capability-specific helpers or small privileged shims) rather than broad file capabilities chained through application images.
Detection, monitoring and forensic indicators
Because the bug depends on capability bits and execution flows, look for these signs:
- Containers that contain binaries with file capabilities (getcap or find + getfattr reveals those files). Presence alone isn’t exploitation, but it’s the key enabler.
- Unexpected CapInh non‑zero values in /proc/<pid>/status for processes that should start with empty inheritable sets. Use capsh --decode and getpcaps to decode hex bitmasks into readable capability names.
- New or unexplained privilege separation failures inside containers, for example services failing, unexpected privileged socket binds, or helpers gaining privileges they should not have.
- Repeated or mass re‑creation of child processes via exec patterns after file writes — complex exploitation chains could programmatically exercise execve(2) to elevate capabilities.
- Audit logs showing unexpected file writes to sensitive locations by non‑root container processes; such modifications could be the consequence of an escalated capability set enabling writes previously blocked.
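The CapInh check can be scripted where capsh is not installed. This Python sketch parses the Cap* lines of /proc/<pid>/status and turns a hex mask into capability names, using the bit numbering from linux/capability.h. The function names are hypothetical.

```python
# Capability names indexed by bit number, per <linux/capability.h>.
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]
CAP_FIELDS = ("CapInh", "CapPrm", "CapEff", "CapBnd", "CapAmb")


def decode_mask(mask: int) -> list:
    """Return the names of all capabilities set in a bit mask."""
    names = []
    for bit in range(mask.bit_length()):
        if mask >> bit & 1:
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
    return names


def parse_cap_fields(status_text: str) -> dict:
    """Extract the Cap* hex fields from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in CAP_FIELDS:
            fields[key] = int(value.strip(), 16)
    return fields


def inheritable_caps(pid: int) -> list:
    """Names in the inheritable set of a live process (empty list is expected)."""
    with open(f"/proc/{pid}/status") as handle:
        return decode_mask(parse_cap_fields(handle.read()).get("CapInh", 0))
```

A non-empty result from inheritable_caps(1) inside a container that was started without explicit capability configuration is the signal described above.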
Operational recommendations (prioritized checklist)
- Inventory: identify all hosts running Podman or Docker/Moby and list runtime versions. Flag Podman < 4.0.3 and Docker < 20.10.14.
- Patch: schedule and deploy vendor updates for Podman and Docker/Moby. Use vendor packages or vendor-supplied rebuilds; confirm fixes via package changelogs or commit IDs where possible.
- Recreate containers: stop, remove, and recreate running containers after patching to ensure inherited capability state is reset.
- Audit images for file capabilities and remove unnecessary setcap usages; prefer small, audited shims for privileged operations.
- Harden orchestration: review Kubernetes Pod Security admission settings (Pod Security Policies on clusters older than Kubernetes 1.25) and other admission controllers, Docker daemon configs, and Podman start flags to limit cap-add usage and drop capabilities that workloads do not need; avoid giving NET_ADMIN, SYS_ADMIN and similar broad capabilities to untrusted containers.
- Monitor: add capability checks into CI or image scanning — detect files with security.capability xattrs as part of image build pipelines. Integrate getcap scanning into container image QA.
- Educate developers: explain why file capabilities are powerful and risky; prefer capability-based daemons or minimal privileged helpers rather than building capabilities into general-purpose binaries.
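One way to wire the image audit into CI is to compare the output of getcap -r against a reviewed allowlist. The sketch below assumes the two output styles printed by common libcap versions ("PATH = caps" in older releases and "PATH caps" in newer ones). The helper names and the allowlist are hypothetical, and paths or capability strings containing spaces may need more careful parsing.

```python
import re

# Matches an operator followed by flags that include "i", e.g. "+ei" or "=i".
_INHERITABLE_FLAG = re.compile(r"[=+][ep]*i")


def parse_getcap(output: str) -> dict:
    """Map each path in `getcap -r` output to its capability text."""
    entries = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if " = " in line:  # older libcap style: "/path = cap_x+ep"
            path, caps = line.split(" = ", 1)
        else:              # newer libcap style: "/path cap_x=ep"
            path, _, caps = line.rpartition(" ")
        entries[path] = caps
    return entries


def has_inheritable(caps: str) -> bool:
    """True when the capability text raises the inheritable flag."""
    return bool(_INHERITABLE_FLAG.search(caps))


def policy_violations(output: str, allowlist: set) -> list:
    """Return (path, caps, has_inheritable) for files not on the allowlist."""
    return [
        (path, caps, has_inheritable(caps))
        for path, caps in sorted(parse_getcap(output).items())
        if path not in allowlist
    ]
```

A pipeline step could fail the build whenever policy_violations returns a non-empty list, which forces a documented justification for each new file capability.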
Risk assessment and practical impact
CVE‑2022‑27649 and its Moby sibling are not dramatic remote‑code‑execution worms. They are a classic operational mistake that can be weaponized in local escalation chains. The worst practical outcomes include:
- Local privilege escalation within a container to capabilities that let processes perform impactful actions inside the container (modify configs, bind privileged ports, change UIDs, etc.).
- Facilitation of broader compromise in multi‑tenant or CI environments where a tenant’s container can exploit the flaw to attack shared services or pivot if other misconfigurations exist.
- Operational surprises: administration scripts and images that assumed empty inheritable sets may behave incorrectly or leak sensitive diagnostics where privileged write paths open unexpectedly.
Broader lessons: capability hygiene and runtime design
This CVE underlines recurring themes in container and OS security:
- Runtimes are not neutral transporters of a “default” environment; tiny changes in how userland state is initialized can break security assumptions downstream. Auditors and maintainers must treat runtime startup semantics as part of the threat model.
- Capabilities are powerful but subtle. The kernel’s capability rules are precise; userland code must be equally precise in initialising capability sets and sanitizing execution environments. Tools and language runtimes should include clear guidance for running in privileged contexts.
- Image scanning and build‑time QA should include capability scanning. The presence of file capabilities in an image should be treated as a security flag and require review and a documented justification.
Quick reference: commands operators can run now
- Check Podman version: podman --version (upgrade if < 4.0.3).
- Check Docker Engine version: docker --version (upgrade if < 20.10.14).
- Find files with file capabilities in an image or container: getcap -r /path 2>/dev/null (run against an unpacked image root or inside the container).
- Inspect process capabilities: cat /proc/<pid>/status | grep Cap and decode with capsh --decode HEXVALUE or use getpcaps <pid>.
- Recreate running containers after patching: docker stop <id> && docker rm <id> && docker run <image> ... (or podman equivalents).
Conclusion
CVE‑2022‑27649 and the companion Moby issue are instructive: a quiet initialization bug in a container runtime can invalidate assumptions made by images and applications, producing a local elevation-of-privilege path that is both avoidable and fixable. Vendors responded with small, targeted patches (Podman v4.0.3; Docker Engine / Moby 20.10.14), but the operational burden of inventory, patching, image audits and container recreation is where real risk lingers.
If you run containers in production, treat this as a hygiene and process story as much as a single CVE: enforce capability minimization, scan images for file capabilities during CI, and ensure runtime upgrades are part of your vulnerability lifecycle. Do the upgrade, recreate containers, and add capability checks to your build pipeline. Those steps remove this risk and harden your deployment against similar surprises in the future.