Rustls—the widely used, memory-safe TLS library written in Rust—contains a denial-of-service design flaw: under a specific, easily reproducible handshake sequence a blocking rustls server can enter an infinite loop inside rustls::conn::ConnectionCommon::complete_io(), consuming CPU and preventing the server from accepting or completing normal connections until restarted or otherwise recovered. Patches are available; operators and maintainers who use rustls in blocking server code must treat this as an availability emergency and prioritize updates and mitigations immediately.
Source: MSRC Security Update Guide - Microsoft Security Response Center
Background / Overview
Rustls is a popular TLS implementation in Rust that aims to provide a safe, modern alternative to C-based TLS stacks. Its performance, API ergonomics, and Rust safety guarantees have driven adoption across networking software, HTTP servers, reverse proxies, and embedded gateways.

In April 2024 a vulnerability was disclosed that targets the TLS handshake control flow. When a client sends a close_notify alert immediately after the initial ClientHello, certain blocking server usage patterns cause the library's handshake I/O loop—implemented in rustls::conn::ConnectionCommon::complete_io()—to spin forever rather than gracefully terminating or returning an error. The result is sustained CPU use, exhausted threads, and inability to accept or process legitimate connections: a classic availability failure with potentially severe downstream consequences for any product relying on affected rustls versions.

The behavior is specific to blocking server code paths; async acceptors are not affected in every configuration. However, many real-world servers use blocking accept loops or integrate the blocking APIs indirectly, making the issue broadly impactful. Upstream maintainers released fixes in the rustls 0.21, 0.22, and 0.23 release lines; affected versions are those shipped before the patched release in each line.
What the bug does — a technical summary
- The vulnerable routine is the core I/O completion loop used by rustls’ connection object: rustls::conn::ConnectionCommon::complete_io().
- The trigger is a specific TLS control sequence: a close_notify alert sent by a client immediately following the ClientHello message during the handshake.
- In blocking server usage, the complete_io() logic can reach a state where it continually believes more I/O is needed but never progresses (an infinite spin), rather than returning control with an error or gracefully closing the connection.
- The observed impact is pure availability loss: CPU and thread saturation and the inability to service new or existing requests. There is no indication the flaw leaks secrets, modifies data, or bypasses authentication—its primary impact vector is denial of service.
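To make the trigger concrete: the sequence described above is an ordinary ClientHello followed immediately by a close_notify alert. Below is a hedged sketch (illustrative, not rustls code) of how that alert is encoded on the wire; the record-version bytes are an assumption, since peers largely ignore the legacy version field.

```rust
// Illustrative only: the plaintext TLS alert record a client can send
// right after ClientHello. Per RFC 5246/8446, close_notify is an alert
// (record content type 21) whose two-byte body is level = warning (1)
// and description = close_notify (0).
fn close_notify_record() -> [u8; 7] {
    [
        0x15,       // record content type: alert
        0x03, 0x03, // legacy record version (largely ignored by peers)
        0x00, 0x02, // record payload length: 2 bytes
        0x01,       // alert level: warning
        0x00,       // alert description: close_notify
    ]
}
```

Nothing about this record is malformed, which is why the sequence sails past naive filtering: the defect is entirely in how the receiving loop reacts to its timing.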
Who and what is affected
- Any application or service that:
- Embeds or links against affected rustls crate versions, and
- Uses rustls’ blocking server APIs (or otherwise drives ConnectionCommon::complete_io() in a blocking, per‑thread handshake path).
- The vulnerability spans multiple release lines; maintainers released patches in:
- rustls 0.21.x series (fixed in a 0.21.11+ release)
- rustls 0.22.x series (fixed in a 0.22.4+ release)
- rustls 0.23.x series (fixed in a 0.23.5+ release)
- Downstream packages, OS distribution packages, and embedded firmware that vendor or bundle rustls may carry vulnerable builds (for example, distribution packages prior to the fixed release numbers).
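As a rough aid for inventory scans, the resolved rustls version can be read out of a Cargo.lock body. The parser below is an illustrative sketch (rustls_version is not a real API); it relies only on the standard Cargo.lock layout where a `version = "..."` line follows `name = "rustls"` inside a `[[package]]` entry.

```rust
// Hedged sketch: extract the resolved rustls version from a Cargo.lock
// body, so fleets and build artifacts can be scanned for vulnerable builds.
fn rustls_version(lock: &str) -> Option<String> {
    let mut in_rustls = false;
    for line in lock.lines() {
        let line = line.trim();
        if line == "name = \"rustls\"" {
            in_rustls = true;
        } else if in_rustls {
            if let Some(rest) = line.strip_prefix("version = \"") {
                return Some(rest.trim_end_matches('"').to_string());
            }
            if line == "[[package]]" {
                // left the rustls entry without seeing a version line
                in_rustls = false;
            }
        }
    }
    None
}
```

Compare the result against the patched floors for each line (0.21.11 / 0.22.4 / 0.23.5).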
Why this is high risk for availability
- Remote, unauthenticated trigger: An attacker can send the crafted handshake sequence over a standard TLS port from anywhere on the network.
- Low complexity: The required input is a valid TLS handshake fragment plus an immediate close_notify; it does not require cryptographic key knowledge or complicated timing beyond sending the messages in sequence.
- Amplification through concurrency: A small number of crafted handshakes against a multithreaded or process-pool server can exhaust worker threads and render the whole server unresponsive for legitimate traffic.
- Hard to detect at network perimeter: Because the attack is a valid TLS handshake sequence (albeit with an unusual ordering), network filters that rely on simple TLS detection may not flag it unless TLS inspection is in place.
Confirmed fixes and semantic scope
Upstream rustls maintainers corrected the underlying handshake loop behavior and released fixed versions across supported release lines. The canonical remediation is to upgrade to the first patched version in your release stream:
- For 0.23.x users: upgrade to 0.23.5 or later.
- For 0.22.x users: upgrade to 0.22.4 or later.
- For 0.21.x users: upgrade to 0.21.11 or later.
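Expressed as a Cargo dependency requirement, the floor for the 0.23 line might look like the following sketch (adjust the bounds for your release stream):

```toml
[dependencies]
# Refuse any rustls below the first patched 0.23 release.
rustls = ">=0.23.5, <0.24"
```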
Practical mitigation and remediation checklist
Follow these steps in order of speed and confidence to reduce exposure quickly, then permanently remediate by deploying patched libraries and binaries.
- Immediate triage (minutes)
- Add emergency rate limiting for TLS accept connections (per‑IP and global) at the fronting load balancer or TCP proxy.
- If possible, place a short connection‑accept timeout at the TCP/TLS front end to limit the window for a hanging handshake.
- Monitor CPU and thread usage on TLS endpoint processes for sudden sustained spikes outside baseline.
- Short‑term mitigations (hours)
- Prefer terminating TLS at a well‑patched reverse proxy (NGINX, HAProxy, cloud load balancer) or hardware TLS terminator that is not using the vulnerable rustls binary. This moves the handshake surface away from the affected process.
- If you control the application, consider switching from blocking rustls APIs to the async acceptor or an acceptor API variant that is not known to traverse the same blocking loop (validate compatibility before switching in production).
- Patch and rebuild (days)
- Update your Cargo dependency references:
- Run cargo update -p rustls --precise <patched-version> (for example: cargo update -p rustls --precise 0.23.5)
- Rebuild and redeploy binaries that include rustls in their static artifacts.
- Update downstream crates that vendor rustls (for example, tokio‑rustls wrappers) to versions that depend on the patched rustls.
- For statically packaged applications, rebuild the package and replace deployed artifacts across hosts.
- Distribution and OS packages
- For systems that use distribution packages, install vendor security updates that include the patched rustls and reboot or restart affected services where required.
- Verify the runtime library version in production after updates (see Detection below).
- Post‑deployment hardening
- Implement graceful shutdown, backpressure, and per‑connection handshake timeouts in application code.
- Add instrumentation and alerts for abnormal handshake durations and handshake‑stage CPU loops.
- Ensure restart policies for TLS worker processes exist (systemd Restart=on-failure with suitable backoff), but do not rely solely on restarts as a mitigation.
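Several of the steps above amount to bounding blocking I/O per connection. A minimal std-only sketch (bound_socket and the limit are illustrative, not a rustls API): set socket timeouts on each accepted connection before the handshake is driven over it.

```rust
use std::net::TcpStream;
use std::time::Duration;

// Hedged sketch: bound every accepted socket with read/write timeouts
// *before* the TLS handshake runs over it, so a client that goes silent
// mid-handshake produces an I/O error the accept loop can log and discard,
// instead of holding a worker thread indefinitely.
fn bound_socket(stream: &TcpStream, limit: Duration) -> std::io::Result<()> {
    stream.set_read_timeout(Some(limit))?;
    stream.set_write_timeout(Some(limit))?;
    Ok(())
}
```

A timed-out read surfaces as ErrorKind::WouldBlock on Unix or TimedOut on Windows; treat either as a failed handshake. Note this bounds blocking waits only; it does not stop a pure CPU spin, so it complements patching rather than replacing it.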
Detection and incident response playbook
Detecting this attack requires focusing on availability and handshake anomalies rather than looking for malformed ciphertext:
- Runtime indicators
- Sustained high CPU in TLS worker processes coincident with elevated TLS handshakes.
- Worker threads stuck in kernel time or spinning in user CPU during TLS accept.
- High counts of half‑open TLS sessions at the application level or elevated accept latency.
- Log signals
- Handshake logs that show a ClientHello immediately followed by a close_notify.
- Repeated connection attempts from the same source IP sending short, failed handshakes.
- TLS library debug logs that report repeated wants_read/wants_write cycles for the same handshake without progress.
- Investigation steps
- Capture a pcap of the attack traffic and verify the message sequence (ClientHello then close_notify).
- Correlate with process-level metrics (top/htop, perf) and stack traces (gdb or rust backtrace in debug builds) to confirm the process is spinning in rustls I/O code.
- Isolate victims and apply mitigations above (rate limit, move TLS termination).
- Patch and redeploy; for affected services without immediate update capability, consider draining traffic to patched replacements.
- Containment
- Block or rate‑limit offending IPs at network perimeter while investigation proceeds.
- If attack originates from spoofed or distributed sources, implement global rate limiting and cloud DDoS mitigation services.
For developers: safe upgrade practices and dependency hygiene
- Check your dependency tree: run cargo tree -p rustls to see all consumers of rustls in your project tree. Pay attention to indirect dependencies (tokio‑rustls, web frameworks, cryptography wrappers) that may pin older versions.
- Bump rustls precisely and rebuild:
- cargo update -p rustls --precise 0.23.5
- cargo build --release
- For projects that vendor or bundle rustls (vendoring or static linking), rebuild binaries and redeploy; do not rely on the system package manager alone unless you verified the distro package is patched.
- If you ship container images, rebuild base images and application images; push updated images to registries and roll through orchestrator deployments with canary testing.
- Add automated dependency scanning (CI checks) to fail builds if rustls is present below patched versions.
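A CI gate like the one suggested above needs the patched-floor mapping from this advisory. Below is a hedged sketch of that comparison, assuming plain major.minor.patch version strings (the function names are illustrative):

```rust
// Parse a "major.minor.patch" version string.
fn parse(v: &str) -> Option<(u64, u64, u64)> {
    let mut it = v.split('.').map(|p| p.parse::<u64>().ok());
    Some((it.next()??, it.next()??, it.next()??))
}

// Hedged sketch: is this rustls version at or above the first patched
// release for its line? Returns None outside the lines covered here,
// forcing an explicit decision rather than a silent pass.
fn rustls_is_patched(version: &str) -> Option<bool> {
    let (major, minor, patch) = parse(version)?;
    let first_fixed_patch = match (major, minor) {
        (0, 21) => 11, // fixed in 0.21.11
        (0, 22) => 4,  // fixed in 0.22.4
        (0, 23) => 5,  // fixed in 0.23.5
        _ => return None,
    };
    Some(patch >= first_fixed_patch)
}
```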
Code‑level guidance for maintainers and library authors
This class of problem—handshake control flow mismanagement—can be reduced with several preventive engineering and testing practices:
- Introduce deterministic timeouts and counters guarding handshake loops. If a handshake loop iterates beyond a reasonable threshold without progress, return an explicit error and close the connection.
- Harden state machine transitions for the handshake: make every state transition explicit and unit‑test sequences of messages that violate the nominal ordering (e.g., close_notify before handshake completion).
- Expand the fuzzing corpus for TLS handshakes with focus on unusual but valid control messages, early alerts, fragmented or reordered handshake fragments, and combinations of handshake alerts with ClientHello variations.
- Add integration tests that emulate blocking acceptor environments and verify the accept code path cannot enter indefinite spinning behaviors.
- If practical, favor non-blocking or async I/O where architecture allows—non-blocking state machines force clearer I/O boundaries and typically make infinite spin conditions easier to detect and limit.
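The first recommendation can be sketched as a guarded completion loop. This is an illustrative pattern, not rustls' actual implementation; HandshakeIo is a hypothetical stand-in for a TLS connection driver, and the iteration bound is an assumption.

```rust
use std::io;

// Hedged sketch of the defensive pattern above: a handshake-completion
// loop that refuses to iterate indefinitely without forward progress.
trait HandshakeIo {
    /// Drive one handshake step; Ok(0) means no progress was made.
    fn step(&mut self) -> io::Result<usize>;
    fn is_handshaking(&self) -> bool;
}

// Assumption: a generous bound; any legitimate handshake makes progress
// long before this many consecutive empty iterations.
const MAX_STALLED_ITERATIONS: u32 = 64;

fn complete_handshake(conn: &mut dyn HandshakeIo) -> io::Result<()> {
    let mut stalled = 0u32;
    while conn.is_handshaking() {
        if conn.step()? == 0 {
            stalled += 1;
            if stalled > MAX_STALLED_ITERATIONS {
                return Err(io::Error::new(
                    io::ErrorKind::TimedOut,
                    "handshake made no progress; closing connection",
                ));
            }
        } else {
            stalled = 0;
        }
    }
    Ok(())
}
```

The unit tests that the list recommends fall out naturally: feed the loop a driver that never progresses and assert it errors instead of spinning.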
How to prioritize remediation in complex environments
If you manage a large fleet or supply chain, use the following risk-based approach:
- Internet-facing TLS endpoints that accept incoming connections are top priority. Patch, mitigate (rate limit / proxy offload), or temporarily route traffic through patched termination points.
- Internal services exposed to untrusted networks (e.g., partner networks, DMZ) are next. Consider network controls to limit exposure.
- Embedded appliances, gateways, and vendor hardware that bundle rustls: contact vendors and confirm patch schedules; apply vendor updates when available.
- Developer toolchains and CI runners: ensure they don’t produce new vulnerable artifacts (e.g., CI images that build and ship older rustls).
What defenders should expect in the wild
- Low sophistication attackers can exploit this issue at scale because the trigger is simple.
- Exploitation is noisy at the host level (CPU saturation) but stealthy from a network perspective, because the handshake frames are legitimate TLS messages; conventional IDS that do not inspect TLS payloads may miss it.
- Expect distribution lists and package repositories to publish vendor advisories and distro updates. Operators should monitor vulnerability feeds and package updates and apply them rapidly.
Strengths and limitations of the fix
- Strengths
- Upstream fixes address the core state-machine behavior and were released across supported branches, enabling maintainers to upgrade within their chosen version cadence.
- The fix is corrective rather than mitigative—updating removes the root cause rather than merely constraining symptoms.
- Limitations and residual risks
- Binary artifacts and vendors that embed rustls in their product stacks may lag behind upstream; until packaged updates are applied, many deployed systems remain vulnerable.
- Systems that rely solely on process restarts as a stability measure may still be repeatedly victimized if the attacker continuously triggers the defect; mere restarts without addressing the inbound traffic pattern or applying fixes are inadequate.
- The patch addresses the specific infinite loop; similar handshake parsing or acceptor logic bugs may exist elsewhere (different acceptor APIs, other wrapper crates). A holistic review of handshake codepaths and fuzzing is recommended.
Recommended monitoring, observability and SRE actions
- Add a healthcheck that validates TLS acceptor responsiveness: periodically open a test TLS connection and complete a minimal request; alert if handshake latency or failure rate rises beyond thresholds.
- Track the following metrics and create alerts:
- TLS handshake success rate vs handshake attempts.
- Average and P95 handshake duration.
- Application thread pool saturation and worker queue length.
- CPU usage spikes on TLS‑handling processes.
- Instrument the handshake code path to emit structured logs when a connection advances handshake states or when it receives unexpected alerts (e.g., log on receiving early close_notify).
- Use live debugging tools (when safe to do so) to capture stack traces for spinning processes; a backtrace that includes rustls I/O functions is a strong indicator of this issue.
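The healthcheck idea above can be sketched with the standard library alone; probe and its budget are illustrative names. A production check should complete a real TLS handshake plus a minimal request (for example with a rustls client), since this defect stalls handshakes rather than TCP accepts.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

// Hedged sketch of a reachability probe for a TLS endpoint: bound how long
// the endpoint takes to accept, and report the measured latency so it can
// feed the handshake-duration metrics listed above.
fn probe(addr: SocketAddr, budget: Duration) -> Result<Duration, String> {
    let start = Instant::now();
    TcpStream::connect_timeout(&addr, budget).map_err(|e| e.to_string())?;
    Ok(start.elapsed())
}
```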
Final assessment and takeaways
CVE-2024-32650 is a classic example of how availability-centric defects can hide inside otherwise memory-safe libraries. Rust's safety guarantees protect against whole classes of memory corruption, but application-level protocol state machines still require careful, defensive design.

Key takeaways for teams:
- Treat this as an availability emergency: prioritize identification of rustls usage, apply the upstream patches swiftly, and use immediate mitigations (TLS offload, rate limiting, timeouts) while rolling fixes.
- Verify both direct dependencies and indirect consumers; a vulnerable rustls version can be introduced transitively.
- Improve handshake testing and fuzzing to reduce the chance of similar state‑machine regressions in the future.
- Combine patching with observability improvements so incidents like this surface early and remediation is measurable.