South Africa’s Department of Justice and Constitutional Development (DoJ&CD) has confirmed a widespread outage after a recent Windows 11 security rollup created a system-level fault that rendered multiple departmental services inoperable, with restoration work expected to continue for days or weeks as the department coordinates remediation with Microsoft engineers.
Source: IOL Justice and Constitutional Development services offline due to Windows system error
Background
The problem traces to Microsoft’s October cumulative update for Windows 11, identified by vendor documentation and incident timelines as KB5066835, which began shipping in mid‑October and quickly attracted reports of two high‑impact regressions: a kernel‑mode HTTP stack (HTTP.sys) regression that broke localhost (127.0.0.1) HTTP/2 connections, and a separate fault that made USB keyboards and mice unresponsive inside the Windows Recovery Environment (WinRE). Those regressions manifested across both developer workstations and production endpoints. In practice, applications and administrative consoles that depend on local loopback connections, as well as recovery flows essential to device repair, experienced abrupt failures—symptoms that quickly escalated into operational outages for organisations with heavily Windows‑centric estates, including the DoJ&CD.
What Microsoft’s patch changed — the technical anatomy
HTTP.sys, localhost and the HTTP/2 handshake
At the centre of the outage is HTTP.sys, the kernel‑mode HTTP listener that Windows uses to accept and negotiate incoming HTTP traffic for IIS, IIS Express, and any process that registers URL prefixes with the kernel. The October rollup introduced changes that, in some configurations, mishandled HTTP/2 negotiation over the loopback interface, causing connections to be reset during the protocol handshake and preventing user‑mode servers from receiving requests. The visible client errors included ERR_CONNECTION_RESET and ERR_HTTP2_PROTOCOL_ERROR.
Why this matters: many modern apps (developer tooling, local admin panels, embedded web UIs inside desktop software and appliances) use loopback bindings (127.0.0.1 or ::1) for internal communications or local management. When the kernel listener resets those sessions before a user‑mode process can respond, the symptom looks like "the app is offline" even though the user process itself may be running fine. The result is broad functional collapse across disparate applications that share the same OS networking plumbing.
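To make that distinction concrete, the sketch below separates "nothing is listening" from "something accepted the connection and then reset it" on a loopback port. It is a minimal illustration using only Python's standard library. It speaks plain HTTP/1.1 over cleartext, so it does not exercise the HTTP/2 negotiation that reportedly regressed; the function name `probe_loopback` and its result labels are hypothetical, not part of any Microsoft tooling.

```python
import socket


def probe_loopback(port, host="127.0.0.1", timeout=2.0):
    """Send one plain HTTP/1.1 request to a loopback port and classify the outcome.

    Returns one of: 'refused', 'reset', 'timeout', 'empty', 'unexpected', 'ok'.
    These labels are illustrative only.
    """
    request = (
        "GET / HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            first_bytes = conn.recv(16)
    except ConnectionRefusedError:
        # No listener on the port: the owning process is probably not running.
        return "refused"
    except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError):
        # The connection was accepted and then torn down before a response arrived.
        return "reset"
    except socket.timeout:
        return "timeout"
    if not first_bytes:
        # The peer closed the connection cleanly without sending anything.
        return "empty"
    return "ok" if first_bytes.startswith(b"HTTP/") else "unexpected"
```

A `reset` result while the owning process is demonstrably running would be consistent with the symptom described above, though on its own it does not show that HTTP.sys is at fault.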
WinRE regression: recovery tools rendered unusable
The second regression affected WinRE, a minimal "safe OS" used for offline diagnostics and repair. The October update replaced or altered Safe OS components on some machines in a way that prevented USB host controller or USB driver initialization within the recovery image. The practical consequence: USB keyboards and mice stopped working inside WinRE, blocking on‑device recovery workflows and leaving many systems effectively unrepairable without physical intervention or pre‑staged alternate recovery media. Microsoft acknowledged the problem and later shipped an out‑of‑band fix intended to restore WinRE input functionality.
Timeline: how the incident unfolded
- October 14, 2025: Microsoft publishes the October cumulative update for Windows 11 (KB5066835).
- Mid‑October: Community forums and enterprise helpdesks begin reporting localhost loopback failures and WinRE input loss. Symptoms appear as HTTP/2 connection resets and unresponsive USB input in recovery.
- Microsoft posts known issues to its Release Health dashboard and starts triage; organisations report operational impacts.
- Microsoft ships remediation through several channels: a Known Issue Rollback (KIR) for some HTTP.sys regressions and an out‑of‑band cumulative update (e.g., KB5070773) to address the WinRE USB input regression. Enterprises also receive registry mitigation guidance to force HTTP/1.1 behavior for loopback connections as a temporary fix.
- October 22, 2025: South Africa’s DoJ&CD issues a media statement confirming its operations were affected and that it is working with Microsoft engineers to restore services.
Impact on Department of Justice operations — practical consequences
When a justice department’s desktops, case‑management terminals, or local middleware rely on loopback services for authentication, document signing, or embedded administrative UIs, the failure of localhost connectivity can cascade into immediate service interruption. The reported operational impacts for DoJ&CD, based on the department’s media statement and incident summaries, include:
- Delays issuing court documents, warrants, and letters of authority.
- Disruption to electronic bail and remand processing.
- Temporary suspension or slowdown of online filing and case‑management interfaces.
- Impaired internal email gateways or identity callbacks where local services act as intermediaries.
Microsoft’s response: KIR, out‑of‑band patches and guidance
Microsoft moved quickly to acknowledge the regressions on its Release Health pages and used several remediation mechanisms:
- Known Issue Rollback (KIR): a server‑side rollback technique that reverses specific changes for many customers without requiring a full uninstall of the cumulative update. KIR is useful when the regression is tied to a single code change that can safely be reversed at scale.
- Out‑of‑band cumulative update (emergency patch): Microsoft issued an OOB update (reported in vendor timelines as KB5070773) to address the WinRE USB input regression and include other urgent fixes. This patch was distributed through Windows Update and Microsoft Update Catalog channels.
- Interim mitigations: Microsoft documented temporary workarounds—such as disabling HTTP/2 for loopback via registry keys to force HTTP/1.1—while advising administrators to apply vendor fixes or KIR where available. These mitigations carry trade‑offs and must be tested before enterprise deployment.
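As an illustration of what such a registry workaround could look like in a reviewable form, the sketch below renders a `.reg` file rather than writing to the registry directly. The key path and the value names `EnableHttp2Tls` and `EnableHttp2Cleartext` are the ones cited in this article's mitigation checklist; treating them as DWORD values is an assumption, and the helper `render_reg_file` is hypothetical. Writing the file to disk, importing it, and rebooting are deliberately left out.

```python
HTTP_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\HTTP\Parameters"
)

# Values named in the article; DWORD typing is an assumption of this sketch.
DISABLE_HTTP2 = {"EnableHttp2Tls": 0, "EnableHttp2Cleartext": 0}


def render_reg_file(values, key=HTTP_PARAMETERS_KEY):
    """Render registry values as .reg file text for review before import.

    `values` maps a value name to an integer (written as a DWORD) or to
    None (rendered as a deletion, useful for rolling the stopgap back).
    """
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, number in values.items():
        if number is None:
            # '"Name"=-' is the .reg syntax for deleting a value.
            lines.append(f'"{name}"=-')
            continue
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {number} does not fit in a DWORD")
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(render_reg_file(DISABLE_HTTP2))
    # Rollback text once a vendor fix is installed and validated:
    print(render_reg_file({name: None for name in DISABLE_HTTP2}))
```

Generating text instead of calling the registry API keeps the change auditable and lets the same artifact go through a test ring before wider use, which matches the caution above about trade-offs.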
Mitigation and remediation: recommended steps for IT teams
For organizations currently affected or seeking to harden against similar incidents, the following prioritized checklist condenses practical, defensible actions:
- Inventory and triage: Identify all Windows 11 and Windows Server endpoints running the 24H2/25H2 servicing branches and confirm whether KB5066835 (or later cumulative updates) is installed.
- Apply vendor fixes first: Check Microsoft Release Health and apply Known Issue Rollback (KIR) or the designated out‑of‑band cumulative updates via your management channels (WSUS/Intune/SCCM) as soon as they are available. Prefer managed KIR packages over ad‑hoc uninstalls where possible.
- Use controlled mitigations where necessary: If a KIR or OOB fix is unavailable, consider controlled, temporary mitigations such as disabling HTTP/2 for loopback (registry toggles EnableHttp2Tls = 0 and EnableHttp2Cleartext = 0 under HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters) in test rings first. Reboot and validate behavior before broader deployment. This is a stopgap and not a replacement for vendor patches.
- Restore recovery paths: For WinRE issues, install Microsoft’s OOB fix immediately; where immediate patching isn’t possible, ensure alternative recovery media (bootable WinPE images with correct USB drivers) and secure access to BitLocker recovery keys.
- Validation and documentation: After remediation, perform end‑to‑end tests on representative hardware (IIS sites, local developer tooling, vendor embedded UIs, and WinRE). Record actions, timestamps, and test results for audit and post‑incident analysis.
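The inventory step of this checklist can be sketched as a small triage over an exported hotfix list. This is a minimal illustration under stated assumptions: the CSV layout with `host` and `hotfix_id` columns is hypothetical (for example, something an endpoint-management export might be reshaped into), the status labels are invented for this sketch, and a host that does not list KB5066835 may still carry a later cumulative update, so `unlisted` is a prompt to check the build rather than a clean bill of health.

```python
import csv
import io
from collections import defaultdict

REGRESSED_KB = "KB5066835"  # October cumulative update named in the article
OOB_FIX_KB = "KB5070773"    # out-of-band update reported for the WinRE fix


def load_inventory(csv_text):
    """Parse 'host,hotfix_id' rows (hypothetical format) into {host: {KB ids}}."""
    hosts = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        hosts[row["host"].strip()].add(row["hotfix_id"].strip().upper())
    return dict(hosts)


def triage(installed_by_host):
    """Assign each host a coarse follow-up status.

    'validate' : the out-of-band update is listed; run loopback and WinRE tests.
    'exposed'  : the regressed update is listed without the out-of-band update.
    'unlisted' : neither KB is listed; confirm the OS build before closing.
    """
    statuses = {}
    for host, kbs in sorted(installed_by_host.items()):
        if OOB_FIX_KB in kbs:
            statuses[host] = "validate"
        elif REGRESSED_KB in kbs:
            statuses[host] = "exposed"
        else:
            statuses[host] = "unlisted"
    return statuses


if __name__ == "__main__":
    sample = "host,hotfix_id\npc-01,KB5066835\npc-02,KB5070773\n"
    for host, status in triage(load_inventory(sample)).items():
        print(host, status)
```

Keeping the triage output alongside timestamps and test results gives the audit trail that the validation step asks for.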
Strengths shown and systemic weaknesses exposed
What worked well
- Rapid vendor action: Microsoft publicly acknowledged the regressions and deployed multiple remediation channels (KIR and an out‑of‑band cumulative update) faster than a traditional monthly cadence would allow. This accelerated restoration for many customers.
- Transparent public communication by DoJ&CD: The department’s quick public acknowledgement that the outage was vendor‑related helped manage stakeholder expectations and opened a direct remediation channel with Microsoft engineers.
What failed or is risky
- Single‑vendor dependency: The incident underscores how a regression in a widely used platform component can create concentrated risk for public services that depend heavily on one OS vendor’s update pipeline. When a kernel driver regresses, the impact can cascade across unrelated apps.
- Inadequate canary testing against recovery flows: The WinRE regression demonstrates that recovery must be tested as a first‑class concern; an update that impairs recovery tools multiplies operational risk.
- Heterogeneous estate complexity: Large government estates with varied OEM drivers, EDR products, and custom integrations are more prone to edge‑case failures that escape vendor QA and cause uneven remediation timelines.
Broader policy implications for government IT
This outage is more than a technical incident; it is a governance exercise in platform risk management. Public‑sector IT leaders should convert the immediate lessons into institutional changes:
- Mandate staged canary rings and test matrices that explicitly include WinRE/Safe OS assets, local loopback scenarios, and common vendor management consoles.
- Build contractual SLAs with major platform vendors that specify emergency remediation timelines and engineer escalation routes for kernel‑level regressions.
- Maintain resilient manual or analog fallback procedures for essential public services (paper workflows, delegated manual sign‑off processes) that can be activated for the narrow set of functions that cannot tolerate downtime without legal or constitutional consequences. The DoJ&CD’s media note flags that restoration could take days to weeks, and agencies must plan for that contingency.
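One way to make the canary-ring recommendation enforceable is a promotion gate that refuses to advance an update until every required scenario has been tested and passed. The sketch below is a hypothetical illustration, not a description of any existing tool: the scenario names and the function `promotion_decision` are invented here, and real rings would feed it results from their own test harness.

```python
# Scenario names are placeholders drawn from the test-matrix items listed above.
REQUIRED_SCENARIOS = ("winre_usb_input", "loopback_http2", "vendor_console")


def promotion_decision(results, required=REQUIRED_SCENARIOS):
    """Decide whether an update may leave the canary ring.

    `results` maps scenario name -> True (passed) or False (failed).
    A required scenario that is absent counts as a blocker, so an untested
    recovery path can never be promoted by omission.
    Returns (promote, blockers).
    """
    blockers = []
    for scenario in required:
        if scenario not in results:
            blockers.append(f"{scenario}: not tested")
        elif not results[scenario]:
            blockers.append(f"{scenario}: failed")
    return (not blockers, blockers)


if __name__ == "__main__":
    promote, blockers = promotion_decision(
        {"winre_usb_input": False, "loopback_http2": True}
    )
    print("promote" if promote else "hold", blockers)
```

Treating a missing result as a failure is the design choice that matters: it encodes the point that recovery flows must be tested explicitly rather than assumed to work.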
What remains uncertain — claims requiring caution
Public reporting and vendor notes provide a clear technical arc, but several specifics remain unverified in public disclosures:
- The department’s statement did not give precise counts of affected DoJ&CD endpoints or say which subsystems (case management, email gateways, document signing) were knocked offline. That level of detail is typically kept internal during active remediation and will be necessary for a formal post‑incident assessment. Treat any numeric estimates as provisional.
- The definitive line‑by‑line root cause analysis at the binary level (for example, a single HTTP.sys subcomponent change) has not been published by Microsoft; community analyses converge on HTTP.sys and WinRE driver changes, but a full, public post‑mortem naming code paths and fixes will be required for forensic closure.
Takeaways for IT leaders and administrators
- Treat OS updates as major change events, not routine housekeeping. Build patch windows that allow for representative‑hardware validation and an explicit recovery‑path exercise.
- Preserve and rehearse alternative recovery options (bootable media, offline images, PS/2 fallbacks) and ensure BitLocker keys are accessible during incidents.
- Prefer vendor‑sanctioned rollback mechanisms (KIR) and out‑of‑band fixes over broad uninstall strategies when possible; test these in canary rings before wide deployment.
- Maintain clear incident playbooks including communications templates for stakeholders and statutory authorities—justice systems have deadlines and legal processes that need explicit handling when digital systems go offline.
Conclusion
The DoJ&CD outage is a high‑visibility example of how platform regressions at the kernel level can convert a routine security update into a multi‑day operational crisis for public services. Microsoft’s rapid remediation actions (public acknowledgement, Known Issue Rollback, and an out‑of‑band patch) substantially reduced harm for many customers, but the incident spotlights enduring fragilities in vendor dependency, recovery validation, and patch‑management discipline for mission‑critical estates. Governments and enterprises must treat recovery and loopback scenarios as first‑class test cases, build robust canary rings, and negotiate stronger remediation commitments with major platform vendors to prevent a single update from taking courtrooms, case‑management systems, and other essential public services offline in the future.
