Microsoft’s August 2025 cumulative update for Windows 11 24H2 (KB5063878) has been linked by independent testers and enthusiast communities to a reproducible storage regression in which certain NVMe SSDs can suddenly stop responding during sustained large writes, sometimes vanishing from Device Manager and Disk Management and, in a minority of cases, returning corrupted or unreadable data after a reboot. (support.microsoft.com)
Source: Hardware Times NVMe SSD Randomly Disconnecting: Win 11 24H2 Update, Intel CPU/Chipset Responsible? | Hardware Times
Background / Overview
The problem reported in mid‑August centers on a specific workload profile: sustained sequential writes on the order of ~50 GB or more, with device utilization climbing above roughly 60%. Under that load some SSDs reportedly lock up at the controller level, making the device invisible to the OS and often rendering SMART/controller telemetry unreadable. Reboots sometimes restore temporary visibility but do not guarantee the integrity of files written during the failure window. (igorslab.de, notebookcheck.net)
This incident echoes an earlier, related episode that began during the Windows 11 24H2 feature rollout (late 2024), when changes in Host Memory Buffer (HMB) allocation behavior exposed firmware weaknesses in certain DRAM‑less SSDs and produced persistent BSOD loops on a subset of Western Digital / SanDisk models. That earlier problem was mitigated by vendor firmware updates, registry workarounds and Microsoft rollout controls, and it established a pattern: subtle host‑side changes to storage behavior can trigger latent controller/firmware faults. (tomshardware.com, laptopmag.com)
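As a rough planning aid, the trigger profile described at the start of this section can be expressed as a simple check. The Python below is a minimal sketch, not a vendor tool. It rests on two assumptions. First, it reads "utilization" as the fraction of the drive's capacity in use once the transfer completes; the reports are ambiguous on this point. Second, it treats the ~50 GB and ~60% figures as community-reported approximations, not confirmed limits. The function and constant names are hypothetical.

```python
import shutil
from dataclasses import dataclass

# Community-reported approximations (assumption), not vendor-confirmed limits.
REPORTED_WRITE_BYTES = 50 * 10**9   # "~50 GB" of sustained sequential writes
REPORTED_FILL_RATIO = 0.60          # "roughly 60%", read here as share of capacity in use


@dataclass
class TransferRisk:
    planned_bytes: int
    projected_fill: float       # fraction of capacity in use after the transfer
    in_reported_window: bool


def assess_transfer(planned_bytes: int, used_bytes: int, total_bytes: int,
                    write_bytes: int = REPORTED_WRITE_BYTES,
                    fill_ratio: float = REPORTED_FILL_RATIO) -> TransferRisk:
    """Flag a planned write that lands inside the community-reported trigger window."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    projected = (used_bytes + planned_bytes) / total_bytes
    in_window = planned_bytes >= write_bytes and projected > fill_ratio
    return TransferRisk(planned_bytes, projected, in_window)


def assess_volume(path: str, planned_bytes: int) -> TransferRisk:
    """Same check, using the current usage of the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return assess_transfer(planned_bytes, usage.used, usage.total)
```

A positive result does not mean a transfer will fail, and a negative one does not mean it is safe. The thresholds come from community reproductions, not from vendor telemetry.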
What users and testers are seeing
Symptom profile (consistent community fingerprint)
- Large copy, game update, or backup operation proceeds normally and then abruptly fails or stalls near the ~50 GB mark. (tomshardware.com)
- The target drive disappears from File Explorer, Device Manager and Disk Management; vendor tools stop reading SMART/controller attributes. (notebookcheck.net)
- Reboot sometimes restores drive visibility; in some cases the volume/partition is gone or files written during the incident are corrupted. (igorslab.de)
- The fault appears reproducible in community lab tests under the specific heavy‑write workload, but not universal across every system or SSD of a given model.
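Reproducing the fault takes two things: a workload that writes one large file sequentially, and a later check that the data on the drive matches what was sent. The Python sketch below shows one way to do that. It is an illustration, not the harness the community testers used. It writes incompressible random data in fixed-size chunks and hashes the data with SHA-256 as it goes. If the OS reports an I/O error, it records how many bytes had been handed over. The chunk size is an arbitrary assumption.

```python
from __future__ import annotations

import hashlib
import os

CHUNK_BYTES = 8 * 1024 * 1024  # 8 MiB per write call; an arbitrary choice


def sequential_write(path: str, total_bytes: int,
                     chunk_bytes: int = CHUNK_BYTES) -> tuple[str, int]:
    """Write `total_bytes` of random (incompressible) data to `path` in order.

    Returns (SHA-256 hex digest of the data sent, bytes handed to the OS).
    "Written" here means accepted by the OS, not confirmed on flash.
    """
    digest = hashlib.sha256()
    written = 0
    try:
        with open(path, "wb") as f:
            while written < total_bytes:
                chunk = os.urandom(min(chunk_bytes, total_bytes - written))
                f.write(chunk)
                digest.update(chunk)
                written += len(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push buffered data to the device
    except OSError as exc:
        # Keep the offset reached; it is useful when comparing against the ~50 GB reports.
        raise RuntimeError(f"write failed after {written} bytes: {exc}") from exc
    return digest.hexdigest(), written


def verify(path: str, expected_hex: str, chunk_bytes: int = CHUNK_BYTES) -> bool:
    """Re-read `path` and compare its SHA-256 with the digest recorded at write time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For example, calling `sequential_write(r"E:\repro\blob.bin", 64 * 10**9)` on a hypothetical drive E: with an existing `repro` folder would exceed the reported ~50 GB window. A read-back immediately after writing may be served from the Windows page cache, so a meaningful integrity check should call `verify` again after a reboot. If the drive drops off the bus, the write call may block rather than raise. Only run this against a drive whose data is already backed up.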
Early‑reported trigger and reproducibility
Multiple independent reproductions place the reliable trigger window at sustained sequential writes of roughly 50 GB or more. That profile is common for game updates (Steam), bulk media transfers, disk clones and large installer packages, which is why gamers and content creators surfaced many of the early reports. (notebookcheck.net, pcgamesn.com)
Which SSDs appear in community lists
Community collations (user tests and regional tech sites) have produced overlapping but not identical lists of affected and unaffected models. This suggests the failure is hardware‑ and firmware‑sensitive rather than a universal Windows fault. The following lists are culled from independent testers and community reports and should be treated as investigative leads rather than definitive recall lists.
- Devices reported as affected in early aggregations (some recover after reboot; some became inaccessible):
  - Corsair Force MP600 (Phison family)
  - Phison PS5012‑E12 / related Phison family SKUs
  - Kioxia Exceria Plus G4 (Phison‑based SKUs)
  - Fikwot FN955 and various third‑party Phison‑based boards
  - SanDisk Extreme Pro M.2 NVMe (in some reports)
  - Other DRAM‑less or HMB‑reliant SKUs reported in Japanese community testing. (guru3d.com, igorslab.de)
- Devices commonly reported as not affected in the sampled lists:
  - Samsung 990 PRO / 980 PRO series (no widespread reports)
  - Certain Solidigm / Seagate enterprise NVMe models in community lists
  - Some WD/Crucial high‑end models (note: model variations and firmware levels matter). (notebookcheck.net, tomshardware.com)
Technical analysis — how this can happen
Why heavy sequential writes expose edge cases
Sustained sequential writes stress multiple layers simultaneously: application buffers, the Windows page cache, kernel I/O scheduling, the NVMe command stream and the SSD controller’s internal metadata management. A subtle host‑side change in timing, buffer sizing, DMA handling or HMB negotiation can push a controller into an edge condition where firmware mishandles a command sequence and effectively locks up. When the controller stops responding to admin commands, the OS may treat the device as removed from the PCIe/NVMe topology. The symptom set (unreadable SMART, disappearance from Device Manager, corruption of in‑flight writes) is consistent with such a controller hang. (igorslab.de, support.microsoft.com)
HMB and DRAM‑less controllers
The Host Memory Buffer (HMB) allows DRAM‑less NVMe SSDs to borrow a slice of host RAM for mapping structures and caches. Changes in how Windows assigns HMB (either size or policy) were the proximate cause of the earlier 24H2 BSOD wave on specific WD/SanDisk models. Current reports don’t uniformly pin HMB as the single root cause of the August 2025 regression. Even so, HMB remains a plausible cofactor, especially for DRAM‑less designs that depend on host memory stability and timing. (tomshardware.com, notebookcheck.net)
Could Intel CPU/chipset PCIe be responsible?
Some users and writers have observed that moving an affected drive to a different platform (for example, an AM5 motherboard/CPU) stopped the disconnect events and halted the accumulation of data‑integrity errors. That anecdotal evidence raises a plausible alternative hypothesis: a host PCIe regression, in either the CPU‑integrated PCIe logic or the chipset, that produces an I/O profile the SSD firmware mishandles. Community threads document assorted PCIe lane/compatibility quirks on Z790 and other Intel platforms. Intel’s Raptor Lake family also previously experienced a known “Vmin / voltage‑related” instability class that required mitigations, firmware and microcode patches. That history demonstrates that CPU/platform anomalies can and do produce subtle I/O impacts. However, this remains conjecture at present: data tying KB5063878 specifically to Intel silicon faults is limited and unconfirmed by vendors. Treat the CPU/chipset theory as plausible but unverified until vendors or Microsoft publish confirmatory telemetry. (theverge.com, tomshardware.com)
What vendors and Microsoft have (and haven’t) said
- Microsoft published KB5063878 on August 12, 2025; the official KB article lists the update and improvements and initially did not list a storage‑device failure as a known issue. The update page is the authoritative release record for the package. (support.microsoft.com)
- Independent enthusiast outlets and storage testers (Igor’s Lab, Guru3D, Tom’s Hardware, NotebookCheck and others) reproduced the event profile and aggregated affected model lists. Vendor responses varied by manufacturer. In the earlier HMB episode, firmware updates delivered through vendor utilities had already been the primary remediation path for the October 2024–era failures. (igorslab.de, guru3d.com, tomshardware.com)
- Formal vendor statements linking KB5063878 to specific controller firmware revisions were limited at the time the community reporting emerged. That absence of a single vendor/Microsoft admission is a normal stage in a complex compatibility incident: community telemetry leads to vendor forensics, which may then produce firmware updates or a Microsoft Known Issue Rollback (KIR) if host‑side changes are at fault.
Practical guidance: triage and mitigation
The situation is workload‑sensitive and time‑sensitive. The following checklist synthesizes community recommendations, vendor practices and Microsoft servicing mechanics.
- Immediate emergency steps (if you suspect an affected drive):
  - Stop all heavy writes immediately. If a drive disappears mid‑transfer, further writes risk worsening corruption. (tomshardware.com)
  - Back up critical data now to a separate physical device or cloud storage. Don’t rely on the suspect drive for backups.
  - If the drive has become inaccessible but the data is critical, do not initialize or format the device. Power it down and, if possible, create a sector‑level forensic image to a safe target before further action. Imaging preserves recoverable data and supports vendor diagnostics.
- Firmware and tools:
  - Launch your SSD vendor utility (WD Dashboard, Samsung Magician, Crucial Storage Executive, Corsair Toolbox) and verify the firmware version. If a vendor‑provided update fixes a known issue, apply it only after backing up. For the older WD/SanDisk HMB problems, vendor firmware was the recommended long‑term fix. (tomshardware.com)
- If you’ve installed KB5063878 and want to avoid risk:
  - Consider staging large writes on a different machine. In managed fleets, consider temporarily withholding the cumulative update until vendors confirm compatibility. Administrators should use WSUS/SCCM controls to test on representative hardware.
  - Uninstalling a cumulative update is possible but has operational and security trade‑offs. The KB article documents removal mechanics for the combined SSU+LCU package and notes its limitations. Balance the risk of data corruption against security exposure before rolling back. (support.microsoft.com)
- Registry and temporary mitigations (short‑term, not ideal):
  - During the earlier 24H2 HMB episode, some communities used the HmbAllocationPolicy value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\StorPort to limit or disable HMB allocation as a stopgap. That approach reduces performance and carries the usual registry‑editing risks. It is a temporary mitigation, not a substitute for firmware or official fixes. Only use such workarounds if you understand the trade‑offs and have current backups.
- If a drive appears to be failing repeatedly:
  - Capture SMART data and kernel/System event logs before replacing the drive. Record the exact SSD firmware, motherboard BIOS/UEFI version, Windows build and the KB(s) installed. This telemetry greatly speeds vendor diagnostics and RMA processing.
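The details listed in the last step are easier to hand over consistently if they are captured in a fixed structure. The sketch below is a hypothetical Python record. Its field names are illustrative and do not follow any vendor or Microsoft format. It serializes to JSON for attaching to a support ticket or RMA request. You still have to collect the values yourself, from the vendor utility, the UEFI setup screen, `winver` and the Windows Update history.

```python
from __future__ import annotations

import json
import platform
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StorageIncidentReport:
    """Hypothetical incident record; field names are illustrative, not a vendor schema."""
    drive_model: str
    drive_firmware: str
    motherboard: str
    bios_version: str
    windows_build: str
    installed_kbs: list[str] = field(default_factory=list)
    # Bytes written when the drive dropped out, if a test harness recorded it.
    bytes_written_before_failure: Optional[int] = None
    notes: str = ""
    host: str = field(default_factory=platform.node)  # local machine name
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def missing_fields(self) -> list[str]:
        """Names of required identification fields that are still blank."""
        required = ("drive_model", "drive_firmware", "bios_version", "windows_build")
        return [name for name in required if not getattr(self, name).strip()]

    def to_json(self) -> str:
        """Serialize for a support ticket or RMA request."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Calling `missing_fields()` before `to_json()` is a cheap way to avoid sending a ticket without the firmware or BIOS version, the two details vendors most need when a fault depends on firmware revision.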
Longer‑term remediation and what to expect
- Vendor firmware updates remain the most likely definitive fix when controller firmware is the root cause. Historically, the coordination model is: community reproduces → vendor investigates → firmware patch → Microsoft may apply rollout blocks for vulnerable hardware until firmware is applied.
- Microsoft can also deploy Known Issue Rollbacks (KIR) or targeted servicing controls if host‑side NVMe or StorPort behavior is implicated. Expect official communications on the Microsoft Release Health dashboard if Microsoft or major vendors confirm the regression. (support.microsoft.com)
- If the CPU/chipset hypothesis gains traction (platform PCIe controller problems), remedies could include BIOS updates, CPU microcode/firmware patches or RMA for defective silicon in the worst cases. Past Intel Raptor Lake voltage/instability workstreams show that CPU/platform issues have required a mix of microcode, BIOS and replacement strategies when physical degradation or early‑life failures were suspected. That said, vendor confirmation is required before concluding the platform is the primary culprit. (theverge.com)
Critical appraisal — strengths, risks and what remains unverified
Notable strengths of the current response
- The community’s quick reproduction and sharing of workload‑profile data (e.g., the ~50 GB trigger) gives engineers a precise test case for forensic work. That reproducibility is a powerful accelerant for vendor mitigation. (igorslab.de)
- Vendors (Western Digital, SanDisk and others) have historically released firmware updates in response to the 24H2/HMB incidents; that track record suggests firmware remediation is possible and effective. (tomshardware.com)
Real risks and user impact
- The primary risk is data loss. Drives that disappear mid‑write and return corrupted metadata after reboot can produce unrecoverable loss for users who did not maintain independent backups. (tomshardware.com)
- Administrators face a tough trade‑off: delaying a security patch (to avoid storage risk) increases exposure to vulnerabilities; applying the patch risks a small but severe storage regression on certain hardware combinations. This is a classic update‑management dilemma. (support.microsoft.com)
Claims that should be treated cautiously
- The notion that Intel 13th/14th gen CPUs or Z790 chipsets are the primary cause of all observed drive failures is not yet established. Anecdotal reports that moving drives to AM5 systems halted error accumulation are suggestive but not conclusive. Confirming a CPU/chipset root cause requires telemetry from the SSD controller makers, motherboard vendors and Microsoft. Until those parties publish a coordinated forensic finding, the CPU/chipset hypothesis remains plausible but unverified. (theverge.com)
- Model‑level lists are helpful but not decisive; firmware revision, SKU and motherboard/BIOS interplay materially change whether a given drive will reproduce the fault. Treat community model lists as investigation leads, not definitive blacklists. (notebookcheck.net)
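A small worked example shows why a handful of platform swaps is "suggestive but not conclusive." The sketch below implements Fisher's exact test for a 2x2 table of repro runs (failed vs completed) on two platforms. The counts used here are hypothetical, not data from any report. Suppose one platform shows 3 failures in 5 runs and the other shows 0 in 5. That gives a two-sided p-value of about 0.17, well short of conventional significance. If SciPy is available, `scipy.stats.fisher_exact` performs the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are platforms; columns are (runs that failed, runs that completed).
    """
    row1, col1 = a + b, a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def p_table(k: int) -> float:
        # Hypergeometric probability of k failures in row 1 with all margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    p_obs = p_table(a)
    k_min, k_max = max(0, row1 - (n - col1)), min(row1, col1)
    total = 0.0
    for k in range(k_min, k_max + 1):
        p = p_table(k)
        if p <= p_obs * (1 + 1e-9):  # tables no more likely than the observed one
            total += p
    return min(1.0, total)
```

With these counts, `fisher_exact_two_sided(3, 2, 0, 5)` returns 42/252, about 0.167. Even a small p-value would only show that outcomes differ between the two setups. It would not show which component (CPU, chipset, BIOS, slot or SSD firmware) is responsible, which is why vendor telemetry is still needed.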
For Windows power users and IT administrators — recommended steps (concise)
- Back up all critical data from systems that received KB5063878 immediately.
- Avoid running sustained large sequential writes (roughly 50 GB or more) on suspect systems until firmware/vendor guidance is confirmed. (guru3d.com)
- Check vendor utilities for firmware updates and apply them after backing up. (tomshardware.com)
- For fleets: stage KB5063878 in a test ring that includes representative storage hardware and test large‑write workloads before broad deployment.
- If a drive becomes inaccessible after the symptom, image it before reformatting and capture logs for vendor support/RMA.
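For fleets, coverage of "representative storage hardware" can be checked mechanically. List every drive model, firmware revision and BIOS version deployed, then flag the combinations that have not yet passed a large-write test in the test ring. The Python sketch below assumes the inventory and test results are already available as records; how they are collected from existing inventory tooling is outside its scope. It treats firmware and BIOS as part of a configuration's identity because, as noted above, they change whether a drive reproduces the fault. The type and function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, NamedTuple


class StorageConfig(NamedTuple):
    """Hypothetical configuration identity; the chosen fields are an assumption."""
    drive_model: str
    drive_firmware: str
    bios_version: str


def coverage_gaps(
    fleet: Iterable[StorageConfig],
    passed_in_test_ring: Iterable[StorageConfig],
) -> list[tuple[StorageConfig, int]]:
    """Return fleet configurations lacking a passing large-write test, most common first."""
    counts = Counter(fleet)                 # machines per configuration
    cleared = set(passed_in_test_ring)      # configurations that passed in the test ring
    gaps = [(cfg, n) for cfg, n in counts.items() if cfg not in cleared]
    # Largest exposure first; ties broken by configuration for a stable order.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))
```

Sorting by machine count puts the configurations with the most exposure at the top. Those are the ones to add to the test ring before broad deployment.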
Conclusion
The August 2025 cumulative update for Windows 11 24H2 (KB5063878) has surfaced as a serious compatibility risk for a small but consequential set of storage configurations. On those configurations, sustained heavy writes can trigger controller‑level failures that make NVMe devices disappear and, in some cases, cause file corruption or permanent inaccessibility. Community testing has provided a clear repro profile and model leads. Vendor and Microsoft remediation paths (firmware updates, rollout controls, registry stopgaps) are the established tools for resolution. (support.microsoft.com, igorslab.de)
At the same time, the deeper question of whether host platform (CPU/chipset/PCIe controller) faults contribute materially to the phenomenon remains unresolved and must be treated as a testable hypothesis rather than settled fact. Users and admins should prioritize backups, avoid heavy sequential writes on suspicious systems, keep firmware and BIOS up to date, and await coordinated advisories from SSD vendors and Microsoft before rolling the patch broadly in production environments.
The broader lesson is unchanged: modern storage depends on a fragile choreography between OS stacks, drivers and SSD firmware. When that choreography slips, the consequences are immediate and often irreversible for affected users — which makes conservative update management, robust backups and rapid vendor‑grade diagnostics more important than ever.