Within days of Microsoft’s August 2025 Patch Tuesday, a cluster of independent testers and community posts began documenting a worrying pattern: after installing the Windows 11 24H2 cumulative update (tracked as KB5063878, with a related preview package KB5062660), some NVMe SSDs temporarily or permanently disappeared during large, sustained write operations — and in a subset of cases those drives returned with unreadable or corrupted data. (windowscentral.com)

Background​

How the problem surfaced​

The earliest widely circulated trace of the problem came from a Japanese X (formerly Twitter) user, who published methodical tests showing repeatable failures when writing large files (around 50 GB or more) to certain SSDs after the update. The pattern reported was consistent across multiple independent test benches and forum threads: during a sustained write, the drive would stop responding, vanish from File Explorer and Device Manager, and in some cases would not reappear without vendor-level intervention. Reboots restored visibility for many drives, but files being written at the time were often corrupted or truncated. (tomshardware.com)

The timeline (concise)​

  • August 12, 2025: Microsoft publishes the cumulative update identified by the community as KB5063878 for Windows 11 24H2 (OS Build 26100.4946).
  • Mid-August 2025 (within days): Community test reports surface describing NVMe drives “vanishing” during sustained writes following the update. Multiple outlets replay and reproduce the symptom set. (windowscentral.com, bleepingcomputer.com)
  • August 18–20, 2025: Phison — a major NAND-controller supplier whose silicon is used across many branded SSDs — publicly acknowledges it has been made aware of the “industry‑wide effects” and is investigating with partners. Microsoft says it is “aware of these reports and investigating with our partners.” (bleepingcomputer.com)

Overview: what’s happening technically​

Symptom profile (what users actually see)​

  • Large, sustained sequential writes (commonly reproduced around the ~50 GB mark) proceed normally, then the target SSD becomes unresponsive mid-write.
  • The drive disappears from File Explorer, Device Manager, and Disk Management; SMART and vendor telemetry often become unreadable or return errors. (windowscentral.com)
  • Rebooting sometimes restores drive visibility (temporary fix), but files written during the failure may be truncated or corrupted; in a minority of cases the drive remains inaccessible and shows as RAW or uninitialized.

The likely technical intersection​

Modern SSD reliability depends on tight coordination between:
  • the operating system’s storage stack and buffering behavior;
  • the NVMe driver and host-side caching strategies; and
  • the SSD controller firmware, which manages wear-leveling, garbage collection, and internal caches (SLC/DRAM/HMB).
Community reproductions and vendor statements point to a workload-triggered interaction under sustained large writes that exposes a firmware edge case in certain controllers or an OS-side buffer management regression. Two host-side patterns have been discussed prominently:
  • drives that are heavily filled (commonly >50–60% used) exhibit higher failure rates, possibly because spare area and SLC cache windows are reduced;
  • DRAM‑less designs (which rely on Host Memory Buffer, or HMB) are more sensitive to host-side memory allocations and large sustained writes.
These technical mechanisms remain hypotheses derived from reproducible test patterns; definitive root-cause attribution requires vendor telemetry and coordinated forensics from Microsoft and SSD manufacturers.

Evidence and scope: what we know — and what we don’t​

What multiple independent tests repeatedly showed​

  • A consistent reproduction window: approximately 50 GB of continuous sequential writes is the point where failures were most frequently triggered in tests published by community labs.
  • Drives reported as affected were heterogeneous across brands, but several models using Phison controllers (including some DRAM‑less SKUs) were more frequently represented in early collations. That clustering prompted Phison to publicly investigate. (wccftech.com)

Which vendors and models came up repeatedly​

Community lists and specialist outlets named models from Corsair, Western Digital, SK Hynix, Adata, SanDisk and others in early reporting. However, these lists were assembled from hobbyist labs and user reports; they are neither an official recall nor an authoritative list of affected models. Firmware revision, drive capacity, and host hardware materially affect reproducibility; therefore model lists are investigatory leads, not verdicts.

Vendor responses so far​

  • Phison: publicly acknowledged being made aware of potential industry-wide effects tied to KB5063878 and KB5062660 and said it was working with partners to review affected controllers and provide updates/advisories. (wccftech.com)
  • Microsoft: confirmed awareness and is investigating with partners; the company asked customers experiencing the issue to provide feedback through official channels. Microsoft’s public KB initially listed no storage-related known issues for the update, which contributed to the story’s rapid spread via community channels. (bleepingcomputer.com)

Technical deep dive: controllers, DRAM-less designs, and HMB​

SSD controllers and why they matter​

An SSD controller is the drive’s brain: it runs firmware that maps logical block addresses to physical NAND pages, manages wear leveling, garbage collection, and error correction, and controls caches. Differences in controller architecture — including the presence or absence of onboard DRAM — change how drives cope with sustained high-throughput I/O. Firmware can contain heuristics that assume certain host behavior; when host-side changes from an OS update violate those expectations, a fault may cascade into a controller-level lockup.

DRAM-less SSDs and Host Memory Buffer (HMB)​

Many cost-optimized SSDs omit onboard DRAM and instead allocate a portion of system RAM for mapping tables via HMB. That design is efficient, but it creates a dependency on host behavior. If an OS update changes the way HMB allocations are handled, or if the OS’s buffered I/O patterns create sustained pressure on mapping and cache paths, the controller may encounter states it did not anticipate. The 24H2 release already set a precedent here: earlier Host Memory Buffer allocation changes caused BSODs on some WD drives, which lends plausibility to an HMB-sensitive regression being a factor in this incident.

Why large, sustained writes are especially dangerous​

Sustained sequential writes push SSD controllers into extended use of SLC caches, mapping-table updates, and garbage collection operations. If free area is limited (drive is >50–60% full), the controller may have less headroom to accept bursts without invoking costly background operations. That increases the chance a latent firmware bug or mismatch in host/controller expectations will trigger a fatal lockup or crash. Community tests consistently show splitting large transfers into smaller chunks often avoids the failure — further evidence that the problem is workload-sensitive, not purely random hardware failure.
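
For readers who want to approximate that workaround, the following Python sketch copies a file in fixed-size chunks, flushing and pausing between chunks. The 1 GiB chunk size, the 5-second pause, and the file paths are illustrative assumptions, not vendor-validated thresholds.

```python
import os
import shutil
import time

CHUNK_BYTES = 1024 * 1024 * 1024   # 1 GiB per chunk -- illustrative, not a validated threshold
PAUSE_SECONDS = 5                  # short pause between chunks to let the drive drain its caches (assumption)

def chunked_copy(src: str, dst: str) -> None:
    """Copy src to dst in fixed-size chunks, forcing each chunk to disk before continuing."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            data = fin.read(CHUNK_BYTES)
            if not data:
                break
            fout.write(data)
            fout.flush()
            os.fsync(fout.fileno())    # push the chunk out of OS buffers
            time.sleep(PAUSE_SECONDS)  # throttle so the write is not one long sustained burst
    shutil.copystat(src, dst)          # carry over timestamps where possible

if __name__ == "__main__":
    chunked_copy(r"D:\games\big_archive.bin", r"E:\backup\big_archive.bin")  # hypothetical paths
```

Chunking only spreads out the write pressure; it is a stopgap, not a substitute for vendor-validated firmware or OS-side fixes.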

Risk assessment and critical analysis​

Notable strengths of the evidence​

  • Reproducibility: multiple independent labs and hobbyist testers reproduced the failure under similar parameters (large sequential writes, ~50 GB threshold). That strengthens the case for a systemic interaction rather than isolated hardware failure.
  • Vendor acknowledgement: Phison’s public statement and Microsoft’s confirmation of an investigation elevated individual reports into a coordinated industry issue. When a controller vendor engages, it typically means telemetry shows a non-trivial correlation. (wccftech.com)

Limitations and unresolved questions​

  • Incomplete public telemetry: neither Microsoft nor major SSD vendors have published a definitive, public root-cause analysis or a cleanly validated list of affected controller/family identifiers that would allow a simple “is my drive on the list?” check. This forces users to rely on community lists that may mix multiple firmware versions and BOMs.
  • Heterogeneous reproductions: while Phison controllers are overrepresented in reports, other controller families and even a few HDD reports have appeared in the wild. That heterogeneity complicates simple attribution to a single silicon design or firmware bug.
  • Statistical uncertainty: enthusiast and lab tests are purposeful and high-signal, but they are not a substitute for vendor telemetry across millions of devices. There’s still a possibility some incidents are coincidental hardware failures that cluster because of reporting bias. Responsible remediation requires vendor-side telemetry and coordinated fixes.

Potential systemic risks​

  • Data loss: the immediate, real-world risk is corrupted or lost files for users who performed large writes while patched. Unbacked data is at risk. (pcgamer.com)
  • Trust erosion: repeated update-triggered hardware regressions (even when rare) damage user confidence in automatic patching strategies and may drive risk-averse behavior that delays important security fixes.
  • Operational exposure for enterprises: organizations that patch broadly without representative staging rings are exposed to rare but severe failure modes across fleets that may use heterogeneous drives.

Practical guidance: what users and IT should do now​

Immediate actions for all users​

  • Back up critical data now using the 3‑2‑1 rule: three copies, on two different media, with one off-site. Do not assume a single backup is enough. (windowscentral.com)
  • Avoid large sequential writes (game updates, mass media transfers, archive extractions) on Windows 11 systems that installed KB5063878 or KB5062660 until vendors publish guidance. Splitting large transfers into smaller chunks or throttling the transfer rate has been reported to reduce failure probability in test reproductions.
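
One crude way to throttle a bulk transfer on Windows is robocopy's /IPG switch, which inserts an inter-packet gap between blocks. The Python wrapper below is a sketch under that assumption; the 50 ms gap and both folder paths are placeholders, and throttling does not remove the underlying risk.

```python
import subprocess

# Hypothetical source and destination folders.
SRC = r"D:\media"
DST = r"E:\media_backup"

# /E copies subdirectories, /IPG:50 inserts a 50 ms inter-packet gap to throttle throughput,
# /R:1 /W:1 keep retry behaviour short. The gap value is an illustrative assumption.
result = subprocess.run(
    ["robocopy", SRC, DST, "/E", "/IPG:50", "/R:1", "/W:1"],
    check=False,  # robocopy exit codes 0-7 indicate success variants, so nonzero is not always an error
)
print("robocopy exit code:", result.returncode)
```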

Steps for more technically engaged users​

  • Inventory installed updates, SSD models, and firmware versions; a minimal script sketch follows this list. Use vendor tools to read controller firmware (e.g., Samsung Magician, WD Dashboard, Corsair Toolbox, ADATA SSD ToolBox).
  • If you suspect an incident has occurred, stop further writes to the affected drive immediately and power off the system to avoid additional damage. Preserve the drive for vendor diagnostics.
  • If the data is valuable, create a forensic image of the drive before reformatting or attempting recovery. Imaging tools such as dd (used carefully, with the correct parameters) or vendor-provided imaging options are recommended. Document serial numbers, firmware versions, OS build numbers, and event logs.
  • Contact SSD vendor support with logs and images; vendors will advise on firmware reflashes, low-level recovery steps, or RMA procedures. Expect that fixes will be delivered through vendor firmware updates after internal validation. (wccftech.com)
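
The inventory step can be scripted. Below is a minimal sketch, assuming a Windows host where PowerShell's standard Get-HotFix and Get-PhysicalDisk cmdlets are available; the exact properties worth recording may differ by vendor advisory.

```python
import subprocess

def run_powershell(command: str) -> str:
    """Run a PowerShell command and return its text output."""
    completed = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout

if __name__ == "__main__":
    # Is either of the implicated cumulative updates installed?
    print(run_powershell(
        "Get-HotFix | Where-Object { $_.HotFixID -in @('KB5063878','KB5062660') } | "
        "Select-Object HotFixID, InstalledOn | Format-Table -AutoSize"
    ))
    # Disk models and firmware revisions, for comparison against vendor advisories.
    print(run_powershell(
        "Get-PhysicalDisk | Select-Object FriendlyName, MediaType, FirmwareVersion, SerialNumber | "
        "Format-Table -AutoSize"
    ))
```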

For system administrators and IT teams​

  • Pause broad deployment: stage the KB for a subset of representative hardware and monitor before rolling to production fleets. Use WSUS/SCCM or update-management tools to defer the patch pending vendor guidance.
  • Apply an inventory-driven risk model: identify endpoints with at-risk SSD models and prioritize them for rollback or temporary exclusion from the update ring.
  • Coordinate with storage and firmware vendors: vendors will likely issue firmware advisories or validated updates; plan for staged firmware rollouts and test firmware updates in a lab environment before production. (bleepingcomputer.com)

Recovery scenarios and troubleshooting checklist​

If a drive disappears during a transfer​

  • Stop activity immediately and avoid further writes to the affected drive. Continued writes increase the chance of permanent metadata loss.
  • Power off and image the drive if the data is important. Preserve the image and event logs for vendor analysis.
  • Attempt safe re-enumeration steps: reseat the drive, test it on a different host, and use a vendor utility to query SMART (a scriptable alternative is sketched after this checklist). If the drive reappears, copy critical data off immediately.
  • If the drive remains inaccessible, contact the SSD vendor for recovery or RMA; do not attempt low-level flashes that the vendor has not sanctioned, as unsupported operations can void warranty or complicate recovery.
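
For the SMART query step, the open-source smartmontools package is a scriptable alternative to vendor dashboards. The sketch below assumes smartctl (version 7 or later, for JSON output) is installed and on PATH, and that the device path is adjusted for your system; field names follow smartmontools' JSON schema.

```python
import json
import subprocess

def smart_report(device: str) -> dict:
    """Query one device's SMART data as JSON via smartmontools' smartctl."""
    completed = subprocess.run(
        ["smartctl", "-a", "-j", device],  # -j requests JSON output (smartmontools 7.0+)
        capture_output=True, text=True,
    )
    return json.loads(completed.stdout)

if __name__ == "__main__":
    # Adjust the device path for your system; /dev/sda-style names also work with
    # the Windows builds of smartmontools.
    report = smart_report("/dev/sda")
    print("Model:   ", report.get("model_name"))
    print("Firmware:", report.get("firmware_version"))
    print("Health passed:", report.get("smart_status", {}).get("passed"))
```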

When a drive reappears after reboot​

  • Treat any files written during the failure window as suspect. Run checksums or verify file integrity against known-good data if possible (a minimal sketch follows). Only reformat and reload the drive after imaging it and consulting the vendor, and only if the vendor recommends doing so.
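
A minimal sketch of that checksum step, comparing a suspect directory tree against a known-good backup; both directory paths are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_bytes: int = 4 * 1024 * 1024) -> str:
    """Stream a file through SHA-256 so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_bytes):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(suspect_dir: Path, known_good_dir: Path) -> None:
    """Flag files that are missing or differ between the suspect copy and a known-good copy."""
    for good_file in known_good_dir.rglob("*"):
        if not good_file.is_file():
            continue
        suspect_file = suspect_dir / good_file.relative_to(known_good_dir)
        if not suspect_file.exists():
            print(f"MISSING : {suspect_file}")
        elif sha256_of(suspect_file) != sha256_of(good_file):
            print(f"MISMATCH: {suspect_file}")

if __name__ == "__main__":
    compare_trees(Path(r"E:\restored_files"), Path(r"D:\known_good_backup"))  # hypothetical paths
```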

What vendors and Microsoft should and likely will do​

  • Microsoft: expect coordinated forensics and, if required, a Release Health Known Issue entry for KB5063878 that either documents a mitigation or offers guidance for affected SKUs. In prior similar incidents, Microsoft has provided targeted mitigations and worked with vendors on fixes.
  • SSD vendors / controller makers: firmware patches or advisories will be the primary remediation vector. Phison’s public acknowledgment signals that vendor-supplied firmware or drive-level workarounds may be forthcoming, distributed through OEM and vendor update channels. (wccftech.com)

Broader implications and lessons​

Systemic fragility in modern storage stacks​

This incident underscores a perennial truth: modern storage is a co-engineered system where small host-side changes can expose latent firmware edge cases. The more tightly host and device cooperate (e.g., HMB), the greater the potential for fragile interactions. Organizations and enthusiasts must maintain disciplined staging and backup practices because even vendor-signed updates can produce surprising hardware regressions.

The communication imperative​

Timely, transparent vendor communication with clear lists of affected controller families and firmware versions is essential to avoid panic and unnecessary RMAs. Community testing is invaluable for rapid triage, but it cannot substitute for vendor telemetry and validated fixes.

Final assessment and actionable summary​

  • The pattern reported after KB5063878 (and related KB5062660) is credible and reproducible in controlled tests: sustained sequential writes of roughly 50 GB on moderately full drives can trigger failures where the drive disappears from the OS and sometimes returns with corrupted or unreadable data.
  • Phison and Microsoft have acknowledged investigations; Phison’s statement that it is reviewing potentially affected controllers indicates vendor-side remediation is likely to be the path forward. (wccftech.com)
  • Users should immediately back up critical data, avoid large sequential writes on patched systems, and stage updates in managed environments. If a drive has failed, power off, image the drive, collect logs, and engage vendor support rather than attempting unsupervised low-level repairs.
This remains an active, evolving situation: the community’s methodical test reproductions propelled vendor engagement, but definitive, vendor-validated root-cause analysis and firmware corrections are the only reliable path to full remediation. Until such fixes are published and validated, conservative safeguards — backups, staged deployments, and limited large-write activity on patched systems — are the pragmatic defenses against costly data loss.

Source: Lowyat.NET A Microsoft Windows 11 Update Is Bricking Some SSDs
 
