New Windows 11 24H2 cumulative updates released in August have been linked to a worrying, reproducible storage regression that can make some SSDs and a few HDDs vanish from the operating system during heavy, sustained writes — in a number of community tests the failure has led to corrupted or inaccessible data and, in at least one test run, an unrecoverable drive.

Background / Overview

The incident centers on the August 2025 Windows 11 cumulative/security packages commonly tracked as KB5063878 (the monthly LCU plus SSU for Windows 11 24H2) and an associated preview update, KB5062660. Within days of their deployment, multiple independent testers and enthusiast outlets began reproducing the same failure fingerprint: when a target drive is subjected to sustained sequential writes, typically in the tens of gigabytes, it can stop responding, disappear from File Explorer, Device Manager, and Disk Management, and return unreadable SMART/controller telemetry. Some drives recover after a reboot; others do not. (bleepingcomputer.com, notebookcheck.net)
Phison, a major SSD controller designer whose silicon is used in many low-cost and mainstream NVMe designs, has publicly acknowledged it is investigating reports that the KB5063878 and KB5062660 updates “potentially impacted several storage devices,” and said it is working with partners to identify affected controller families. Microsoft has not issued a full public technical breakdown at the time of reporting; vendor and community telemetry are still driving the analysis. (bleepingcomputer.com, wccftech.com)

What users and testers are reporting​

Symptom fingerprint​

  • Large, sustained sequential writes (game installs/patches, archive extraction, cloning, or bulk media copies) that begin normally then fail or stall after a period of activity.
  • The target drive becomes invisible to the OS: it disappears from File Explorer, Device Manager, Disk Management, and diagnostic tools.
  • SMART and controller telemetry may be unreadable or report errors after the event.
  • Reboot sometimes restores visibility; in other cases the drive remains inaccessible and files written during the incident are corrupted or lost. (tomshardware.com, notebookcheck.net)

Repro trigger and workload profile​

Independent, community-led reproductions converged on a consistent trigger profile: sustained sequential writes of roughly 50 GB or more, with an elevated likelihood of failure when the drive is more than about 60% full. Publicly shared tests show the fault appearing under continuous write pressure that exercises controller caching and metadata paths. Splitting the workload into smaller batches sometimes avoids the fault. These characteristics point to a workload-dependent edge case rather than an instant, universal hardware failure. (tomshardware.com, notebookcheck.net)
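As a rough illustration of that trigger profile, the following Python sketch classifies a planned write against the two community-reported thresholds. The function names, the labels, and the exact cutoffs (50 GB, 60%) are assumptions chosen for illustration; the reported figures are approximate and not vendor-confirmed, so treat the result as a hint and not as a safety guarantee.

```python
import shutil

# Approximate thresholds from community reproductions (not vendor-confirmed).
WRITE_THRESHOLD_BYTES = 50 * 1000**3  # roughly 50 GB in one sustained write
FILL_THRESHOLD = 0.60                 # roughly 60% of capacity already used


def classify_write(write_bytes, used_bytes, total_bytes):
    """Label a planned write relative to the reported repro profile."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if write_bytes < WRITE_THRESHOLD_BYTES:
        return "below-profile"
    if used_bytes / total_bytes > FILL_THRESHOLD:
        return "matches-profile-elevated"
    return "matches-profile"


def classify_write_to_path(path, write_bytes):
    """Same check, reading current usage of the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return classify_write(write_bytes, usage.used, usage.total)
```

For example, `classify_write_to_path("D:\\", 80 * 1000**3)` would label an 80 GB transfer to the volume mounted at `D:\` (a hypothetical drive letter).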

Early device-level results​

One well-publicized community test sequence evaluated 21 different SSDs under a sustained-write workload and reported that 12 of them demonstrated the disappearance behavior; one model in that set — reported as a Western Digital SA510 2TB in those tests — could not be recovered in the tester’s environment. Not every drive of a flagged model showed the issue in all test rigs, and several mainstream high-end models were reported as unaffected in the same bench runs. That heterogeneity complicates a single-vendor attribution and suggests the root cause is likely an interaction between host behavior and controller firmware.

Technical anatomy: plausible mechanisms​

Modern NVMe SSDs are embedded systems: controller firmware, optional on-board DRAM, NAND flash channels, and the host OS/driver stack must all coordinate precisely. Two non‑exclusive mechanisms are the leading working hypotheses based on public testing and historical precedent.

1. Host-driven NVMe command / buffering regression​

A small change in Windows’ kernel, NVMe driver, or buffer-handling logic (for instance in how the OS stages page-cache-backed writes and issues NVMe commands) can alter command ordering, DMA timing, or buffer lifetimes in ways that exercise latent controller firmware bugs. If a controller receives commands or memory mappings in a cadence it cannot tolerate under heavy load, it may hang or become unresponsive — which the OS interprets as device removal. The observed unreadable SMART/controller telemetry and the drives disappearing mid-write are consistent with a controller-level hang. (tomshardware.com, borncity.com)

2. HMB / DRAM‑less controller fragility and metadata pressure​

Many cost-optimized SSDs are DRAM-less and rely on Host Memory Buffer (HMB) to borrow host RAM for mapping tables and caching. Sustained sequential writes stress the drive’s flash translation layer (FTL) and mapping structures; subtle changes in host-side HMB allocation or timing, or in the OS’s handling of page-cache-backed (buffered) writes, can surface race conditions and resource exhaustion in controller firmware. Earlier Windows 11 24H2 rollouts exposed HMB-related fragility on certain models, which makes HMB/DRAM-less interactions a plausible contributing factor again. (borncity.com, notebookcheck.net)
Caveat: both mechanisms are plausible and not mutually exclusive. Community data shows drives built around certain Phison controller families are over‑represented among repros, but non‑Phison devices and even a few HDDs have appeared in isolated cases, which points toward a host–firmware interaction rather than a defect confined to a single vendor’s hardware.

Timeline and vendor response​

  • August 12, 2025 — Microsoft published the combined servicing stack update and latest cumulative update for Windows 11 (tracked as KB5063878 / OS Build 26100.4946). The initial Microsoft KB did not list a storage-device regression.
  • Within days — community testers and specialist outlets began publishing reproducible test cases where drives would disappear mid‑write; user-sourced lists and lab results appeared on forums and in tech outlets.
  • Mid‑August — Phison posted an acknowledgement that it had been “recently made aware” of industry‑wide effects tied to KB5063878 and KB5062660 and said it was working with partners to review potentially affected controllers. That message focused on partner advisories and coordinated firmware remediation. Microsoft had not publicly released its own detailed analysis at the time community reports emerged. (bleepingcomputer.com, wccftech.com)
Phison’s public posture — cooperative investigation, partner‑first firmware distribution — is standard for a controller supplier because fixes must be validated against each branded SSD’s bill of materials, factory configuration, and vendor update mechanisms before broad release. Nonetheless, that process takes time and prolongs exposure for end users in the absence of an OS-level mitigation.

Cross‑checking the key claims (verification)​

  • The association between KB5063878/KB5062660 and disappearing drives is reported by multiple independent outlets and reproduced by community testers; BleepingComputer and Tom’s Hardware document the issue and Phison’s acknowledgement. (bleepingcomputer.com, tomshardware.com)
  • The reproducible trigger profile (~50 GB sustained writes and ~60% used capacity) is consistent across community test logs and reporting from NotebookCheck and Born’s Tech & Windows World. These independent reproductions strengthen the hypothesis that the regression is workload-dependent. (notebookcheck.net, borncity.com)
  • The specific test counts (21 drives tested, 12 showing the behavior, and one WD SA510 2TB unrecoverable) appear in community test reporting highlighted by Tom’s Hardware and referenced by other outlets; this is an important data point but remains a single tester’s dataset and should be treated as an initial sample rather than a statistically comprehensive study.
Flagging unverifiable elements: some early user hypotheses — for example, claims of a Windows “memory leak” in the OS-buffered cache region or an exact causal line to a specific Phison controller family — remain speculative until vendors release telemetry-backed root-cause analyses. Community test logs are valuable but not definitive without vendor telemetry and forensic dumps. Treat early attributions with caution. (easeus.com, windowscentral.com)

Practical impact: who’s at risk?​

  • Consumers and gamers with DRAM‑less or HMB‑reliant SSDs, particularly cheaper M.2 NVMe models using Phison controllers, appear to be at higher observed risk in community reproductions. However, the phenomenon is not strictly limited to DRAM-less or Phison designs. (notebookcheck.net, wccftech.com)
  • Power users who routinely perform large sequential writes (video producers, game library installers, disk-cloning operations, bulk backups) are most likely to encounter the bug because their workloads match the repro profile.
  • Managed IT environments should be especially cautious; the same Windows update also produced separate deployment problems (WSUS/SCCM error 0x80240069) that required Microsoft servicing controls in parallel. Enterprises must inventory affected hardware and stage updates conservatively. (borncity.com, windowsforum.com)

Recommended immediate actions​

These are practical, conservative steps until vendors or Microsoft publish validated fixes and deployment guidance.
  • Back up critical data now — use the 3-2-1 rule (three copies, two different media, one off-site) for essential files. Data integrity is the primary risk.
  • Avoid sustained, large sequential writes on Windows 11 systems that received KB5063878 or KB5062660. Split large transfers into smaller batches, and avoid write-heavy operations (game installs, large archive extractions, cloning) until a remediation is confirmed.
  • Inventory SSD models and controller families across your fleet. Note firmware versions and whether the drives are DRAM-less / HMB‑dependent. That inventory will help prioritize firmware testing and staged rollouts.
  • Delay non‑critical deployment of the August 2025 cumulative (KB5063878) in managed environments until vendor advisories and Microsoft guidance stabilize. Use a conservative ring-based rollout for production systems.
  • Monitor vendor dashboards (SSD manufacturer support pages) for firmware updates and validated release notes; firmware patches will most likely be distributed through OEM/vendor channels and must be tested per SKU. Phison indicated fixes will be provided to partners rather than direct-to-consumer. (wccftech.com, notebookcheck.net)
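The advice to split large transfers into smaller batches can be sketched in Python as follows. This is a minimal illustration, not a validated workaround: the function name `copy_in_batches`, the 4 GiB batch size, and the pause between batches are all assumptions, and community reports say only that batching "sometimes" avoids the fault.

```python
import os
import time


def copy_in_batches(src, dst, batch_bytes=4 * 1024**3,
                    chunk_bytes=8 * 1024**2, pause_s=5.0):
    """Copy src to dst, forcing data to disk and pausing after each batch.

    batch_bytes and pause_s are illustrative guesses, not tested values.
    Returns the number of bytes copied.
    """
    copied = 0
    since_sync = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_bytes)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            since_sync += len(chunk)
            if since_sync >= batch_bytes:
                # End of a batch: flush Python's buffer, ask the OS to
                # commit the data, then give the drive a quiet interval.
                fout.flush()
                os.fsync(fout.fileno())
                time.sleep(pause_s)
                since_sync = 0
        # Commit whatever remains from the final, partial batch.
        fout.flush()
        os.fsync(fout.fileno())
    return copied
```

Comparing a checksum of the source and destination afterwards is a sensible extra step, since files written during a failure window are the ones reported as corrupted.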

Mitigations, recovery, and what to expect from vendors​

  • Firmware updates: If forensic analysis points to controller firmware not handling valid OS command sequences, SSD vendors will release firmware updates for affected SKUs after partner validation. Those updates are standard remediation for firmware-exposed regressions, but distribution and validation add time.
  • Windows-side mitigation: If Microsoft’s telemetry shows an OS-side regression is the primary trigger, Microsoft may deploy a Known Issue Rollback (KIR), a targeted hotfix, or a selective servicing block while a fix is prepared — a path Microsoft has used previously for urgent regressions. A combined approach (vendor firmware + OS mitigation) is also possible.
  • Drive recovery: In many reported cases a reboot restored the drive’s visibility, but files written during the failure window are at high risk of data loss. In a minority of reports the drive remained inaccessible and required vendor recovery tools or deeper remediation. In rare, community-reported cases a drive was unrecoverable in the tester’s environment. Those outcomes underscore the primacy of immediate backups and conservative triage. (tomshardware.com, notebookcheck.net)

Risks beyond individual users​

The incident surfaces a broader ecosystem risk: OS updates are platform-level events that can expose latent firmware bugs across a wide population of devices. When the host OS or driver behavior changes subtly, the fault is not always limited to one controller or vendor. That systemic coupling means:
  • Large organizations may face simultaneous incidents across many endpoints if they deploy updates broadly without staging and representative hardware testing.
  • The time lag between identifying affected controller families, validating firmware per SKU, and deploying safe updates creates a window of exposure for users who install the OS patches immediately.
  • Public confusion and inconsistent vendor messaging can amplify risk: community lists are useful triage tools but are not a substitute for vendor-validated compatibility matrices.

What this episode reveals about testing and coordination​

There are two operational lessons that follow directly from this incident:
  • Representative pre‑release testing must include real-world heavy‑write workloads across a diverse sample of SSD controller families, including DRAM-less parts that rely on HMB. Synthetic benchmarks often miss long-duration metadata and mapping stress patterns that reproduce in real user workloads.
  • Faster, telemetry-driven vendor–platform collaboration is essential. When an OS vendor changes buffer allocation or driver timing, clear telemetry sharing between Microsoft and controller vendors enables directed fixes (firmware, driver, or OS mitigation) rather than prolonged guesswork.

How to communicate this to an IT posture / risk register​

  • Prioritize backups and snapshot cadence for systems with at-risk SSDs.
  • Flag Windows 11 KB5063878 (and KB5062660) in change advisory boards as a conditional hold item pending vendor guidance.
  • Create a triage runbook: identify affected SKUs, capture firmware and UEFI versions, and establish a test harness that reproduces the ~50 GB+ sequential write profile to validate fixes before mass deployment.
  • Track vendor advisories and Microsoft KB updates for KIR or targeted hotfixes.
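A test harness of the kind mentioned in the runbook item could start from something like the Python sketch below. It writes a requested number of bytes sequentially to one file and records how far it got before any OS-level error. The function name, chunk size, and sync interval are assumptions; the sketch does not inspect SMART data or detect device removal by itself, and it should only ever be pointed at a sacrificial test drive that holds no data of value.

```python
import os
import time


def sequential_write_test(path, total_bytes, chunk_bytes=16 * 1024**2,
                          sync_every=64):
    """Write total_bytes sequentially to path and report the outcome.

    'written' counts bytes handed to the file object; after an error,
    the amount that actually reached the drive may be lower.
    """
    chunk = os.urandom(chunk_bytes)  # incompressible data, reused each write
    written = 0
    error = None
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            writes = 0
            while written < total_bytes:
                size = min(chunk_bytes, total_bytes - written)
                f.write(chunk[:size])
                written += size
                writes += 1
                if writes % sync_every == 0:
                    # Periodically push data out of the OS cache.
                    f.flush()
                    os.fsync(f.fileno())
            f.flush()
            os.fsync(f.fileno())
    except OSError as exc:
        # A drive that stops responding mid-write is expected to surface
        # here as an I/O error; record it instead of crashing.
        error = repr(exc)
    return {
        "requested": total_bytes,
        "written": written,
        "completed": error is None and written == total_bytes,
        "error": error,
        "seconds": time.monotonic() - start,
    }
```

A run matching the reported profile would pass something like `total_bytes=60 * 1000**3` on a test drive filled past 60%. The harness leaves the test file in place, so the caller is responsible for deleting it.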

Final assessment and outlook​

Current public evidence — combined community reproductions, specialist outlet reporting, and Phison’s partner-focused acknowledgement — converges on a workload-dependent storage regression tied temporally to the August Windows 11 cumulative updates KB5063878 and KB5062660. The most credible technical explanations point to an interaction between host-side buffering/HMB allocation and controller firmware under sustained sequential write pressure; that interaction can cause a controller to hang and the OS to treat the device as removed. Cross-checks across at least two independent outlets and multiple community reproductions back up the central facts of the incident, though full root-cause attribution awaits consolidated vendor and Microsoft telemetry. (tomshardware.com, bleepingcomputer.com)
Short‑term user guidance is simple and urgent: back up, avoid heavy single-shot writes on updated systems, inventory your SSDs, and wait for vendor-validated firmware or Microsoft mitigations before resuming typical large‑write workflows. For enterprise administrators, stage updates conservatively, increase snapshot and backup frequency, and validate fixes on representative hardware before fleet-wide deployment. (windowscentral.com, wccftech.com)
This episode is not merely a single‑update glitch — it is a practical reminder that modern storage reliability depends on finely balanced host–firmware interactions and that even security-focused updates must be validated against the broad, heterogeneous hardware ecosystem they serve. Until vendors and Microsoft publish coordinated, verified fixes, the safest posture is conservative: protect data first, then restore performance‑heavy workflows after validated remediation is in place.

Conclusion: Treat the Windows 11 KB5063878/KB5062660 storage regression as a live compatibility incident with real data‑integrity consequences. Back up immediately, avoid sustained large writes on patched systems, inventory at‑risk hardware, and await vendor-validated firmware or Microsoft mitigations before resuming heavy write workloads. (tomshardware.com, bleepingcomputer.com)

Source: extremetech.com Phison Tackling Windows 11 24H2 SSD/HDD Failure Issue
 
