Windows 11 KB5063878 storage regression: NVMe disks vanish; Phison investigates

ChatGPT · Aug 19, 2025

Phison has confirmed it is investigating reports that Microsoft’s August cumulative update for Windows 11 24H2, distributed as KB5063878, is associated with a storage regression that can make certain NVMe SSDs stop responding or disappear from Windows during large, sustained write operations.

Background / Overview

The issue surfaced publicly in mid‑August 2025 after several independent community testers and enthusiast outlets reproduced a consistent failure profile: during sustained sequential writes (community reproductions commonly cite a threshold around 50 GB), some NVMe drives become unresponsive, vanish from File Explorer and Device Manager, and present unreadable SMART/controller telemetry. In some cases a reboot temporarily restored visibility; in others the drive remained inaccessible and files written in the failure window were damaged or missing.
Two related but distinct incidents have been discussed in parallel by testers and vendors. First, earlier in the 24H2 rollout some Western Digital (WD) and SanDisk drives produced repeated BSODs tied to Host Memory Buffer (HMB) allocation changes, leading vendors to issue firmware updates and guidance. Second, the August cumulative KB5063878 is the package now specifically correlated with the “drive disappears during heavy writes” regression reported across multiple independent test benches.
Community threads and aggregated test logs show an over‑representation of drives using certain Phison controller families among the affected devices, although reproductions have also implicated models using other controllers. That pattern suggests the root cause may be an interaction between host/OS behavior and controller firmware rather than a single‑vendor defect.

Timeline: How the reports unfolded

August 12, 2025 — Microsoft released KB5063878 (OS Build 26100.4946) as the August cumulative update for Windows 11 24H2. The public KB initially listed standard security and quality fixes and did not call out a storage‑device regression.
Within days — hobbyist testers and several tech outlets began publishing reproducible test cases where drives would vanish during large, continuous writes; the community commonly observed the issue when transferring tens of gigabytes in a single operation.
Mid‑August — Phison issued a concise acknowledgement that it had been “recently made aware” of the effects of KB5063878 (and KB5062660 in some reporting) and that it was working with partners to review controllers that might have been impacted. The language committed Phison to ongoing advisories while avoiding definitive root‑cause attribution.
Parallel enterprise symptom — administrators deploying the update via WSUS/SCCM also reported unrelated installation errors (0x80240069) that forced Microsoft to issue Known Issue Rollback (KIR) mitigations for managed environments.

What’s the technical fingerprint? Symptoms, trigger profile, and reproducibility

Symptoms observed by testers

Sudden disappearance of an NVMe SSD from File Explorer, Device Manager, and Disk Management while a write is in progress.
Event Viewer entries near the failure showing NVMe or storage controller errors and sometimes kernel I/O failures.
Controller and SMART telemetry becoming unreadable to vendor tools after the event; in a subset of cases, files written during the operation are corrupted, truncated, or missing after a reboot.
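One way to triage the Event Viewer entries described above is to export the System log and filter it for storage‑related providers around the time the drive vanished. The sketch below assumes the events have already been exported into dictionaries with `provider`, `event_id`, and `time` keys (a hypothetical layout, not a Windows API). The provider names and event IDs in the watchlist are examples often seen in general storage troubleshooting and are an assumption here, not a list confirmed for this regression.

```python
from datetime import datetime, timedelta

# Assumed watchlist: provider name -> event IDs worth a closer look.
# Illustrative only; adjust it to what your own logs actually contain.
STORAGE_WATCHLIST = {
    "disk": {51, 153, 157},
    "stornvme": {11, 129},
}


def storage_events_near(events, failure_time, window_minutes=5):
    """Return watchlisted events within +/- window_minutes of failure_time, oldest first."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for ev in events:
        ids = STORAGE_WATCHLIST.get(ev["provider"].lower())
        if ids is None or ev["event_id"] not in ids:
            continue  # not a provider/ID pair we are watching
        if abs(ev["time"] - failure_time) <= window:
            hits.append(ev)
    return sorted(hits, key=lambda ev: ev["time"])


if __name__ == "__main__":
    # Hypothetical sample records standing in for an exported System log.
    t0 = datetime(2025, 8, 16, 14, 30)
    sample = [
        {"provider": "disk", "event_id": 157, "time": t0 + timedelta(minutes=1)},
        {"provider": "stornvme", "event_id": 129, "time": t0 - timedelta(minutes=1)},
        {"provider": "Kernel-General", "event_id": 1, "time": t0},
    ]
    for ev in storage_events_near(sample, t0):
        print(ev["time"].isoformat(), ev["provider"], ev["event_id"])
```

Keeping the watchlist as plain data makes it easy to widen or narrow once vendors or Microsoft publish which log entries actually accompany the fault.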

Typical trigger profile

Community reproductions converge on sustained sequential writes — large file copies, archive extractions, game installs/updates, or cloning tasks — where the drive is pushed to write tens of gigabytes in one continuous operation. Many testers report consistent failures around the ~50 GB mark, though the threshold can vary by model, firmware revision, and host platform.
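The trigger profile above can be approximated with a small harness that writes sequentially, flushes each chunk to the device, and records how many bytes were committed before any I/O error. This is a minimal sketch, not the procedure any tester or vendor actually used; the function names, the chunk size, and the use of random data are assumptions. Running it at tens of gigabytes on a patched system may itself provoke the fault, so it belongs only on a scratch drive whose contents are backed up elsewhere.

```python
import hashlib
import os
import tempfile


def sustained_write(path, total_bytes, chunk_bytes=64 * 1024 * 1024):
    """Write total_bytes sequentially to path, syncing after every chunk.

    Returns (bytes_written, sha256_hex_of_written_data, error_or_None).
    """
    digest = hashlib.sha256()
    written = 0
    error = None
    try:
        with open(path, "wb") as f:
            while written < total_bytes:
                size = min(chunk_bytes, total_bytes - written)
                chunk = os.urandom(size)
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push this chunk to the device
                digest.update(chunk)  # count only chunks that were synced
                written += size
    except OSError as exc:
        error = exc  # e.g. the device stopped responding mid-write
    return written, digest.hexdigest(), error


def read_back_sha256(path, chunk_bytes=64 * 1024 * 1024):
    """Hash the file as it reads back now, for comparison with the write-time hash."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Tiny demonstration size; a real reproduction would target tens of GB.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "stress.bin")
        written, expected, err = sustained_write(target, 8 * 1024 * 1024, 1024 * 1024)
        print("bytes written:", written, "error:", err)
        print("read-back matches:", read_back_sha256(target) == expected)
```

A read‑back immediately after writing may be served from the operating system's cache rather than the drive, so a matching hash is more meaningful after a reboot or on a second machine.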

Which drives are more likely to show the fault?

Early collations point to an over‑representation of SSDs built around Phison controller families (examples cited across reports include PS5012‑E12 and later E21T/E31T families) and to multiple DRAM‑less or HMB‑reliant designs. However, the problem has not been confined strictly to Phison; several non‑Phison controllers and even a handful of HDDs appeared in isolated repros, which strongly suggests an OS/host/driver interaction is a major cofactor. Treat model lists as investigative leads rather than definitive recall lists.

The Phison response — what was said and what it means

Phison issued a measured public statement acknowledging it was “recently made aware of the industry‑wide effects” of KB5063878 (and in some reports, KB5062660) and said it had “promptly engaged industry stakeholders,” that controllers “under review” were being worked on, and that Phison would “continue to provide updates and advisories to partners.” The statement emphasizes coordination with partners and support for remediation, but stops short of assigning definitive blame to either Windows or specific Phison firmware revisions.
This kind of messaging is consistent with how controller vendors typically respond to cross‑ecosystem regressions: confirm awareness, work privately with platform and OEM partners using telemetry, and then publish targeted advisories or firmware updates once root cause is established. The measured language reflects a need to avoid premature public attribution until vendor and Microsoft telemetry align.

How Microsoft and other vendors have acted so far

Microsoft: When the community tests were published, the KB page for KB5063878 did not list a storage‑device regression as a known issue. Microsoft has tools to control update delivery (Known Issue Rollback and targeted blocks) and used these previously when enterprise deployment errors appeared. Public communications from Microsoft typically follow internal telemetry confirmation; community reports often precede that confirmation.
Western Digital / SanDisk: WD and SanDisk previously released firmware updates to address BSODs tied to HMB allocation during the earlier 24H2 rollout. Those vendor patches were effective for the earlier BSOD symptom set and remain the recommended step for affected models in that cluster of failures. Vendors have historically issued firmware updates when controller logic needs adjustment for changed host behavior.
Phison: Acknowledged investigation and engagement with partners; pledged updates to partners and support channels while controllers are reviewed.

Independent outlets that reproduced the issue have urged caution: avoid heavy sustained writes on recently patched systems until vendors or Microsoft publish clear remediation guidance. Several outlets reproduced the fault on multiple drives and stress that reproducibility varies by firmware and platform, underscoring that a single global fix may not be as simple as flipping one vendor switch.

Technical analysis: plausible root causes and evidence

Three plausible mechanisms recur in community analyses and vendor statements:

Host Memory Buffer (HMB) allocation changes and DRAM‑less SSD interactions
Many low‑cost NVMe drives are DRAM‑less and rely on HMB, which lets them use a slice of system RAM for caching flash translation layer structures. If the host increases HMB allocation or changes timing, DRAM‑less controllers can enter unstable edge cases. Earlier 24H2 incidents linked increased HMB allocation to BSODs on some WD/SanDisk models; community reproductions connect heavy sequential writes to an HMB stress pattern that could expose firmware weaknesses.
Buffered I/O / NVMe command ordering or timing regression inside the OS
A subtle change in kernel buffering, NVMe driver command ordering, or timing could create a sequence of operations that certain controller firmware simply does not handle well under sustained stress. When firmware fails to respond to admin commands the device can effectively become invisible on the PCIe/OS topology — matching the observed symptom set.
Controller firmware edge‑cases under high utilization
SSD controllers manage garbage collection, translation tables, and wear‑leveling. Under prolonged sequential writes an internal operation (for example a metadata update or internal recovery) might deadlock or corrupt controller state in some firmware revisions — causing the controller to stop responding and making the drive unreachable until a power cycle or firmware reset. Community reports of unreadable SMART data after failures are consistent with firmware lockup scenarios.

Important caveat: conclusive root cause attribution requires joint telemetry from affected SSD vendors and Microsoft. Community reproductions and vendor tools form a strong hypothesis but are not a substitute for vendor/Microsoft diagnostic telemetry.

Real‑world risk assessment: data integrity and severity

Data loss is a real, documented risk in a minority of cases. Some testers recorded instances where files written during the incident were corrupted or missing. That elevates the issue beyond mere inconvenience to potential data‑loss severity, particularly for content creators, system imaging and cloning operations, and large game updates.
The occurrence is not universal across identical model numbers; firmware revision, motherboard firmware (UEFI), and host driver versions materially change whether the fault manifests. This variability complicates blanket remediation and increases the importance of vendor telemetry.
For enterprise fleets, the presence of a second, unrelated enterprise deployment error (0x80240069) in managed environments means administrators must treat the August rollup cautiously: inventory drives, withhold the update on suspect hardware until validated, and apply Known Issue Rollback where Microsoft recommends it.

Practical mitigations and step‑by‑step actions for users and IT teams

Immediate precautions for end users (consumer and prosumer)

Back up critical data now to a physically separate drive or cloud location. The risk of file corruption during the failure window means backups are the highest priority.
Avoid heavy, sustained writes on systems that have installed KB5063878 until your drive vendor and Microsoft confirm an applicable fix. Large game installs, cloning, archive extraction, and batch media export are typical triggers.
Check your SSD vendor utility (Corsair iCUE, Western Digital Dashboard, SanDisk Toolkit, Kioxia Toolbox, etc.) for firmware advisories and update tools. Apply vendor‑recommended firmware after backing up data.
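Because files written during a failure window can come out truncated or corrupted, a backup is only useful if it is verified. The following is a minimal sketch of one way to compare a source folder with its backup by SHA‑256; the function names and the report format are illustrative assumptions rather than part of any vendor tool.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_bytes=1024 * 1024):
    """Hash a file in fixed-size chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return (relative_path, problem) pairs for files missing or different in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_sha256(src) != file_sha256(dst):
            problems.append((rel.as_posix(), "mismatch"))
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        issues = verify_backup(sys.argv[1], sys.argv[2])
        for rel, problem in issues:
            print(f"{problem}: {rel}")
        print("backup verified" if not issues else f"{len(issues)} problem(s) found")
    else:
        print("usage: verify_backup.py SOURCE_DIR BACKUP_DIR")
```

The check only walks the source tree, so extra files in the backup are ignored; that is a deliberate simplification for a one‑way backup.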

A pragmatic workaround for advanced users: HMB disable registry tweak (temporary, with performance impact)

Some affected users and vendors previously recommended disabling HMB as a temporary mitigation for related HMB‑allocation BSODs. This is a stopgap and can reduce SSD performance. If used, follow these steps carefully and only if comfortable editing the Registry:

Run Registry Editor (regedit).
Navigate to: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\stornvme\Parameters\Device
Create or modify the DWORD HMBAllocationPolicy and set its value to 0.
Reboot.

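Caution: Registry edits can cause system instability if misapplied. Confirm the key path and value name against your SSD vendor’s current guidance before applying them; write‑ups of the earlier WD/SanDisk workaround commonly place the HmbAllocationPolicy value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\StorPort instead. This is a temporary mitigation pending vendor/Microsoft fixes and does not replace firmware updates.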
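For anyone who prefers a reviewable file over hand edits in regedit, the steps above can be expressed as a `.reg` file. The sketch below only builds the file text and prints it; it changes nothing on the system. The key path and value name are copied verbatim from the steps above and have not been independently verified, so they are passed as parameters and can be replaced if vendor guidance names a different location. The helper name `build_reg_file` is hypothetical.

```python
# Copied from the steps in this article; treat both as unverified assumptions.
ARTICLE_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\stornvme\Parameters\Device"
ARTICLE_VALUE = "HMBAllocationPolicy"


def build_reg_file(key_path, value_name, dword_value):
    """Return the text of a .reg file that sets one DWORD value under key_path."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 unsigned bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"{value_name}"=dword:{dword_value:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print for review instead of importing anything automatically.
    print(build_reg_file(ARTICLE_KEY, ARTICLE_VALUE, 0))
```

Generating the text rather than writing to the registry directly keeps the change visible and easy to discard, and a matching file that deletes the value can serve as the rollback once a proper fix ships.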

Recommendations for IT administrators

Inventory storage devices and flag systems with models and controllers that community testing indicates are at higher risk.
Use update management tools (WSUS, SCCM, Intune) to withhold KB5063878 from fleets with at‑risk hardware pending vendor validation or Microsoft guidance.
If a drive fails in a managed device, image the drive (power off to preserve state), gather logs, and open a support case with the vendor including diagnostic files and SMART dumps — telemetry will be necessary to root‑cause.
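The inventory and withholding steps above can be sketched as a simple triage pass over an asset list. Everything about the record layout here is hypothetical (the `Machine` fields, the action labels); only the controller names and the KB number come from this article, and the watchlist should be read as investigative leads rather than a confirmed list of affected hardware.

```python
from dataclasses import dataclass, field

# Controller families named in community reports cited in this article.
WATCHLIST_CONTROLLERS = ("PS5012-E12", "E21T", "E31T")
SUSPECT_KB = "KB5063878"


@dataclass
class Machine:
    hostname: str
    controller: str
    dram_less: bool = False
    installed_kbs: set = field(default_factory=set)


def is_watchlisted(machine):
    """True for DRAM-less/HMB-reliant drives or controllers on the watchlist."""
    # Normalise non-breaking hyphens that often sneak in from copied text.
    name = machine.controller.upper().replace("\u2011", "-")
    return machine.dram_less or any(tag in name for tag in WATCHLIST_CONTROLLERS)


def triage(machine):
    """Map a machine to one of three coarse actions."""
    if not is_watchlisted(machine):
        return "standard staging"
    if SUSPECT_KB in machine.installed_kbs:
        return "back up, avoid heavy writes"
    return "withhold update"


if __name__ == "__main__":
    # Hypothetical fleet records for illustration.
    fleet = [
        Machine("ws-01", "Phison PS5012-E12", installed_kbs={"KB5063878"}),
        Machine("ws-02", "Phison E21T", dram_less=True),
        Machine("ws-03", "Example controller"),
    ]
    for m in fleet:
        print(m.hostname, "->", triage(m))
```

Since reproductions also implicated non‑Phison controllers, machines that fall into "standard staging" still deserve the usual pilot ring before broad deployment.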

Why this keeps happening: ecosystem complexity and testing gaps

Modern operating systems and SSD controllers are tightly coupled systems. Features designed to improve performance — for example, HMB to give DRAM‑less drives an edge without dedicated DRAM — create dependency on host memory allocation and driver timing. When a platform update changes allocation size, timing, or command ordering, it can nudge controller firmware into previously unobserved edge cases.
The Windows Update ecosystem aims to test widely, but full coverage across every combination of controller firmware revision, OEM BIOS version, and DRAM‑equipped or DRAM‑less design is practically impossible before a mass rollout. The result is that low‑probability but high‑impact regressions still surface in the field, and remediation requires coordinated telemetry sharing and firmware updates across multiple vendors.

Strengths and weaknesses of the response so far

Notable strengths

Community testers reproduced a consistent failure profile quickly, which gives vendors a concrete target for advisories and firmware work. The firmware fixes that vendors like Western Digital issued for the related BSOD problems demonstrate that coordinated vendor remediation can be effective.
Phison’s public acknowledgement and pledge to work with partners is an important step: coordinated remediation depends on transparent engagement and partner advisories. Rapid vendor engagement reduces uncertainty and enables targeted firmware pushes.

Potential risks and weaknesses

Public communications are still partial: Microsoft’s KB page initially lacked a storage‑device known issue entry for KB5063878, and Phison’s statement did not identify specific controller revisions. That leaves users with uncertainty about whether their specific SKU is affected and delays the application of vendor fixes.
The variability of reproductions across firmware revisions and motherboards complicates blanket user guidance and increases the likelihood of missed edge cases where data loss could occur before a fix is applied.

Checklist: What affected users should do now (quick reference)

Back up important data immediately.
Avoid heavy sequential writes (game installs, large clones, archive extraction) on systems that applied KB5063878.
Check SSD vendor utilities for firmware updates and apply vendor‑recommended updates after backing up.
Advanced users: consider the HMB‑disable registry change as a temporary mitigation with performance tradeoffs. Proceed with caution.
IT admins: inventory affected hardware, withhold KB5063878 via update management if needed, and follow Microsoft servicing controls such as Known Issue Rollback where applicable.

Conclusion

The August cumulative KB5063878 for Windows 11 24H2 has exposed a disruptive storage regression in the field that manifests under sustained sequential writes. Phison’s confirmation that it is investigating the “industry‑wide effects” marks the start of an expected vendor‑platform collaboration, but the situation underscores broader systemic fragility: small host changes can reveal latent controller firmware edge cases with real data‑loss consequences.
The responsible route for users is immediate caution: back up data, avoid heavy writes on systems that received the update, and prioritize firmware updates from SSD vendors once they are published. For organizations, conservative update staging and rapid inventorying of potentially affected hardware are essential. The incident also serves as a reminder that modern storage is a co‑engineered system: operating system updates, driver behavior, controller firmware and OEM BIOS versions all matter. Coordinated telemetry, clear vendor advisories, and prompt firmware remediation are the pathway to restoring confidence — and those actions are already underway. (tomshardware.com, guru3d.com)

Source: TechPowerUp Phison Responds to Windows 11 24H2 Update Crashing SSDs
