Phison has publicly acknowledged and replicated a key finding first raised by the PCDIY community: a wave of disappearing and allegedly “bricked” NVMe SSDs linked in timing to Windows 11’s August cumulative update (KB5063878) appears to have been driven, in at least some test cases, by pre‑release engineering firmware installed on development or non‑retail units — not by the retail firmware shipping on consumer drives. This admission shifts the narrative from a platform‑wide Windows regression to a narrower supply‑chain and firmware‑provenance problem, but it leaves several important questions about disclosure, remediation, and data‑loss risk unanswered. (neowin.net)
Background / Overview
In mid‑August 2025 Microsoft shipped the Windows 11 24H2 August cumulative update commonly tracked as KB5063878 (OS Build 26100.4946). Within days, hobbyist labs and several specialist outlets documented a reproducible failure profile: during sustained large sequential writes (often in the neighborhood of ~50 GB or more) some NVMe SSDs would temporarily disappear from Windows, stop responding to vendor tools, and in some cases return with corrupted or RAW partitions. This pattern was repeatedly observed on drives that were partially filled and under heavy write stress. (techradar.com)

Phison — a major NAND controller vendor whose silicon appears in many consumer NVMe products — initially investigated the reports and later published a substantial validation report saying it had run more than 4,500 cumulative testing hours and 2,200+ test cycles across the reported device set and could not reproduce a systemic failure on production firmware. Microsoft also reported no telemetry‑based link between KB5063878 and a spike in disk failures across its fleet. Those two public positions initially framed the incident as either rare hardware coincidence or a narrowly scoped configuration problem. (guru3d.com, pcgamer.com)
Despite those vendor statements, community test benches continued to publish reproducible recipes. A DIY PC group (PCDIY) noted that drives used in their stress tests were running engineering preview firmware — a class of pre‑release firmware builds intended for validation and not meant for retail use — and that only those engineering builds failed under the Windows workload, while units on confirmed production images did not. Phison followed up, stating it had examined the exact SSDs used in PCDIY’s testing, confirmed the presence of pre‑release engineering firmware on those units, and replicated the stress tests on consumer‑available drives without reproducing the failures. Phison also said it could reproduce the failure when using the non‑retail engineering firmware.
What exactly happened: the technical fingerprint
Symptoms observed in the wild and in labs
- Drives vanish from File Explorer, Device Manager, and Disk Management while a large write is in progress.
- Vendor diagnostic tools and SMART readers sometimes fail to query the device after the event.
- Reboots occasionally restore device visibility; in other cases the drive remains unreadable or returns corrupted partitions.
- The reproducible trigger that community labs used was a sustained sequential write, typically tens of gigabytes in one continuous operation, on drives already partially used (>50–60% capacity).
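The trigger described above can be approximated with a short script. This is a sketch, not the community labs' exact procedure: the chunk size and total size are placeholders (the reported recipes used tens of gigabytes in one pass), and it should only ever be pointed at a drive whose data is already backed up. Flushing and syncing each chunk keeps the writes flowing to the device rather than pooling in the OS page cache, and recording per‑chunk latencies makes a sudden stall — a common precursor to the drive dropping out — visible in the data.

```python
import os
import tempfile
import time

def sequential_write_stress(path, total_bytes, chunk_bytes=4 * 1024 * 1024):
    """Write total_bytes to path in one sustained sequential pass.

    Each chunk is flushed and fsync'd so the I/O actually reaches the
    device. Returns (bytes_written, per_chunk_latencies); a sharp
    latency spike or an OSError mid-loop is the signal to investigate.
    """
    latencies = []
    buf = b"\xa5" * chunk_bytes  # arbitrary fill pattern
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            step = min(chunk_bytes, total_bytes - written)
            t0 = time.perf_counter()
            f.write(buf[:step])
            f.flush()
            os.fsync(f.fileno())
            latencies.append(time.perf_counter() - t0)
            written += step
    return written, latencies
```

For a real reproduction attempt, `total_bytes` would be set to 50 GB or more and `path` placed on a partially filled target drive; the small sizes used here are only for illustration.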
Why firmware provenance matters
SSD firmware controls critical behaviors: command handling, mapping tables, garbage collection, thermal throttling, and interactions with Host Memory Buffer (HMB) where applicable. Engineering or pre‑release firmware images commonly include diagnostic hooks, un‑hardened code paths, and performance instrumentation — exactly the sort of differences that can reveal latent bugs under unusual host timing or workload patterns.

If engineering firmware inadvertently reaches retail units — through mis‑flashed production lines, preconfigured evaluation units, or supply‑chain crossover — a host change (like a Windows update that subtly alters I/O timing or buffering) can expose those latent bugs. That explains why vendor lab programs that test production images at scale might not see failures while hobbyist benches using a mixed pool of hardware could reproduce them consistently.
Phison’s investigation and lab findings — what’s verified
- Phison publicly described a large validation program (4,500+ hours and 2,200+ cycles) and said it could not reproduce the reported disappearance/crash pattern on production firmware images. Multiple independent outlets reported this figure and Phison’s inability to reproduce at scale. (guru3d.com, pcgamer.com)
- After being contacted by PCDIY, Phison said it examined the exact drives used in those tests, found those units were running engineering preview firmware, and replicated the community stress tests on retail consumer drives without failures. Phison also reproduced failures on those same models when they were flashed with the engineering firmware image. That strongly suggests a firmware‑image provenance issue, not a universal Windows regression.
- Phison additionally recommended standard best practices — thermal mitigation for high‑performance drives and coordination with OEM partners — while continuing to monitor partner telemetry. (neowin.net, guru3d.com)
Independent corroboration and outstanding gaps
Multiple respected outlets independently reported Phison’s testing numbers and public statements, and specialist communities published the same stress recipes that produced failures in those benches. That gives independent credibility to both the community reproductions and Phison’s public test program. (techradar.com, pcgamer.com, guru3d.com)

However, several important items remain unverified in the public record:
- No SSD brand has published a public RMA or serial‑range advisory saying that specific retail units inadvertently shipped with engineering firmware. That would be the clearest, auditable evidence tying affected units to a supply‑chain misflash.
- Phison’s public releases emphasize its inability to reproduce faults in production images; the company’s private validation of the PCDIY claim appears in secondary reporting rather than as a transparent, downloadable forensic report.
- Corsair (maker of the Force MP600 referenced in multiple community lists) has not issued a formal public statement confirming or denying that any shipped MP600 units ran engineering images. Phison’s comments referenced the E16 controller and specific reproduced failures tied to engineering firmware, but a vendor‑level advisory specifying serial ranges and remediation steps would be the final confirmation collectors and administrators need. (tomshardware.com)
What this means for users and administrators
Immediate practical guidance (prioritize these steps)
- Back up critical data now. Copy important files to external drives or reliable cloud storage before attempting large write operations. Data preservation is the priority because the worst outcome is unrecoverable loss.
- Avoid sustained large sequential writes (game installs, large archive extraction, cloning, video exports) on systems that recently installed KB5063878 until you confirm your SSD’s firmware level. Reproducible community tests show the failure mode occurs under continuous heavy writes.
- Check your SSD vendor’s support pages and tools for firmware advisories and only install vendor‑approved firmware via official utilities. Do not flash unofficial images. (guru3d.com)
- If a drive disappears mid‑write, preserve it for diagnostics. Do not immediately reformat. Capture logs (Event Viewer, NVMe traces) and contact vendor support; they may request serial numbers and device images for forensic analysis.
- For administrators: stage KB5063878 in a test ring that mirrors your storage fleet. Run representative, high‑write workloads and validate firmware levels before broad deployment. Treat vendor firmware updates as the primary remediation path.
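For fleets, the firmware‑verification step above can be automated. The sketch below assumes an inventory of (serial, model, installed firmware) tuples — on Windows these could come from vendor tools or `Get-PhysicalDisk` — and a table of vendor‑approved production versions. The model names and version strings here are invented for illustration, not real Phison or Corsair identifiers; the point is the audit logic: anything not on an approved list, including unknown models, gets flagged for follow‑up.

```python
# Hypothetical model -> set of approved production firmware versions.
# A real table would be built from vendor support-page advisories.
APPROVED_FIRMWARE = {
    "ExampleDrive 2TB": {"EGFM13.2", "EGFM13.3"},
    "ExampleDrive 1TB": {"41B2T1AA"},
}

def audit_firmware(inventory):
    """Return drives whose installed firmware is not on the approved
    production list. Unknown models are flagged too, since their
    provenance cannot be verified from the table."""
    flagged = []
    for serial, model, installed in inventory:
        approved = APPROVED_FIRMWARE.get(model)
        if approved is None or installed not in approved:
            flagged.append((serial, model, installed))
    return flagged
```

A drive reporting an unfamiliar version string (for example, one carrying obvious engineering markers) would surface in the flagged list and could then be held back from the KB5063878 deployment ring until the vendor confirms its status.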
Why this is conservative but necessary
Even if the root cause is restricted to engineering firmware in a small subset of units, the impact per incident is high. The ability to reproduce the failure reliably in community labs means the bug is real in that narrow context — and when disk disappearance happens during a large write, data loss is possible. That justifies conservative mitigation while vendor forensics continue.

Assessing vendor responses: strengths, weaknesses, and risks
Strengths
- Phison’s large, formal validation program (thousands of hours and cycles) demonstrates seriousness and capacity to run rigorous testing at scale. That lends weight to its claim that the issue is not a mass‑market production failure. (guru3d.com, pcgamer.com)
- Transparent community reproductions helped escalate a narrow technical issue into a coordinated vendor investigation quickly; hobbyist benches often exercise real‑world workloads that automated vendor suites might not stress repeatedly. That civic technical scrutiny is a strength of the PC ecosystem.
- Microsoft’s telemetry‑backed assertion that there is no detectable spike in field drive failures after KB5063878 provides an important statistical check on alarmist claims. Telemetry at Microsoft scale is a meaningful dataset. (techradar.com)
Weaknesses and risks
- Lack of public forensic disclosure. Phison’s public statements confirm testing and say it replicated the PCDIY behavior on engineering firmware, but no vendor has published a full forensic trace (ETW/command captures, firmware logs) that independent researchers can analyze. That opacity undermines confidence for some customers and media.
- No serial‑range advisory yet. If engineering images did reach retail channels, vendors have not publicly listed which batches are affected. Without that, buyers and IT fleets cannot easily determine exposure.
- Potential supply‑chain accountability gap. If mis‑flashed or test firmware escaped into shipping channels, that points to process control issues at manufacturing or distribution that carry operational and reputational risk for SSD brands and controller vendors.
- Messaging friction. The initial messaging (Phison: “unable to reproduce at scale”) and later lab confirmation tied to engineering firmware — relayed in secondary reporting — created confusion and distrust. Vendors need clearer, direct statements when possible to prevent rumor escalation. (tomshardware.com)
The supply‑chain angle: how engineering firmware can leak into retail devices
Engineering firmware is often present on development samples, evaluation boards, and early factory line test units. Possible leakage paths include:
- Factory test units being used in live builds without a final firmware flash step.
- Service or evaluation packs sent to system builders that retain engineering images.
- Mixups in production lines where firmware rollforward/rollback procedures fail.
What vendors should (and likely will) do next
- Issue a clear, public advisory if any serial ranges or SKUs are confirmed to have engineering firmware installed at shipment. That advisory should include steps for RMA or firmware reflash where possible.
- Publish forensic artifacts (redacted as necessary) that allow independent verification: ETW traces, NVMe command captures, and firmware logs from affected and unaffected units.
- Provide official firmware tools and recovery instructions for affected SKUs; where firmware rollback is impossible, provide RMA or replacement paths.
- Strengthen factory firmware provenance controls and create traceability where consumers or system builders can verify production images via vendor tools.
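One concrete form the traceability recommendation could take is a published manifest of cryptographic digests for production firmware images, which a system builder verifies before flashing. The sketch below uses SHA‑256 from Python’s standard library; the manifest contents and version strings are invented for illustration, and real vendor tooling would also verify a signature on the manifest itself.

```python
import hashlib

def verify_firmware_image(image_bytes, version, manifest):
    """Return True only if the image's SHA-256 digest matches the
    manifest entry for the claimed version. An unlisted version is
    treated as unverifiable and rejected."""
    expected = manifest.get(version)
    if expected is None:
        return False
    return hashlib.sha256(image_bytes).hexdigest() == expected
```

With a check like this wired into the final flash step of a production line, a leftover engineering image would fail verification instead of shipping, which is exactly the class of escape this incident appears to involve.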
How to check if your drive is affected (practical checklist)
- Use your SSD vendor’s official tool (Corsair SSD Toolbox, WD Dashboard, SanDisk SSD Dashboard, etc.) to check the installed firmware version.
- Compare the installed firmware against vendor‑published latest production firmware. Do not use third‑party flashing utilities.
- If you’re running heavy write workloads regularly (content production, game installs, cloning), consider temporarily pausing large sequential writes until you confirm a safe firmware version.
- If you suspect your drive has disappeared mid‑write or shows corrupted partitions, stop using it and contact vendor support. Capture Event Viewer logs and keep the device intact for diagnostics. (guru3d.com)
Caveats and unverifiable claims — what to watch for
- The PCDIY claim that Phison engineers verified the engineering‑firmware trigger has been reported in secondary media. Phison’s public statements emphasize its inability to reproduce problems on production firmware, and its replication of failures on engineering firmware once it was shown the sample units. The precise internal lab evidence (full traces, serial ranges) has not been released for independent review, so treat that sequence as partially corroborated rather than publicly proved.
- Any single anecdote of data loss should be validated via vendor diagnostics before attributing causation to the Windows update or a firmware image. Correlation by timing is not definitive proof of causation without forensic artifacts. Multiple independent outlets have emphasized this caution. (theverge.com, pcgamer.com)
Long‑term implications for the Windows + SSD ecosystem
This incident — regardless of its final forensic resolution — surfaces several durable lessons:
- Modern storage reliability depends on a complicated coordination across OS changes, driver behavior, controller firmware, and factory provisioning. Minor changes in one layer can reveal latent bugs in another.
- Open, rapid forensic sharing between vendors and communities matters. Hobbyist labs stress‑test real workloads at a scale vendor test suites may not emulate, and community evidence can be a valuable complement to vendor telemetry.
- Firmware provenance and factory traceability are not optional extras; they are safety features. The industry needs better mechanisms to ensure consumer devices ship with production‑hardened firmware and that any exceptions are traceable and remediable.
- For enterprise and fleet managers, the episode is a reminder to stage updates, validate with representative hardware, and maintain aggressive backup and imaging policies.
Final assessment and recommended posture
Phison’s follow‑up — confirming that the PCDIY test units used engineering preview firmware and that failures could be reproduced on those non‑retail images while consumer‑available production drives did not fail in the same tests — is a credible reconciliation of the otherwise conflicting signals from community benches and vendor telemetry. It logically explains why hobbyist tests could reproducibly crash some drives while Phison’s mass tests did not.

That said, the absence of a public serial‑range advisory or a detailed forensic packet means the episode is not yet fully closed from an auditable evidence perspective. Until vendors publish precise, verifiable rollout and remediation artifacts, users and administrators should adopt a pragmatic, defensive posture: back up critical data, avoid heavy sequential writes on patched systems until firmware is verified, check vendor support pages for official firmware updates, and preserve any suspect drive for vendor diagnostics.
The incident is a useful, if unwelcome, case study in cross‑stack risk: when OS updates, controller firmware variants, and supply‑chain processes collide, the result can be a high‑impact edge case that only careful forensics will fully explain. In the meantime, data protection and measured staging of patches remain the best defenses. (neowin.net, techradar.com)
Source: Windows Report Phison Confirms PCDIY Report on Engineering Firmware Causing SSD Failures Tied to KB5063878 Update