gatta
New Member · Joined Dec 6, 2023 · Messages: 3
Hello,

After adding disks to a clustered ThinkAgile node, the new disks are in a "Lost Communication" state. We have tried several solutions without success.

Below are the steps we tried.

First, collect the disks whose operational status contains "Lost" (that is, "Lost Communication") into a variable.

$disk = Get-PhysicalDisk | Where-Object OperationalStatus -like '*Lost*'
$disk

Next, retire the disks.

$disk | Set-PhysicalDisk -Usage Retired
Get-PhysicalDisk

Next, remove the disks from the storage pool.

Get-StoragePool S2D* | Remove-PhysicalDisk -PhysicalDisks $disk
Get-PhysicalDisk

Finally, reset any disks whose operational status contains "Unrec" (such as "Unrecognized Metadata").

Get-PhysicalDisk | Where-Object OperationalStatus -like '*Unrec*' | Reset-PhysicalDisk
Get-PhysicalDisk
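One variation on this sequence, sketched here under stated assumptions, adds a repair step between retiring and removing: Repair-VirtualDisk rebuilds the copies that lived on the retired disks onto the remaining disks, and the disks are only removed once every virtual disk reports Healthy again. Select-LostCommunicationDisk and Remove-LostS2DDisk are hypothetical helper names, and the default pool name S2D* is taken from the command above; adjust both for your cluster.

```powershell
# Sketch only: hypothetical helpers, assuming a single pool whose name starts with
# "S2D" and the Storage module on Windows Server. Run elevated on a cluster node.

function Select-LostCommunicationDisk {
    param([Parameter(ValueFromPipeline)] $Disk)
    process {
        # OperationalStatus may hold several values; keep the disk if any of them contains "Lost".
        if ($Disk.OperationalStatus -like '*Lost*') { $Disk }
    }
}

function Remove-LostS2DDisk {
    param([string] $PoolName = 'S2D*')

    $disks = @(Get-PhysicalDisk | Select-LostCommunicationDisk)
    if ($disks.Count -eq 0) {
        Write-Output 'No disks are in a Lost Communication state.'
        return
    }

    # 1. Retire the disks so that no new data is placed on them.
    $disks | Set-PhysicalDisk -Usage Retired

    # 2. Ask every virtual disk to rebuild its redundancy on the remaining disks.
    Get-VirtualDisk | Repair-VirtualDisk

    # 3. Remove the disks only once every virtual disk reports Healthy again.
    $notHealthy = @(Get-VirtualDisk | Where-Object { $_.HealthStatus -ne 'Healthy' })
    if ($notHealthy.Count -gt 0) {
        Write-Warning 'Some virtual disks are not Healthy yet; check Get-StorageJob and run this again later.'
        return
    }
    Get-StoragePool -FriendlyName $PoolName |
        Remove-PhysicalDisk -PhysicalDisks $disks -Confirm:$false
}
```

Run Remove-LostS2DDisk from an elevated session on one cluster node. If it stops with the warning, wait until Get-StorageJob shows no running repair jobs and run it again; retiring a disk that is already retired is harmless.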
 

Attachments

  • Capture d'écran 2023-12-06 171236.webp
Hello,
It seems that you have encountered an issue where the disks in your clustered ThinkAgile node are in a "lost communication" state. You have already tried a few steps to resolve the issue, but without success.
From the steps you provided, you retired the disks with the "Set-PhysicalDisk" cmdlet and then removed them from the storage pool with the "Remove-PhysicalDisk" cmdlet. You also tried resetting any disks whose operational status matches "unrec" (such as "Unrecognized Metadata") with the "Reset-PhysicalDisk" cmdlet.
These steps are in the right direction, but there are a few additional troubleshooting steps you can try.
1. Check network connectivity: In a Storage Spaces Direct cluster, each node reaches the drives in the other nodes over the cluster network, so a drive can show "Lost Communication" when the node that holds it is down or unreachable. Verify that all cluster nodes are up, that the network cables are securely connected, and that no network configuration or firewall issue is blocking communication between the nodes.
2. Update drivers and firmware: Make sure that you have the latest drivers and firmware for both the storage controller and the disks themselves. Check the manufacturer's website for any available updates and apply them.
3. Check disk health: Use the manufacturer's diagnostics tools or third-party disk health monitoring software to assess the overall health of the disks. Look for any signs of disk failure or errors that could be causing the "lost communication" state.
4. Check disk enclosure connections: If the disks are in a disk enclosure, verify that the enclosure is securely connected to the ThinkAgile node. Ensure that all the cables and connectors are in good condition and properly seated.
5. Restart the ThinkAgile node: A restart can sometimes restore communication between the storage controller and the disks. Because the node is part of a cluster, pause and drain it first (Suspend-ClusterNode -Drain), restart it, resume it afterwards (Resume-ClusterNode), and then check whether the disks come back online.
6. Investigate event logs: Check the event logs on the clustered ThinkAgile node for any relevant error messages or warnings that could provide more information about the communication issue. Look for any error codes or specific events related to storage or disk failures.
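Several of these checks can be run from PowerShell on one of the cluster nodes. The sketch below is read-only and assumes the Storage and FailoverClusters modules that ship with Windows Server; Get-S2DDiskDiagnostic and New-StorageEventFilter are hypothetical helper names, and the two event logs it queries are an assumption about where storage and cluster errors are most likely to appear.

```powershell
# Read-only sketch, assuming the Storage and FailoverClusters modules on a
# Windows Server cluster node. Both function names are hypothetical.

function New-StorageEventFilter {
    param(
        [string[]] $LogName = @('System', 'Microsoft-Windows-FailoverClustering/Operational'),
        [int] $Hours = 24
    )
    # Level 1 = Critical, 2 = Error, 3 = Warning; only events from the last $Hours hours.
    @{ LogName = $LogName; Level = 1, 2, 3; StartTime = (Get-Date).AddHours(-$Hours) }
}

function Get-S2DDiskDiagnostic {
    # Step 1: every node and cluster network should report Up.
    Get-ClusterNode | Format-Table Name, State -AutoSize
    Get-ClusterNetwork | Format-Table Name, State, Role -AutoSize

    # Steps 2 and 3: firmware version and status of each disk as Windows sees it.
    Get-PhysicalDisk | Sort-Object FriendlyName |
        Format-Table FriendlyName, SerialNumber, FirmwareVersion, OperationalStatus, HealthStatus, Usage -AutoSize

    # Step 3: error and wear counters reported by the drives (not every drive fills every counter).
    Get-PhysicalDisk | Get-StorageReliabilityCounter |
        Format-Table DeviceId, Temperature, ReadErrorsTotal, WriteErrorsTotal, Wear -AutoSize

    # Current faults raised by the cluster Health Service.
    Get-StorageSubSystem -FriendlyName 'Cluster*' | Debug-StorageSubSystem

    # Step 6: recent critical, error and warning events.
    Get-WinEvent -FilterHashtable (New-StorageEventFilter) -ErrorAction SilentlyContinue |
        Format-Table TimeCreated, LogName, Id, LevelDisplayName, Message -AutoSize
}
```

Run Get-S2DDiskDiagnostic in an elevated session. Lenovo's own drive tools (step 3) may report more detail than Get-StorageReliabilityCounter, whose output depends on what each drive exposes.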
If none of these steps resolve the issue, you might want to consider contacting the manufacturer's support or consulting with a professional who has experience with the specific hardware and software configuration of your ThinkAgile node. They will be able to provide further guidance and assistance in troubleshooting the "lost communication" state of the disks.
Remember to always take backups of your data before performing any disk-related operations to avoid potential data loss.
I hope this helps, and good luck in resolving the issue with your clustered ThinkAgile node!
 
Are these new disks virtual or physical?
Are they Lenovo or IBM, if they are actual disks?

If you just unplugged a physical disk, then you can't fix that at the code level.
Hello,

Are these new disks virtual or physical?
They are newly installed physical disks.

Are they Lenovo or IBM, if they are actual disks?
Lenovo.

All firmware is up to date.


 

Attachments

  • Capture d'écran 2023-12-06 231902.webp
Hello,

Are these new disks virtual or physical?
They are newly installed physical disks.

Are they Lenovo or IBM, if they are actual disks?
Lenovo.

All firmware is up to date.
 

Attachments

  • Capture d'écran 2023-12-07 011958.webp
You’re absolutely right that Hyper-V has different behaviors when it comes to physical vs. virtual disks, and unplugging a physical disk without proper removal will certainly lead to problems that can't be solved purely at the "code" level.
Here’s a breakdown to address your questions and steps to clarify the situation:

1. Virtual vs. Physical Disks in Hyper-V

First, determining whether the disks are virtual or physical is key:
  • Virtual Disks: These are .vhd or .vhdx files stored on physical drives and are managed entirely by the Hyper-V Manager.
  • Physical Disks (Pass-Through): Whole physical drives are assigned directly to VMs. Improper removal or changes to these drives can cause them to go into a "retired" state in Hyper-V configurations.
Steps to identify:
  • Open Hyper-V Manager and check the VM settings:
    • Look under SCSI Controller or IDE Controller to see if it’s referencing a .vhd(x) file or a physical disk.
    • For physical disks, check if they are in an "offline" or unplugged state.
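The same check can be run across every VM on a host from PowerShell. This is a sketch assuming the Hyper-V module; Get-VMDiskKind is a hypothetical helper that treats any drive whose path ends in .vhd, .vhdx, .avhd or .avhdx as virtual and everything else as pass-through.

```powershell
# Sketch, assuming the Hyper-V module. Get-VMDiskKind is a hypothetical helper.
function Get-VMDiskKind {
    param([Parameter(ValueFromPipeline)] $Drive)
    process {
        # .vhd, .vhdx and checkpoint (.avhd/.avhdx) files are virtual disks;
        # any other path is treated as a pass-through physical disk.
        $kind = if ($Drive.Path -match '\.a?vhdx?$') { 'Virtual' } else { 'Pass-through' }
        [pscustomobject]@{
            VMName     = $Drive.VMName
            Controller = "$($Drive.ControllerType) $($Drive.ControllerNumber):$($Drive.ControllerLocation)"
            Path       = $Drive.Path
            Kind       = $kind
        }
    }
}
```

Usage: Get-VM | Get-VMHardDiskDrive | Get-VMDiskKind | Format-Table -AutoSize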

2. Lenovo/IBM Hardware & Physical Issues

If the disks are physical and you're dealing with Lenovo/IBM servers or enterprise-grade hardware, these are often managed by hardware RAID controllers or specialized tools. Simply unplugging physical drives often confuses both the RAID controller and Hyper-V.
In this case:
  • RAID Controller Issues:
    • If the server relies on RAID, log into the RAID controller interface (accessed during boot, e.g., via Ctrl + R or another hotkey for Lenovo/IBM systems).
    • Check the status of the drives and ensure they're active and available to the host OS.
    • If the controller marks a disk as missing or offline, you may need to reinitialize or rebuild the array, depending on the setup.
  • Pass-Through Configuration:
    • Physical disks in a "retired" state need to be reconnected properly to the Hyper-V host machine.
    • Restart the Hyper-V host, but first ensure that the physical disk is properly reseated and showing up in Disk Management in the host OS.
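Before restarting, the host can be asked to rescan its storage and list what it currently sees. This is a sketch assuming the Windows Storage module; Get-HostDiskState is a hypothetical helper name. Keep in mind that Hyper-V only lets you attach a host disk as pass-through while it is offline on the host, so IsOffline = True is expected for such a disk.

```powershell
# Sketch, assuming the Windows Storage module on the Hyper-V host.
# Get-HostDiskState is a hypothetical helper name.
function Get-HostDiskState {
    # Ask Windows to rescan its storage so newly reseated disks are detected.
    Update-HostStorageCache

    # List every disk the host sees; pass-through disks are expected to show IsOffline = True.
    Get-Disk | Sort-Object Number |
        Format-Table Number, FriendlyName, SerialNumber, OperationalStatus, IsOffline, Size -AutoSize
}
```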

3. How to Fix for Virtual Disks

If these are virtual disks (.vhd or .vhdx) marked as "retired," check for:
  • Path Disruptions:
    • If the .vhdx file has been moved or deleted, Hyper-V will lose access. Check the original path and ensure the file exists.
    • Update the disk path in Hyper-V Manager > VM Settings.
  • Corrupted Disks:
    • A damaged .vhdx can cause the VM to think the disk is missing or invalid. Use Hyper-V’s Inspect Disk tool to check the integrity of the file and repair it if necessary.
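Both checks can be scripted per VM. This is a sketch assuming the Hyper-V module; Test-VMDiskFile is a hypothetical helper that skips pass-through drives, reports whether each virtual disk file still exists at its configured path, and runs Test-VHD (which returns True when the file is usable) on the files that do exist.

```powershell
# Sketch, assuming the Hyper-V module. Test-VMDiskFile is a hypothetical helper.
function Test-VMDiskFile {
    param([Parameter(Mandatory)] [string] $VMName)

    foreach ($drive in (Get-VMHardDiskDrive -VMName $VMName)) {
        # Pass-through drives have no virtual disk file, so skip them.
        if ($drive.Path -notmatch '\.a?vhdx?$') { continue }

        $exists = Test-Path -LiteralPath $drive.Path
        # Test-VHD returns $true when the file is usable; only call it on files that exist.
        $usable = $false
        if ($exists) { $usable = Test-VHD -Path $drive.Path }

        [pscustomobject]@{
            ControllerType     = $drive.ControllerType
            ControllerNumber   = $drive.ControllerNumber
            ControllerLocation = $drive.ControllerLocation
            Path               = $drive.Path
            Exists             = $exists
            Usable             = $usable
        }
    }
}
```

If Exists is False, point the drive at the file's new location with Set-VMHardDiskDrive, for example Set-VMHardDiskDrive -VMName VM01 -ControllerType SCSI -ControllerNumber 0 -ControllerLocation 0 -Path 'D:\VMs\VM01.vhdx' (the VM name, position and path here are placeholders).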

4. Removal vs. Unplug Behavior

If a physical disk was unplugged without proper removal from Hyper-V:
  • The VM will keep a reference to the disk, putting it into a "retired" state. To fix the configuration:
    1. Open Hyper-V Manager.
    2. Go to the VM settings.
    3. Remove the missing or "retired" disk from the controller assignments manually.
    4. Restart the Hyper-V host.
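Step 3 can also be done from PowerShell. This is a sketch assuming the Hyper-V module; Remove-StaleVMDrive is a hypothetical wrapper around Remove-VMHardDiskDrive that supports -WhatIf, so you can confirm which drive would be removed before anything changes.

```powershell
# Sketch, assuming the Hyper-V module. Remove-StaleVMDrive is a hypothetical wrapper.
function Remove-StaleVMDrive {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $VMName,
        [ValidateSet('IDE', 'SCSI')] [string] $ControllerType = 'SCSI',
        [int] $ControllerNumber = 0,
        [Parameter(Mandatory)] [int] $ControllerLocation
    )

    $target = "$VMName $ControllerType $($ControllerNumber):$ControllerLocation"
    # With -WhatIf, ShouldProcess only reports the action and returns $false.
    if ($PSCmdlet.ShouldProcess($target, 'Remove hard disk drive')) {
        $driveParams = @{
            VMName             = $VMName
            ControllerType     = $ControllerType
            ControllerNumber   = $ControllerNumber
            ControllerLocation = $ControllerLocation
        }
        Remove-VMHardDiskDrive @driveParams
    }
}
```

List the drive positions with Get-VMHardDiskDrive -VMName VM01, preview the removal with Remove-StaleVMDrive -VMName VM01 -ControllerLocation 1 -WhatIf (VM01 and location 1 are placeholders), then run it again without -WhatIf.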

Best Practices for Physical Disk Dependencies

  1. Always take the physical disk offline from Disk Management before unplugging it, even if it's only connected to a VM pass-through.
  2. Use hardware RAID monitoring tools (for Lenovo/IBM, look for Lenovo XClarity or similar utilities) to properly manage disks before removal or maintenance.
  3. If using Dell/HP/IBM SAN storage, ensure paths/pools are intact and connections between Hyper-V and SAN managers are correct.
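Practice 1 has a PowerShell equivalent to the Disk Management step. This is a sketch assuming the Windows Storage module; Set-DiskOfflineForRemoval is a hypothetical helper that supports -WhatIf and shows the disk's state afterwards. It is meant for disks you would otherwise take offline in Disk Management, as in the practice above.

```powershell
# Sketch, assuming the Windows Storage module. Set-DiskOfflineForRemoval is hypothetical.
function Set-DiskOfflineForRemoval {
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)] [int] $Number)

    $disk = Get-Disk -Number $Number
    # With -WhatIf, ShouldProcess only reports the action and the disk stays online.
    if ($PSCmdlet.ShouldProcess("Disk $Number ($($disk.FriendlyName))", 'Set offline')) {
        Set-Disk -Number $Number -IsOffline $true
    }

    # Show the resulting state so you can confirm the disk is offline before unplugging it.
    Get-Disk -Number $Number | Select-Object Number, FriendlyName, IsOffline
}
```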

Next Steps

To tailor the solution:
  • Are these disks .vhd/.vhdx or physical?
  • If physical, are you dealing with RAID, SAN, or direct server connections?
  • Did the issue occur after unplugging or moving disk files?
Your advice about avoiding code-level fixes for physical disk mismanagement is gold: hardware-level issues need hardware-level solutions first. Let me know the specifics, and I’ll expand further!