Hi @Nieltobi,
Sorry to hear about the trouble you're having with your Intel server. Given how critical this issue is, let's tackle it systematically. Here are several areas to examine:
1. Blue Screen Error Codes
The Blue Screen of Death (BSOD) stop codes you mentioned (`DRIVER_IRQL_NOT_LESS_OR_EQUAL` and `PAGE_FAULT_IN_NONPAGED_AREA`) typically point to driver issues, memory corruption, or hardware problems. Here's a more detailed breakdown:
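- `DRIVER_IRQL_NOT_LESS_OR_EQUAL` (0xD1):
  - Cause: A kernel-mode driver tried to access pageable or invalid memory at too high an interrupt request level (IRQL). This is generally caused by a faulty driver, or by software that has corrupted memory.
- `PAGE_FAULT_IN_NONPAGED_AREA` (0x50):
  - Cause: Invalid system memory was referenced. This usually points to hardware problems such as faulty RAM, but faulty drivers or software can also cause it.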
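When the same stop codes show up across several crashes, it helps to pull the code and its four parameters out of the logged message so the crashes can be compared side by side. The sketch below assumes the System log's BugCheck entry reads roughly "The bugcheck was: 0x000000d1 (0x..., 0x..., 0x..., 0x...)"; treat that message shape, and the `parse_bugcheck` helper, as assumptions rather than a documented format.

```python
import re

# Only the two stop codes discussed in this thread; extend the table as needed.
BUGCHECK_NAMES = {
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
}

# Assumed message shape: a hex code followed by a parenthesised,
# comma-separated list of hex parameters, e.g. "0x000000d1 (0x10, 0x2, 0x0, 0x...)".
_BUGCHECK_RE = re.compile(r"0x([0-9a-fA-F]+)\s*\(([^)]*)\)")


def parse_bugcheck(message: str):
    """Return (code, name, parameters) from a bugcheck message, or None if no match."""
    match = _BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # int(..., 16) accepts an optional "0x" prefix, so each parameter parses as-is.
    params = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    name = BUGCHECK_NAMES.get(code, f"unknown (0x{code:X})")
    return code, name, params


if __name__ == "__main__":
    # Made-up parameter values, for illustration only.
    sample = ("The computer has rebooted from a bugcheck. The bugcheck was: "
              "0x000000d1 (0x0000000000000010, 0x0000000000000002, "
              "0x0000000000000000, 0xfffff80312345678).")
    code, name, params = parse_bugcheck(sample)
    print(hex(code), name, [hex(p) for p in params])
```

If the fourth parameter (the faulting address) keeps landing in the same driver across crashes, that driver is the first suspect.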
2. Steps to Diagnose and Resolve
Here are detailed steps to further diagnose and resolve these issues:
a. Check and Update Firmware
- BIOS/UEFI Firmware: Ensure your BIOS/UEFI is up to date. Manufacturers often release updates that address hardware compatibility and stability issues.
b. Driver Checks
- Verify Driver Versions: Double-check that all critical system drivers (e.g., network, storage, chipset) are the latest versions provided by the hardware manufacturer. In particular, look for updates on the Intel website for your server model.
- Roll Back or Reinstall Drivers: Newer drivers can sometimes cause issues. If the crashes started after a recent update, consider rolling back to the previous stable version. Otherwise, reinstall the drivers from scratch to make sure they are installed correctly.
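Comparing driver versions by eye is error-prone because "1.10" sorts before "1.2" as text. A minimal sketch of a numeric comparison, assuming you have written down the installed versions (e.g. from Device Manager) and the latest versions listed on the vendor's site; the driver names and version numbers below are hypothetical placeholders.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string such as "1.2.3" into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def is_older(current: str, newest: str) -> bool:
    """Compare numerically, padding with zeros so "2.0" and "2.0.0" count as equal."""
    a, b = parse_version(current), parse_version(newest)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def outdated_drivers(installed: dict[str, str], latest: dict[str, str]) -> list[str]:
    """Names of drivers whose installed version is older than the vendor's latest."""
    return sorted(
        name for name, current in installed.items()
        if name in latest and is_older(current, latest[name])
    )


if __name__ == "__main__":
    # Hypothetical inventory, not real Intel release numbers.
    installed = {"network": "1.2.3", "storage": "2.0", "chipset": "10.1.0"}
    latest = {"network": "1.10.0", "storage": "2.0.0", "chipset": "10.1.0"}
    print(outdated_drivers(installed, latest))  # ['network']
```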
c. Hardware Diagnostics
- Memory Tests: Although you've already run some memory diagnostics, use a tool like MemTest86 (https://www.memtest86.com/) for a more thorough scan. Consider testing memory modules one at a time to isolate any defective ones.
- Storage Diagnostics: Run comprehensive tests on your SSDs using the SSD manufacturer's tools or third-party tools such as CrystalDiskInfo or HD Tune.
d. Analyze Minidump Files
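If small memory dumps are enabled, Windows typically writes them to %SystemRoot%\Minidump (usually C:\Windows\Minidump). A common next step is to open the newest dump in WinDbg and run `!analyze -v`, which reports the bug check parameters and the module it suspects. The Python sketch below only finds the most recent dump files so you know which ones to open; it assumes minidumps are enabled on this server, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def default_minidump_dir() -> Path:
    """%SystemRoot%\\Minidump, falling back to C:\\Windows if SystemRoot is unset."""
    return Path(os.environ.get("SystemRoot", r"C:\Windows")) / "Minidump"


def newest_dumps(folder: Path, limit: int = 5) -> list[Path]:
    """Return up to `limit` .dmp files in `folder`, newest (by modification time) first."""
    if not folder.is_dir():
        return []
    dumps = [p for p in folder.glob("*.dmp") if p.is_file()]
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:limit]


if __name__ == "__main__":
    for dump in newest_dumps(default_minidump_dir()):
        print(dump)
```

Comparing the suspected module across two or three recent dumps is usually more telling than any single dump.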
e. Event Viewer Analysis
You mentioned that the event logs indicate driver conflicts or hardware failures. Review the Windows Event Viewer logs (System and Application) around the crash times:
- System Logs: Check for any critical errors or warnings.
- Application Logs: Look for any application-specific errors.
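To narrow the logs to the minutes around each crash, the built-in `wevtutil` tool can run an XPath filter from the command line. A minimal sketch that builds such a query for Critical (Level 1), Error (2), and Warning (3) events; the ±15-minute window and the helper names are assumptions, and the crash time must be timezone-aware because the event log stores times in UTC.

```python
from datetime import datetime, timedelta, timezone

_FMT = "%Y-%m-%dT%H:%M:%S.000Z"


def crash_window_query(crash_time: datetime, minutes: int = 15) -> str:
    """XPath filter for Critical/Error/Warning events within +/- `minutes` of a crash."""
    if crash_time.tzinfo is None:
        raise ValueError("crash_time must be timezone-aware")
    start = (crash_time - timedelta(minutes=minutes)).astimezone(timezone.utc)
    end = (crash_time + timedelta(minutes=minutes)).astimezone(timezone.utc)
    return (
        "*[System[(Level=1 or Level=2 or Level=3) and "
        f"TimeCreated[@SystemTime>='{start.strftime(_FMT)}' and "
        f"@SystemTime<='{end.strftime(_FMT)}']]]"
    )


def wevtutil_args(log_name: str, query: str, max_events: int = 50) -> list[str]:
    """Arguments for `wevtutil qe`: newest first, as plain text, at most `max_events`."""
    return ["wevtutil", "qe", log_name, f"/q:{query}", "/f:text",
            f"/c:{max_events}", "/rd:true"]


if __name__ == "__main__":
    crash = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)  # placeholder crash time
    print(wevtutil_args("System", crash_window_query(crash)))
```

On the server itself, the list can be passed to `subprocess.run(..., capture_output=True, text=True)`; run it once for "System" and once for "Application".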
3. Recommendations and Preventative Actions
a. Thermal Management
Given that the crashes often happen during periods of high workload, thermal issues could be a factor. Make sure cooling is sufficient, and check the server's thermal paste and fans.
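One way to test the thermal theory is to line up temperature readings with the crash times. A minimal sketch, assuming you can export (timestamp, °C) readings from whatever monitoring source the server has, such as its management controller's sensor log; the 10-minute window and the `limit_c` threshold are placeholders, not vendor specifications.

```python
from datetime import datetime, timedelta

# Hypothetical input shape: (timestamp, degrees Celsius) pairs.
Sample = tuple[datetime, float]


def peak_before(crash: datetime, samples: list[Sample], window_minutes: int = 10):
    """Highest reading in the window leading up to a crash, or None if none recorded."""
    start = crash - timedelta(minutes=window_minutes)
    readings = [temp for ts, temp in samples if start <= ts <= crash]
    return max(readings) if readings else None


def hot_crashes(crashes: list[datetime], samples: list[Sample],
                limit_c: float, window_minutes: int = 10):
    """Crashes whose preceding peak reached `limit_c` (a placeholder threshold)."""
    flagged = []
    for crash in crashes:
        peak = peak_before(crash, samples, window_minutes)
        if peak is not None and peak >= limit_c:
            flagged.append((crash, peak))
    return flagged
```

If most crashes are flagged and quiet periods are not, cooling is worth investigating before replacing parts.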
b. Power Supply
Make sure your power supply is adequate for your hardware configuration. Power issues can also cause unexpected system crashes.
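A quick power-budget check is simple arithmetic: add up the estimated draw of each component and compare it with the PSU rating, leaving headroom. The sketch below assumes an 80% maximum-load rule of thumb and uses made-up wattages; substitute the figures from your component datasheets.

```python
def psu_check(component_watts: dict[str, float], psu_watts: float,
              max_load_pct: float = 80) -> tuple[float, float, bool]:
    """Return (estimated total draw, allowed budget, whether the total fits the budget)."""
    total = sum(component_watts.values())
    budget = psu_watts * max_load_pct / 100
    return total, budget, total <= budget


if __name__ == "__main__":
    # Hypothetical estimates in watts, not measured values.
    parts = {"cpus": 330, "memory": 60, "storage": 40, "board_and_fans": 120}
    print(psu_check(parts, 750))  # (550, 600.0, True)
```

If the server has redundant power supplies, compare against a single unit's rating, since one supply may have to carry the full load on its own.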
c. Disable Unnecessary Hardware
In some cases, disabling non-essential hardware components in Device Manager can help isolate the problem.
4. Advanced Options
a. System Restore or Reimage
If the problem persists, use a system restore point to revert to a known-good configuration, or consider reimaging the server.
b. Windows Debugging
Enable Driver Verifier to identify problematic drivers, but be cautious: it deliberately stresses drivers and can trigger further BSODs.
- Run `verifier` in an elevated (Administrator) CMD prompt, choose the standard settings, and reboot. After identifying and fixing the culprit driver, turn Driver Verifier off with `verifier /reset` and reboot again.
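Instead of the wizard, `verifier` also accepts command-line options, which makes it easy to limit checking to a few suspect third-party drivers rather than every driver on the system. A minimal sketch that builds those command lines; the driver file name is a hypothetical placeholder, and the commands must still be run from an elevated prompt followed by a reboot.

```python
def verifier_enable_args(drivers: list[str]) -> list[str]:
    """Command line that enables standard Driver Verifier checks on specific drivers."""
    if not drivers:
        raise ValueError("name at least one driver, e.g. 'exampledrv.sys'")
    for name in drivers:
        if not name.lower().endswith(".sys"):
            raise ValueError(f"expected a .sys driver file name, got {name!r}")
    return ["verifier", "/standard", "/driver", *drivers]


# Show which settings and drivers are currently configured.
VERIFIER_QUERY_ARGS = ["verifier", "/querysettings"]
# Turn Driver Verifier off again (takes effect after a reboot).
VERIFIER_RESET_ARGS = ["verifier", "/reset"]

if __name__ == "__main__":
    # "exampledrv.sys" is a placeholder, not a real Intel driver.
    print(" ".join(verifier_enable_args(["exampledrv.sys"])))
```

Targeting specific drivers keeps the overhead and the risk of extra crashes lower than verifying everything at once.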
c. Hardware Replacement
If specific hardware components are suspected, replacing them may be necessary.
Conclusion
Please try these steps and let us know the results. Information from the minidumps and event logs will be critical in pinpointing the root cause. If you need further guidance on any specific step, feel free to ask. Best of luck, and hopefully we can get your server back to stable operation quickly.