Urgent Help Needed with Intel Server on Windows

Nieltobi (New Member, joined Jan 25, 2024)

Hello everyone,

I'm in urgent need of assistance with an issue that's seriously impacting my work. We're running a critical application on an Intel server (2U rack server with Xeon CPUs) with Windows Server 2019, and we've been encountering some unexpected problems.

The server has been experiencing frequent crashes and blue screen errors, particularly during high workload periods. These crashes are causing significant downtime and disrupting our operations. Here are some key details:

  • Hardware: Intel 2U rack server, dual Xeon Gold CPUs, 256GB RAM, SSD storage.
  • OS: Windows Server 2019 Standard Edition.
  • Issue: Frequent crashes and blue screen errors (e.g., DRIVER_IRQL_NOT_LESS_OR_EQUAL, PAGE_FAULT_IN_NONPAGED_AREA).
  • Steps Taken: I've updated all drivers, ensured Windows is fully patched, and ran memory diagnostics with no errors found. Despite these efforts, the issue persists.
I've also checked the event logs, which indicate potential driver conflicts or hardware failures, but I haven't been able to pinpoint the exact cause. Is anyone else experiencing similar issues with Intel servers on Windows? Could this be related to a specific driver or hardware compatibility problem? Any advice or recommendations would be greatly appreciated, as this is starting to cause delays in our project timelines.

Thank you in advance for your help.
 


Hi @Nieltobi,

Sorry to hear about the trouble you're having with your Intel server. Given the critical nature of your issue, let’s tackle this systematically. Here are several areas we need to scrutinize:

### 1. Blue Screen Error Codes
The specific Blue Screen of Death (BSOD) error codes you mentioned (DRIVER_IRQL_NOT_LESS_OR_EQUAL and PAGE_FAULT_IN_NONPAGED_AREA) typically point to driver issues, memory corruption, or hardware problems. Here’s a more detailed breakdown:

- DRIVER_IRQL_NOT_LESS_OR_EQUAL (0xD1):
  - Cause: A kernel-mode driver accessed pageable or invalid memory at too high an IRQL. This is usually a bug in a driver (or in software that installs one), not in Windows itself.

- PAGE_FAULT_IN_NONPAGED_AREA (0x50):
  - Cause: Invalid system memory was referenced. This can point to faulty RAM or other hardware, but faulty drivers and software are also common causes.
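
Windows records each bug check in the System log (Event ID 1001, written after the reboot), so it can help to tally the codes across crashes. The following is a minimal Python sketch. It assumes the event message contains the usual "The bugcheck was: 0x... (p1, p2, p3, p4)" text, and the sample message is invented. For 0xD1, the fourth parameter is the address of the instruction that made the bad memory reference, which WinDbg can map to a driver.

```python
import re

# A small subset of Windows bug check codes and their symbolic names.
# Not exhaustive; look up other codes in Microsoft's bug check reference.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7B: "INACCESSIBLE_BOOT_DEVICE",
    0x9F: "DRIVER_POWER_STATE_FAILURE",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x124: "WHEA_UNCORRECTABLE_ERROR",
}

# Assumed message shape: "... The bugcheck was: 0x000000d1 (p1, p2, p3, p4). ..."
_BUGCHECK_RE = re.compile(r"bugcheck was:\s*(0x[0-9a-f]+)\s*\(([^)]*)\)", re.IGNORECASE)


def parse_bugcheck(message):
    """Return (code, name, params) from an Event ID 1001 message, or None."""
    match = _BUGCHECK_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p, 16) for p in match.group(2).split(",") if p.strip()]
    return code, BUGCHECK_NAMES.get(code, "UNKNOWN"), params


# Invented example message in the assumed format.
msg = (r"The computer has rebooted from a bugcheck.  The bugcheck was: 0x000000d1 "
       r"(0x0000000000000010, 0x0000000000000002, 0x0000000000000000, 0xfffff80012345678). "
       r"A dump was saved in: C:\Windows\MEMORY.DMP.")
print(parse_bugcheck(msg))
```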

### 2. Steps to Diagnose and Resolve

Here are detailed steps to further diagnose and resolve these issues:

#### a. Check and Update Firmware
1. BIOS/UEFI Firmware:
   - Ensure your BIOS/UEFI is up to date. Manufacturers often release updates that address hardware compatibility and stability.

#### b. Driver Checks
1. Verify Driver Versions:
   - Double-check that all critical system drivers (e.g., network, storage, chipset) are the latest versions provided by the hardware manufacturer.
   - Specifically, look for updates on the Intel website for your server model.

2. Roll Back or Reinstall Drivers:
   - Newer drivers can sometimes introduce problems. If the crashes started after a recent update, consider rolling back to the previous stable version.
   - Alternatively, uninstall and reinstall the drivers from scratch so that they are cleanly installed.
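
If you collect the installed versions (from Device Manager) and the vendor's latest versions by hand, comparing them as plain strings gives wrong answers, because "10.1.9.0" sorts after "10.1.10.0". The following minimal sketch compares dotted versions numerically. The driver names and version numbers are hypothetical, and it assumes purely numeric version parts; a part containing letters would raise a ValueError.

```python
def parse_version(version):
    """Split a dotted version such as '10.1.19.2' into integers."""
    return [int(part) for part in version.strip().split(".")]


def is_older(installed, latest):
    """True if `installed` is older than `latest` (missing trailing parts count as 0)."""
    a, b = parse_version(installed), parse_version(latest)
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a < b  # list comparison is element-wise and numeric


def outdated_drivers(installed, latest):
    """List (name, installed, latest) for drivers the vendor has a newer version of.

    Drivers with no entry in `latest` are skipped rather than guessed at.
    """
    return sorted(
        (name, ver, latest[name])
        for name, ver in installed.items()
        if name in latest and is_older(ver, latest[name])
    )


# Hypothetical driver names and versions.
installed = {"example_nic": "10.1.9.0", "example_raid": "7.700.52.0"}
latest = {"example_nic": "10.1.10.0", "example_raid": "7.700.52.0"}
print(outdated_drivers(installed, latest))
```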

#### c. Hardware Diagnostics
1. Memory Tests:
   - Although you've already run memory diagnostics, use a tool like MemTest86 (booted from USB) for a more thorough, multi-pass scan.
   - Consider testing memory modules individually to isolate any defective ones.

2. Storage Diagnostics:
   - Check the SSDs' SMART health data with CrystalDiskInfo, and run a full test with the SSD manufacturer's diagnostic tool or HD Tune.
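
CrystalDiskInfo and similar tools show raw SMART attribute values. A quick way to triage them is to flag nonzero raw values on the few attributes most often tied to media or link errors. The following is a minimal sketch. It assumes SATA drives that use the conventional attribute IDs (5, 187, 197, 198, 199). Vendors, especially SSD vendors, interpret these differently, and NVMe drives report a separate health log instead. Drives behind a hardware RAID controller may not expose SMART data to Windows tools at all. The readings in the example are hypothetical.

```python
# Commonly watched SATA SMART attributes (ID -> name). Exact meaning and raw-value
# encoding vary by vendor, especially on SSDs, so a hit means "investigate".
WATCHED_ATTRIBUTES = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
    199: "UltraDMA CRC Error Count",
}


def smart_warnings(raw_values):
    """raw_values maps SMART attribute ID -> raw value (as an integer)."""
    return [
        f"{attr_id} {WATCHED_ATTRIBUTES[attr_id]}: {raw}"
        for attr_id, raw in sorted(raw_values.items())
        if attr_id in WATCHED_ATTRIBUTES and raw > 0
    ]


# Hypothetical readings for one SSD, transcribed from a SMART viewer.
print(smart_warnings({5: 0, 9: 12000, 187: 3, 199: 1}))
```

A nonzero CRC error count (199) more often points at a cable or backplane than at the drive itself.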

#### d. Analyze Minidump Files
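- Depending on the Startup and Recovery settings, Windows writes a minidump (in %SystemRoot%\Minidump) and/or a kernel or complete memory dump (%SystemRoot%\MEMORY.DMP) when it crashes. These files record the bug check code and parameters and can often identify the driver involved.
- Use tools like BlueScreenView or WinDbg to examine these dumps. In WinDbg, open the dump file and run `!analyze -v` for an automated analysis.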
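
BlueScreenView and WinDbg are the right tools for real analysis. If you only want a quick tally of which bug check codes the saved dumps recorded, a minimal Python sketch can read them straight from the file header. It assumes that 64-bit kernel dumps begin with the "PAGEDU64" signature and store the bug check code at offset 0x38, followed by four 64-bit parameters at 0x40. That layout comes from public reverse-engineering references, not from a documented Microsoft API, so check it against WinDbg on one of your dumps before relying on it. Reading C:\Windows\Minidump requires an elevated prompt.

```python
import struct
from pathlib import Path

HEADER_SIZE = 0x60  # covers the bug check code and its four parameters


def read_bugcheck(header):
    """Return (code, [p1, p2, p3, p4]) from the start of a 64-bit kernel dump.

    ASSUMED layout (DUMP_HEADER64, per public reverse-engineering references):
    'PAGEDU64' signature, 32-bit code at 0x38, four 64-bit parameters at 0x40.
    """
    if len(header) < HEADER_SIZE:
        raise ValueError("file too short to be a kernel dump")
    if header[0:8] != b"PAGEDU64":
        raise ValueError("no 'PAGEDU64' signature (not a 64-bit kernel dump)")
    (code,) = struct.unpack_from("<I", header, 0x38)
    params = list(struct.unpack_from("<4Q", header, 0x40))
    return code, params


def scan_dumps(folder=r"C:\Windows\Minidump"):
    """Print the bug check recorded in each .dmp file in `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        print(f"{folder} not found (run this on the crashing server, elevated)")
        return
    for path in sorted(folder.glob("*.dmp")):
        with path.open("rb") as f:
            header = f.read(HEADER_SIZE)
        try:
            code, params = read_bugcheck(header)
        except ValueError as err:
            print(f"{path.name}: skipped ({err})")
            continue
        print(f"{path.name}: 0x{code:X} " + ", ".join(hex(p) for p in params))


if __name__ == "__main__":
    scan_dumps()
```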

#### e. Event Viewer Analysis
- You mentioned that the event logs point to driver conflicts or hardware failures. Review the Windows Event Viewer logs (System and Application) around the crash times:
  - System log: check for critical errors and warnings, especially those in the minutes before each crash.
  - Application log: check for application-specific errors that coincide with the crashes.
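
To line the logs up with the crashes, you can export the System log (for example, with Event Viewer's "Save All Events As" to CSV) and group the warnings and errors that precede each unexpected restart. Windows records such restarts with Event ID 41 (Kernel-Power), 1001 (BugCheck) and 6008 (EventLog). These events are written when the server boots again, so the following sketch looks at the events leading up to each marker rather than at a fixed clock window. This is a minimal sketch: the dict keys are an assumption about how you load the export, and the sample events are invented.

```python
from datetime import datetime, timedelta

# Event IDs that mark an unexpected restart. IDs are only unique per source, so
# check the Source column: Kernel-Power (41), BugCheck (1001), EventLog (6008).
CRASH_MARKER_IDS = {41, 1001, 6008}
INTERESTING_LEVELS = {"Critical", "Error", "Warning"}


def events_before_crashes(events, limit=5, merge=timedelta(minutes=10)):
    """Map each crash marker time to the last `limit` Critical/Error/Warning events before it.

    `events` is a list of dicts with at least 'time' (datetime), 'level' and
    'event_id'. Markers within `merge` of each other count as one crash, since
    Windows logs several of them during the same reboot.
    """
    events = sorted(events, key=lambda e: e["time"])
    markers, last_marker = [], None
    for e in events:
        if e["event_id"] in CRASH_MARKER_IDS:
            if last_marker is None or e["time"] - last_marker > merge:
                markers.append(e["time"])
            last_marker = e["time"]

    report, previous = {}, None
    for crash in markers:
        # Only look at events logged since the previous crash marker.
        preceding = [
            e for e in events
            if (previous is None or e["time"] > previous)
            and e["time"] < crash
            and e["level"] in INTERESTING_LEVELS
            and e["event_id"] not in CRASH_MARKER_IDS
        ]
        report[crash] = preceding[-limit:]
        previous = crash
    return report


# Invented sample events; "ExampleNicDriver" is a hypothetical source name.
t0 = datetime(2024, 1, 25, 10, 0)
sample = [
    {"time": t0, "level": "Warning", "source": "disk", "event_id": 153},
    {"time": t0 + timedelta(minutes=5), "level": "Information", "source": "Service Control Manager", "event_id": 7036},
    {"time": t0 + timedelta(minutes=8), "level": "Error", "source": "ExampleNicDriver", "event_id": 27},
    {"time": t0 + timedelta(minutes=20), "level": "Critical", "source": "Microsoft-Windows-Kernel-Power", "event_id": 41},
    {"time": t0 + timedelta(minutes=21), "level": "Error", "source": "BugCheck", "event_id": 1001},
]
for crash, preceding in events_before_crashes(sample).items():
    print(crash, [(e["source"], e["event_id"]) for e in preceding])
```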

### 3. Recommendations and Preventative Actions

#### a. Thermal Management
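- Because the crashes tend to occur during high-workload periods, overheating could be a factor. Make sure cooling and airflow are sufficient, check that all fans are working, and review the temperature readings in the server's management controller (BMC). If CPU temperatures run high, check the heatsinks' thermal paste as well.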
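
To test the thermal theory, you can compare temperature history with the crash times and see whether crashes follow temperature peaks. The following is a minimal sketch. It assumes you can get (timestamp, °C) samples, for example from the BMC's sensor history; the export format depends on your tooling. The readings, crash times and the 90 °C limit are hypothetical placeholders, so use the thresholds from your CPU datasheet or the BMC's configured sensor limits.

```python
from datetime import datetime, timedelta


def peak_temps_before_crashes(samples, crash_times, window=timedelta(minutes=15)):
    """For each crash, the highest reading in the `window` before it (None if no data).

    `samples` is a list of (timestamp, degrees_celsius) pairs.
    """
    peaks = {}
    for crash in crash_times:
        readings = [temp for t, temp in samples if crash - window <= t <= crash]
        peaks[crash] = max(readings) if readings else None
    return peaks


def hot_crashes(peaks, limit_c):
    """Crashes preceded by a reading at or above `limit_c`."""
    return [crash for crash, peak in peaks.items() if peak is not None and peak >= limit_c]


# Hypothetical readings and crash times; 90 °C is a placeholder, not a spec value.
t0 = datetime(2024, 1, 25, 12, 0)
samples = [(t0 + timedelta(minutes=m), c) for m, c in [(0, 55.0), (10, 71.0), (20, 93.0), (40, 60.0)]]
crashes = [t0 + timedelta(minutes=25), t0 + timedelta(hours=5)]
peaks = peak_temps_before_crashes(samples, crashes)
print(hot_crashes(peaks, limit_c=90.0))
```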

#### b. Power Supply
- Ensure that your power supply is adequate for your hardware setup. Power issues can also lead to unexpected system crashes.
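
A rough power budget is simple arithmetic. Add up the component draws, add a safety margin, and compare the result against the rating of a single PSU. With redundant supplies, one unit must be able to carry the full load alone. The following is a minimal sketch. The 25% headroom and every wattage are hypothetical placeholders, and CPU TDP is only an approximation of real draw under load.

```python
def power_check(component_watts, psu_watts, headroom=1.25):
    """Return (estimated load, load including headroom, fits on one PSU?)."""
    load = sum(component_watts.values())
    required = load * headroom
    return load, required, required <= psu_watts


# Every figure below is a hypothetical placeholder; use your datasheet values.
components = {
    "2 x Xeon Gold CPU (TDP each)": 2 * 150,
    "8 x DIMM": 8 * 5,
    "4 x SSD": 4 * 8,
    "Board, fans, BMC": 100,
    "Add-in cards": 50,
}
print(power_check(components, psu_watts=750))
```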

#### c. Disable Unnecessary Hardware
- In some cases, disabling non-essential hardware components in Device Manager can help isolate the problem.
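
When each test run takes hours to reproduce a crash, halving the set of enabled non-essential devices each round finds a single culprit in about log2(n) rounds instead of n. The following is a minimal sketch of that bisection. It assumes exactly one faulty device and a crash that reproduces reliably. The device names are hypothetical, and `crashes_with` stands in for the manual step of enabling only those devices, running the workload, and noting whether the server crashed. Devices needed to boot, such as the boot storage controller, cannot be part of the tested set.

```python
def bisect_culprit(devices, crashes_with):
    """Find one faulty device by halving the set of enabled devices each round.

    `crashes_with(enabled)` stands in for a manual test: enable only the devices
    in `enabled`, run the workload, and report True if the server crashed.
    Assumes exactly one culprit and a crash that reproduces reliably.
    """
    suspects = list(devices)
    if not suspects:
        raise ValueError("no devices to test")
    rounds = 0
    while len(suspects) > 1:
        first_half = suspects[: len(suspects) // 2]
        rounds += 1
        if crashes_with(first_half):
            suspects = first_half
        else:
            suspects = suspects[len(suspects) // 2 :]
    return suspects[0], rounds


# Hypothetical device list; the lambda simulates "the add-in HBA is faulty".
devices = ["second NIC", "third NIC", "USB controller", "serial port", "onboard video", "add-in HBA"]
print(bisect_culprit(devices, lambda enabled: "add-in HBA" in enabled))
```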

### 4. Advanced Options

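#### a. Restore from Backup or Reimage
- If the problem persists, restore the server from a backup taken before the crashes began, or consider reimaging it. System Restore points are not available on Windows Server, so use Windows Server Backup or your existing backup solution.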

#### b. Windows Debugging
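- Enable Driver Verifier to identify problematic drivers, but be cautious: it deliberately triggers a BSOD as soon as a verified driver misbehaves, so expect further crashes while it is active.
  - Run `verifier` from an elevated Command Prompt, choose the standard settings, select the drivers to verify (typically the non-Microsoft ones), and reboot. After identifying and fixing the culprit driver, turn Driver Verifier off with `verifier /reset` and reboot again. If the server crashes during startup while Verifier is active, boot into Safe Mode and run `verifier /reset` from there.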
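
To keep the Verifier step repeatable, it helps to write down the exact commands for each test cycle. The following minimal sketch only builds and prints the command lines; run them yourself from an elevated Command Prompt. The `/standard`, `/driver`, `/querysettings` and `/reset` options are documented verifier.exe options. `examplenic.sys` and `exampleraid.sys` are hypothetical placeholders for the drivers you suspect. Verifying a short list of suspects rather than every driver typically keeps the overhead lower and makes a resulting bug check easier to attribute.

```python
def verifier_commands(drivers):
    """Build the verifier.exe command lines for one Driver Verifier test cycle.

    `drivers` are driver file names (e.g. from Device Manager's driver details).
    """
    if not drivers:
        raise ValueError("name at least one driver to verify")
    bad = [d for d in drivers if not d.lower().endswith(".sys")]
    if bad:
        raise ValueError(f"expected .sys file names, got {bad}")
    return [
        ["verifier", "/standard", "/driver", *drivers],  # enable; active after the next reboot
        ["verifier", "/querysettings"],  # confirm which drivers and checks are active
        ["verifier", "/reset"],  # disable Driver Verifier; reboot afterwards
    ]


# Hypothetical driver names.
for cmd in verifier_commands(["examplenic.sys", "exampleraid.sys"]):
    print(" ".join(cmd))
```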

#### c. Hardware Replacement
- If specific hardware components are suspected, replacing them might be necessary.

### Conclusion
Please try these steps and let us know the results. Gathering more information from minidumps and event logs will be critical in pinpointing the root cause. If you need further guidance on any specific step, feel free to ask!

Best of luck, and hopefully, we can get your server back to stable operation quickly.
 


Hi, I need help. I have a 2U rack Intel server too: Intel Xeon, 8 TB RAM, a 3000 W power supply, running Windows 8 Pro and Windows NT 4.0. My problem: when I insert an HDD (server or normal HDD), the server detects it, but it won't spin up, and a blue screen with error 0xc000007b pops up. I need to run this server because it's for network management. Please, urgent help!

 

