- Thread Author
- #1
NOTE: Sorry for the long post, trying to be thorough.
Just for reference, these are the specs of the machine. Everything is running stock, it's a work machine:
OS: Win 10 Pro
CPU: i7-6700K
Motherboard: ASUS Z170-A LGA 1151
RAM: 2x G.SKILL Ripjaws V Series 16GB 288-Pin DDR4
Using onboard GPU
Combination of SSDs (EVOs) and HDDs (WD Blue and Gold)
So this computer started BSODing yesterday night at 11pm (per the event log), after having been untouched since I left at 6pm. It BSOD'd frequently in the morning when I arrived (every 30 seconds to 5 minutes) and worsened to the point where it could barely get to the logon screen.
For most of my morning troubleshooting I assumed it was due to a Win 10 update and a driver conflict. The faulting module in the memory dump was intelppm, and the minidumps showed a 124 error (WHEA_UNCORRECTABLE_ERROR (124)), so I figured the chipset and/or CPU drivers were to blame.
Here is what I've done so far. I'm now looking for confirmation on whether I should try an RMA for the CPU (I'm using the onboard GPU, so there's no dedicated graphics card to test):
- The PC has 2 sticks of RAM, so I tried each individually. No change.
- I disconnected all hard disks except the primary. No change.
- I replaced the main disk with another disk with a Win 10 install on it. No change, same BSOD.
- I put in a fresh SSD and loaded the Win 10 installer from a new Win 10 Pro install disc. It BSOD'd TWICE while loading the CD interface (Win 10 not even installed yet). I didn't even know you could BSOD at this point...
- On the third try it did load the CD interface and got about halfway through the install of Win 10 before it BSOD'd.
- I replaced the PSU with a known-good one. No change.
- I managed to find a spare machine with a 6700 (non-K) chip, and put that chip in my machine. Problem gone. I can now boot back into the normal Win 10 install, as well as the secondary test install disk I put in, and it has been completely stable for over 18 hours now. It basically went from 30 or so BSODs an hour to nothing with this switch.
Up until that last step I was still suspecting that maybe both sticks of RAM had failed or that the problem was with the mobo. I have never, in all my years in IT, had a CPU fail during normal use, so I'm still suspicious about my findings.
However, with the above tests I'm pretty sure I (mostly) ruled out the RAM, ruled out the hard drives, ruled out the OS, ruled out the PSU. Even though it looks completely certain that it's the CPU, I'm just finding that hard to believe. And if it is the CPU, has anyone experienced something like this? The PC has had a rock solid uptime of about 10 months, treated very well in a clean environment, CPU always ran cool, max temp it has reached is 61C.
Anyway, any thoughts would be appreciated. I don't know how to go about testing the CPU for the purpose of an RMA process without an OS, so any clues on that would be great too!
Included are the Sysinfo file for specs and one of the minidumps in case that confirms anything (though to me the 124 error is probably way too generic).
The CPU I suspect is a 6700K, so the current 6700 that's working is a bit of a step down, but no biggie for my normal workload. Also I haven't bothered putting the second stick of RAM back in, so the specs only show 1 stick at 16 GB.
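The elimination reasoning in this post can be written down as a small table of swap tests. This is only a sketch of the logic, not a diagnostic tool: the list entries paraphrase the steps above, and the name `split_results` is made up for illustration. Note that the motherboard was never swapped in this list, so it stays untested rather than cleared.

```python
# Each entry: (what was changed, did the BSODs stop afterwards?)
# Entries paraphrase the troubleshooting steps described in the post.
swap_tests = [
    ("RAM: each stick on its own", False),
    ("Disks: all but the primary disconnected", False),
    ("OS disk: replaced with another Win 10 install", False),
    ("OS: fresh SSD plus new install disc", False),
    ("PSU: replaced with a known-good unit", False),
    ("CPU: 6700K replaced with a 6700", True),
]


def split_results(tests):
    """Return (components whose swap stopped the fault, components whose swap did not)."""
    fixed = [name.split(":")[0] for name, stopped in tests if stopped]
    not_fixed = [name.split(":")[0] for name, stopped in tests if not stopped]
    return fixed, not_fixed


fixed, not_fixed = split_results(swap_tests)
print("Fault went away after swapping:", fixed)
print("Swap made no difference:", not_fixed)
```

Only the CPU swap changed the outcome, which is why the fault appears to follow the chip.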
- Joined
- Jan 28, 2013
- Messages
- 2,420
Yup! Seen lots of CPU failures over the years. I used to build them from scratch back in my foundry days. I also used to burn them up intentionally using a variety of methods, including overclocking and stress-testing with tools such as HeavyLoad. When teaching A+ certification courses at my local Junior College (ROP), we would intentionally install dead or failing CPUs from a drawerful of them we kept in the hardware labs, then teach the students how to identify a faulty CPU using software tools and such. Many of them had never seen a dead CPU or a dead mobo before the class. In the midterm and final labs, I would also bend the pins on the CPUs or even break them off, and of course that would produce some very wonky symptoms. Usually the "A" students would be able to figure it out in a 3 hr. lab test, but not always.
You can use tools such as Speccy, SiSoftware Sandra, CPU-Z, and HWMonitor to look at your CPU. There are also lots of tools available on the Ultimate Boot CD, available free here: UBCD.com
I haven't done much of that with the more modern i3-i7 CPUs, but I'm sure the testing process hasn't changed that much. Intel also has some of its own CPU testing diagnostics (older ones are found on the UBCD disc), so I recommend you check their website: Link Removed
My friend who worked at Intel until last year mentioned something about a 3 yr. warranty on their 6th and 7th gen chips. If older than that, you'll probably have to pay something for the out-of-warranty RMA replacement and 1-way shipping back to Intel.
Best,
<<<<BIGBEARJEDI>>>>
- Joined
- Jan 28, 2013
- Messages
- 2,420
- Thread Author
- #4
- Thread Author
- #5
Hey BBJ,
Problem with your suggestion of monitoring the CPU with those programs is that I can't put the CPU in a system to run anything, as it BSODs at logon. I was able to get into the BIOS and look at the basic specs, and the voltages and temp all looked fine. Once I get some more thermal paste I can throw the 6700K back in here and see if I can get it booted long enough to get a snapshot from HWMonitor and see if that tells us anything.
The chip is roughly 10 months old, so assuming it's actually bad that would still be within warranty.
And a bit off topic, in terms of physical damage I've seen problem CPUs as well, overclocking them to death, condensation (pin rot), broken pins, etc.
Just nothing like this, since it never got hot, was never overclocked, and ran stably for 10 months until yesterday when it had problems out of nowhere (seemingly anyway). I can only suspect a manufacturing defect, not damage that happened after the fact.
- Joined
- Jul 4, 2015
- Messages
- 8,980
- Thread Author
- #7
- Joined
- Jul 4, 2015
- Messages
- 8,980
- Joined
- Aug 28, 2007
- Messages
- 36,399
Hi Francis,
I'm just about to debug your dump file and will post back shortly.
Basically, Bugcheck 124 means a hardware error occurred, and usually this bugcheck is linked with overclocking as well as overheating.
However, this bugcheck can also be caused by a myriad of things, and I've even seen an out-of-date copy of Chrome be the culprit. Basically we need to run a few tests as well as those already tried.
Gimme an hour and I'll post back with more information.
- Thread Author
- #10
Hi Kemical,
Great thanks!
The CPU's never been overclocked and always seemed to be running cool (in the 40 to 45C range during normal work), so neither of those should have anything to do with it, unless the PSU was feeding too much power.
I'll wait on your response, maybe by then I'll have had a chance to put the CPU in question into the other PC to see what it does there. If it's a CPU hardware issue it should have the same problems there.
- Joined
- Aug 28, 2007
- Messages
- 36,399
Code:
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck 124, {0, ffffe0017b0e0028, b2000000, 14}
Probably caused by : GenuineIntel
Followup: MachineOwner
Code:
PRIMARY_PROBLEM_CLASS: 0x124_GenuineIntel_PROCESSOR_TLB
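For anyone curious how the TLB classification falls out of the raw numbers, here is a rough sketch of decoding the bugcheck arguments by hand. It assumes the parameter layout that Microsoft documents for bug check 0x124 when the first argument is 0 (machine check exception): the third and fourth arguments are the high and low 32 bits of the failing bank's MCi_STATUS register. The bit positions and the TLB error-code pattern follow the machine-check chapter of Intel's Software Developer's Manual as I understand it. The function name `decode_mci_status` is made up for this example and is not part of any debugger.

```python
def decode_mci_status(arg3: int, arg4: int) -> dict:
    """Decode bugcheck 0x124 Arg3/Arg4 as one 64-bit MCi_STATUS value.

    Assumption: Arg1 was 0 (machine check exception), so Arg3 holds the
    high 32 bits and Arg4 the low 32 bits of MCi_STATUS.
    """
    status = (arg3 << 32) | arg4
    mca_code = status & 0xFFFF  # architectural MCA error code, bits 15:0

    info = {
        "status": f"{status:#018x}",
        "valid": bool((status >> 63) & 1),            # VAL: register holds a valid error
        "overflow": bool((status >> 62) & 1),         # OVER: an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),      # UC: hardware could not correct it
        "enabled": bool((status >> 60) & 1),          # EN: error reporting was enabled
        "context_corrupt": bool((status >> 57) & 1),  # PCC: processor context corrupt
        "mca_code": f"{mca_code:#06x}",
    }

    # TLB errors use the compound pattern 0000 0000 0001 TTLL.
    if (mca_code & 0xFFF0) == 0x0010:
        tt = ("instruction", "data", "generic", "reserved")[(mca_code >> 2) & 0b11]
        ll = ("level 0", "level 1", "level 2", "generic")[mca_code & 0b11]
        info["error_type"] = f"TLB error ({tt}, {ll})"
    else:
        info["error_type"] = "not a TLB pattern (other MCA error class)"
    return info


# Arguments taken from the BugCheck line in the dump above.
result = decode_mci_status(0xB2000000, 0x14)
for key, value in result.items():
    print(f"{key}: {value}")
```

With the values from this dump, the low 16 bits are 0x0014, which matches the TLB pattern, and the uncorrected and context-corrupt bits are both set. That lines up with the debugger's PROCESSOR_TLB label.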
A translation lookaside buffer (TLB) is a memory cache that stores recent translations of virtual memory to physical addresses for faster retrieval. This link will explain it in more detail than I could:
http://www.dauniv.ac.in/downloads/CArch_PPTs/CompArchCh10L05TranslLookAheadBuff.pdf
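To make the idea concrete, here is a toy software model of a TLB. It is only an illustration of the caching behaviour described above: the class name `ToyTLB`, the 4 KiB page size, the four-entry capacity, and the least-recently-used eviction are all assumptions chosen for the example, and a real TLB is a hardware structure inside the CPU.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # 4 KiB pages, assumed for illustration


class ToyTLB:
    """Tiny cache of virtual-page -> physical-frame translations."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # full mapping: virtual page number -> frame number
        self.capacity = capacity
        self.entries = OrderedDict()  # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # Fast path: translation is already cached.
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            # Slow path: look the page up in the page table, then cache it.
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = ToyTLB({0: 7, 1: 3})
first = tlb.translate(0x0010)   # miss: page 0 is looked up and cached
second = tlb.translate(0x0020)  # hit: same page, served from the cache
print(hex(first), hex(second), tlb.hits, tlb.misses)
```

If the hardware holding those cached translations returns a wrong entry, the CPU reads or writes the wrong physical memory, which is why a TLB fault is reported as an uncorrectable hardware error.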
This can be caused by corrupt software, but after reading your first post in detail I see you have pretty much tried everything to discount the software (OS).
The only thing I can't seem to find is results from the Intel® Processor Diagnostic Tool (Link Removed)?
Purpose
The purpose of the Intel® Processor Diagnostic Tool is to verify the functionality of an Intel® Microprocessor. The diagnostic checks for brand identification, verifies the processor operating frequency, tests specific processor features, and performs a stress test on the processor.
Other than running the above, try clearing the BIOS settings by either removing the battery or holding down the little black button. Reboot and load 'optimised settings' before making any further changes.
Agreed, and good idea to try the CPU in the other PC.
- Joined
- Jul 4, 2015
- Messages
- 8,980
- Joined
- Aug 28, 2007
- Messages
- 36,399
Quote: "Diagram from a book. TLB is inside the CPU die." (attachment 34717)
Looks about right, Neem, as it does say it's part of the memory cache/management section.
- Joined
- Jul 4, 2015
- Messages
- 8,980
- Joined
- Aug 28, 2007
- Messages
- 36,399
Probably counts me out then... definitely not for everyone. :D
- Thread Author
- #16
Yeah, doesn't seem like something for pleasure reading...
As far as to what's going on right now, the CPU is now cozy in a stock Dell Optiplex 7040 and BSODs on startup shortly before it would reach the Windows log on screen (MACHINE_CHECK_EXCEPTION). This machine seems to be running Win 7, which I don't think should matter though. The BIOS detected the new CPU and reset its settings (original CPU is a 6700, now a 6700K, so should work just fine).
I can't get into Windows, so it's not as easy to get any logs.
Considering this is 100% separate hardware now and a different OS altogether and still blue screening I'm going to stick with my CPU died theory and I'll try for an RMA with Intel.
I can't even run the Intel tool because I can't get into any OS with this CPU now, so that's not something I'll be able to get you. :|
Any other thoughts before I go the RMA route?
- Joined
- Aug 28, 2007
- Messages
- 36,399
- Thread Author
- #18
Quote: "The fact it does more or less the same thing in two different machines, I'd go for the RMA toot sweet. Good luck on a speedy process."
Will do, we'll see how it goes.
Thanks for the help and moral support. :p
- Joined
- Aug 28, 2007
- Messages
- 36,399
Let us know how you get on, Francis, and please post back if we can advise further.
- Joined
- Jan 28, 2013
- Messages
- 2,420
I'm in agreement with kemical and neem on this; sounds to me as if the CPU is defective.
I'd RMA it as advised. If you're as OCD as some of us, it might be worth trying to boot from an Ubuntu LiveCD disc or USB stick to see if Linux will boot. That takes Windows software completely out of the picture, and you can even disconnect your boot drive (C: drive) and run Linux from the disc or USB stick in RAM. [THIS STEP IS OPTIONAL, BUT REALLY NARROWS THINGS DOWN IF THE HDD IS OUT OF THE PICTURE!] If Ubuntu also fails to load, or gives weird load errors, then clearly the CPU is bad, as I believe you took the necessary steps to make sure the mobo is OK. You can do this in a few hours or maybe half a day. It's pretty easy, and the website and instructions for Ubuntu are here: Download Ubuntu Desktop | Download | Ubuntu
Ubuntu is a terrific diagnostic tool, and it's gotten me out of some real jams at customer sites where they want to just throw their computer away because some other tech or friend told them the mobo or CPU is gone. If Ubuntu boots, not so!!
Usually it's just a bad hard drive, RAM stick, or corrupted Windows. Replace the bad parts, reload Windows, and 99.9% of the time it's fixed!
I carry an Ubuntu USB stick on my keychain wherever I go, even on vacation! It's so handy to be able to determine in just a couple of minutes whether it's a mobo/CPU problem or a faulty drive, RAM stick, or Windows issue.
Best,
<<<BBJ>>>
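If the live session does boot, one way to follow up on the suggestion above is to save the kernel log (for example with `dmesg > dmesg.txt`, which may need sudo) and scan it for machine-check reports. The snippet below is a minimal sketch of that scan. The function name `find_mce_lines` and the search pattern are assumptions for this example, and the sample log lines are invented to show the shape of the input, not taken from this machine.

```python
import re

# Keywords that typically show up in kernel machine-check reports (assumed pattern).
MCE_PATTERN = re.compile(r"machine check|hardware error|\bmce\b", re.IGNORECASE)


def find_mce_lines(log_text: str) -> list[str]:
    """Return the log lines that look like machine-check / hardware-error reports."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]


# Invented sample input, for illustration only.
sample_log = """\
[    1.234567] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[   12.345678] mce: [Hardware Error]: Machine check events logged
[   12.345700] EXT4-fs (sda1): mounted filesystem
"""

for line in find_mce_lines(sample_log):
    print(line)
```

To use it on a real capture, read the saved file with `open("dmesg.txt").read()` and pass the text to `find_mce_lines`. An empty result does not prove the CPU is healthy, since a failing chip may crash before anything gets logged.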