Post
Topic
Board Announcements (Altcoins)
Re: [ANN]CureCoin - Earn while you solve cures for Cancer. True 3.0 crypto
by
ChasingTheDream
on 03/06/2014, 19:00:20 UTC
Calling Aboy68 :D

Our production is dropping off, and I've been having a lot of hardware issues and instability despite underclocking the GPUs literally to their lowest possible settings (both core and speed).  If you are having similar issues, try underclocking your RAM.  I got that tip directly from F@H support, and I think the guy may have nailed it!  At least I hope so!  I'll know more in about 24 hours.  I may be able to spin the GPUs up again.  They are running ridiculously underclocked right now.

Despite the bumps, I'm still trying to beat you into the top 10. ;D

Update:  GAH.  Still didn't make it three hours before two machines were down again.  LOL.  So the quest for stability continues...

I'm running stock values on all the hardware I have. Yes, sometimes the worker and the WU do get lost in space, with the result stuck at 99.99%.
The solution is to resync, i.e. with the pause and fold commands.
I have a 100% fix for this that runs automatically, no human hands needed! I have been trying out the fix for the last 3 days and it still works.

Do you want to know how?

//Aboy68

Yes, any ideas are welcome.  Virtually every morning at least two of my machines are down, meaning I cannot restart them folding without physically rebooting the machine.  The machine will not respond to remote restarts or keyboard input.  I actually have to press the reset button.  That happens during the day as well, but I'm not always available to do anything about it, so they sit for hours like that.  At this point I've got the GPUs underclocked as far as they will go, so it is not the GPUs.  It is something in the systems themselves.  Memory, CPU, something.  I've removed the CPU slot on the troubled machines (after the WU finished, of course), but it does not seem to have made any difference in terms of stability.

I'm going to gather my logs and present them to the F@H support group to see if they have any ideas to help speed up the process of getting these things running properly.  As a short term fix I may write a program that reads the logs and if too much time goes by before the log is updated it could force a system restart.  Unfortunately I don't think this will work because whatever is happening makes the system so unstable that I don't think it will be able to restart.
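For what it's worth, here is a rough sketch of what that log watchdog could look like in Python. It is only an illustration under some assumptions: the log path, the 30 minute timeout, and the function names are all made up for the example, and the restart uses the standard Windows `shutdown /r /f /t 0` command. It can only help while the OS is still responsive enough to run it, so it would do nothing for a hard hang.

```python
import os
import subprocess
import time

# Hypothetical log location; point this at wherever your F@H client writes its log.
LOG_PATH = r"C:\FAH\log.txt"


def log_is_stale(path, max_age_seconds, now=None):
    """Return True if the file at `path` has not been modified recently."""
    if now is None:
        now = time.time()
    try:
        last_write = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable log is treated as stale.
        return True
    return (now - last_write) > max_age_seconds


def force_restart():
    """Ask Windows to reboot immediately, closing applications without prompting."""
    subprocess.call(["shutdown", "/r", "/f", "/t", "0"])


def watch(path, max_age_seconds=1800, poll_seconds=60,
          restart=force_restart, max_checks=None):
    """Poll the log and call `restart` once it goes stale.

    Returns True if a restart was triggered, False if `max_checks`
    polls passed without the log going stale.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if log_is_stale(path, max_age_seconds):
            restart()
            return True
        checks += 1
        time.sleep(poll_seconds)
    return False
```

Running `watch(LOG_PATH)` from a scheduled task at startup would check the log once a minute and reboot after 30 minutes of silence. The `restart` argument is there so the logic can be tried with a harmless function before letting it reboot anything.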

Ironically there are no hardware errors or application errors in Windows Event Viewer though.  This has actually been plaguing me since I started but it was the same way with mining.  It took a long time to get the systems to behave.  This will eventually get worked out.

Unfortunately, as a result I'm only running at about 2/3 of my expected output, but it is still better than nothing.  lol

If you have a fix I would love to hear about it!

Are the fans working correctly? Might want to get a tool that lets you see VRM temps too, I had a 7970 experiencing a similar issue (back in the mining days) and VRMs were around 117C. Some more work showed that the fan speed in CCC/afterburner/trixx was incorrect, as the fan had a hardware issue and was spinning with a much higher resistance than it should have.

I used GPU-Z to take a peek, and it looks like the highest VRM temp on any of the cards in the troubled machines is 58C.  Also, based on your recommendation some time ago, I did swap the GPUs out of the most troubled machine with the ones in the machine next to it.  The most troubled machine is still the machine having the most issues.  Based on all that, I don't think it is the GPUs at this point, but it was definitely worth a look.  Thanks for the suggestion.

Ironically though, I ran GPU-Z on several of my machines that don't seem to want to run for very long.  The most troubled machine ended up getting a video driver failure while I was watching.  The machine was still stable afterwards and I was able to remotely reboot it so it was responding appropriately.  Whatever else is happening makes it so unstable it is on a whole different level of ugly.  Definitely not just a video driver failure.

Try underclocking your system RAM. This helped with one of my rigs.

I actually was talking about that in the first post in this sequence and thought it was going to help because the RAM speed in all the machines was at 2133.  I brought it down to 1333 and unfortunately it didn't help.  I think I'm going to run a memory test next and maybe even swap memory between a machine that behaves somewhat well and the least stable machine.  Hard to believe the memory is bad in 2-3 different machines but I need to rule it out.

Another suggestion from the F@H folks was that I could be overloading a rail on the PSU, but the computer that is having the most issues has a 1200 watt Corsair, which is a single-rail PSU.  So the quest continues.