Post
Topic
Board Mining (Altcoins)
Re: Claymore's ZCash AMD GPU Miner v11.0 (Windows/Linux)
by gsarducci on 15/01/2017, 00:10:19 UTC
Hey Claymore and gang.  I've got a funky one for you...

(tl;dr? Go to the bottom for the update.)

So I just cobbled together this 4-card rig.  Previously I had all these parts running in a case with two 7950 cards, one Sapphire and one XFX.  I put together an open-air rig with risers using the same hardware that was in the case (MB, memory, proc) and added a pair of MSI 7950s, for a total of 4 cards.  I daisy-chained a 500W supply to the 1000W power supply.  The 500W powers the MB, three 120mm fans that move air across the cards, a 1TB conventional HD, and 1 card.  The 1000W runs the other three cards.  I am not at my rig at the moment, so I don't know which card is running off the 500W supply (this, I admit, might become important).

Fresh Win 7 install, 15.12 drivers, GPU-Z, HWMonitor, TeamViewer and nothing else.  After fighting with Windows and finally getting everything talking to each other, I ran some spot tests for power draw and stability.  Everything looked good.  I started the v11.0 miner, ran it for about an hour this morning, and found that with the settings I had used in v9.2 I was getting around 225 sols/s from the Sapphire, 220 to 230 from the MSI twins, and about 215 to 220 from the XFX.

I also discovered that one of the fans was vibrating about its shaft and generally being a pain in the ass.  I removed it and attempted to lube it, but that wasn't its problem.  That being said, it still provided adequate cooling as indicated by GPU-Z telemetry.

About 15 minutes before I left to go to work I fired it all up and put it to work, walking out the door around 1pm.  According to the logs, temps across all the cards were steady around 62C to 65C (-tt set to 65), with fans ranging from 55% for the XFX down to 40% (the baseline min) for the Sapphire.  About 25 minutes into the run the logs show the fan on the XFX ramping up to around 75%, while its temp held steady around 65C.  Seems the fan was starting to act up again.  Anyway, shortly after this the XFX's GPU temp fell rapidly, and about 3 minutes later the watchdog found the card unresponsive and commanded a restart.

When the rig came back up the XFX seemed to initialize fine, but now it's running about 100 sols/s slower than the other three cards, at around 105 to 125 sols/s.  Looking at GPU-Z, the thermal and voltage readings are in line with a card at 100% duty cycle, just like its brothers, but it's doing half the work.  The logs don't show anything extraordinarily funky except for one thing: when the watchdog ran a minute after the restart it showed:

"warning: solutions buf overflow, 1928276 > 24"

In the log prior to this was the speed check which showed the XFX running at 120 sols/s.

A few observations I hope might help.

I am not at the rig.  I am elsewhere monitoring it, so I apologize for not giving log snippets or screenshots.

All cards have stock BIOS.

All cards are normalized at 950/1350.  The XFX's stock clocks are 800/1250, but it has been running at 950/1350 for months with zero issues.

All voltages are stock.

All cards are on risers (USB/powered).

The XFX card's riser is in the 16X slot and is driving the monitor.

The XFX is rock solid and happily hashing out 125 to 135 sols/s with no apparent failures. Telemetry suggests it's working as hard as the other cards.

I am not getting any rejected shares, neither prior to nor after the incident.

I can neither confirm nor deny that the XFX is the sole card connected to the 500W supply at the moment (aforementioned proximity to rig).

The 500W supply was not in the system when it was in the case running two cards. I removed it and installed a 1000W supply when I added the second card.

The 1000W supply is a single 12V rail supply.

The card hashes about 435 h/s using Claymore's Cryptonote 9.7 miner, while its brothers push out around 480 h/s. Oddly, the XFX is doing this using 37% GPU load, where the others are at 68%. O.o  Telemetry suggests this is true.

The card is also deficient running v9.2 of the ZEC miner.  It didn't have this problem with the same settings prior to the transplant.  All cards are hashing around 180 sols/s except for the XFX (125 sols/s).

The tests show you are not the father.
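For anyone who wants the shortfall as a percentage rather than raw numbers, here is a rough sketch in Python using only the rates quoted above.  The function name and the midpoint of 115 sols/s for the v11.0 run are my own simplifications, not anything the miners report.

```python
def deficit_pct(xfx_rate, others_rate):
    """Percent by which the XFX trails the other cards."""
    return (1.0 - xfx_rate / others_rate) * 100.0

# (XFX rate, rate of the other cards), taken from the observations above.
readings = {
    # 115 is an assumed midpoint of the 105-125 sols/s range after the restart.
    "ZEC miner v11.0 (sols/s)": (115.0, 225.0),
    "ZEC miner v9.2 (sols/s)": (125.0, 180.0),
    "Cryptonote miner 9.7 (h/s)": (435.0, 480.0),
}

for name, (xfx, others) in readings.items():
    print(f"{name}: XFX is {deficit_pct(xfx, others):.1f}% behind")
```

The gap shows up under every miner, though it is much smaller on Cryptonote than on either ZEC miner version.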

So, whadda ya brains think? 

EDIT: Well, it looks like this card is lagging across the board, so it's almost certainly a hardware problem, and I'm suspicious that it's a power problem.  To that end, how much power does this kind of card pull from the motherboard?

EDIT 2: Never mind.  75W for PCIe x16, 25W for PCIe x1.  I'm assuming that all four cards are pulling 25 watts each out of the board, so that's 100W coming out of the 500W supply that I didn't account for.  Hmmm...
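A quick sketch of that slot arithmetic in Python, using only the per-slot figures quoted above.  The names are made up for illustration, and the second case (the XFX's riser sitting in the 16X slot and pulling the full x16 figure) is an assumed upper bound, not a measurement.  Powered risers may feed some or all of this from their own power connector rather than the board, so treat both numbers as ceilings.

```python
# Per-slot limits in watts, as quoted in the post (not verified against the spec).
SLOT_LIMIT_W = {"x16": 75, "x1": 25}

def board_draw_w(slots):
    """Total power the cards could pull through the motherboard slots."""
    return sum(SLOT_LIMIT_W[slot] for slot in slots)

# Case 1: the assumption in the post, every card limited to the x1 figure.
assumed_w = board_draw_w(["x1"] * 4)

# Case 2: hypothetical upper bound, one card drawing the full x16 figure.
upper_bound_w = board_draw_w(["x16"] + ["x1"] * 3)

print(f"assumed: {assumed_w} W, upper bound: {upper_bound_w} W")
```

Either figure comes out of the 500W supply on top of the fans, the HD, and whichever card is plugged into it.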