Post
Topic
Board Mining (Altcoins)
Re: Claymore's Dual Ethereum AMD+NVIDIA GPU Miner v9.5 (Windows/Linux)
by
Teress
on 13/06/2017, 15:12:03 UTC
How do you check for memory errors under Linux? I have asked this question a few times with no answer :(
Quite serious ones will appear in the kernel log - so you'd just check dmesg. I'm pretty sure there's a register with the count somewhere in the space, if I directly access the GPU (not bothering with the driver), but there's already SO MUCH awesome shit I can do with this access that I have yet to implement (and I've implemented plenty!) so it's not that high on my TODO list...
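Checking the kernel log as described above can be sketched in a few lines of Python. This is a minimal sketch, not anyone's actual tooling: the function names `filter_kernel_log` and `read_kernel_log` are hypothetical, and the default keyword `amdgpu` is an assumption about which driver name to look for. It only surfaces whatever the driver chose to log; it does not read any GPU error-count register.

```python
import subprocess


def filter_kernel_log(log_text, keyword="amdgpu"):
    """Return the lines of log_text that mention keyword (case-insensitive).

    The default keyword is an assumption; pass the name of your GPU driver.
    """
    needle = keyword.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def read_kernel_log():
    """Return the kernel ring buffer as text by running dmesg.

    On some systems dmesg is restricted to root, in which case this raises
    subprocess.CalledProcessError.
    """
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a rig you would call `filter_kernel_log(read_kernel_log())` and print the result, which is roughly what piping dmesg through grep does.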
I have the same problem. Errors seem to cause incorrect shares.
When Claymore launches the watchdog because of a hang, I always spot errors in the kernel logs (journalctl -k | grep amdgpu). They are mostly segmentation faults...
I don't know how you can detect memory errors... are they hardware ECC silent errors being counted? Parity errors?
Normally, uncorrected errors cause bad computation, and thus incorrect shares. My rule is that when my watchdog detects an incorrect share or a GPU hang, the BIOS is blacklisted and another one is selected. BIOSes don't get to make two errors, they are fired at the first :D
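The "fired at the first error" rule above could look something like the following. This is only a sketch under stated assumptions: the class `BiosSelector`, its method names, and the idea of identifying each BIOS by a file name are all hypothetical, and the actual flashing and watchdog detection are left out entirely.

```python
class BiosSelector:
    """Hypothetical helper that tracks which BIOS images are still allowed."""

    def __init__(self, candidates):
        if not candidates:
            raise ValueError("need at least one BIOS candidate")
        self.candidates = list(candidates)
        self.blacklist = set()
        # Start with the first candidate in the list.
        self.current = self.candidates[0]

    def report_failure(self):
        """Blacklist the current BIOS on its first failure and pick the next.

        A failure here means the watchdog saw an incorrect share or a hang.
        Returns the newly selected BIOS, or None once all are blacklisted.
        """
        if self.current is None:
            return None
        self.blacklist.add(self.current)
        remaining = [b for b in self.candidates if b not in self.blacklist]
        self.current = remaining[0] if remaining else None
        return self.current
```

With two candidates, the first `report_failure()` call moves to the second BIOS and the next call returns `None`, at which point there is nothing left to try.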

I've got one GPU which makes one incorrect share per 2-3 thousand good shares, running at 30.7 MH/s. Zero memory errors in HWiNFO64. And you would trash its BIOS? Can't agree.
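To put that figure in perspective, one incorrect share per 2-3 thousand good ones works out to roughly 0.03% to 0.05% of all shares. A quick sketch of the arithmetic (the function name `incorrect_share_rate` is made up for illustration):

```python
def incorrect_share_rate(incorrect, good):
    """Return the fraction of all submitted shares that were incorrect."""
    total = incorrect + good
    if total == 0:
        raise ValueError("no shares submitted")
    return incorrect / total


# One incorrect share per 2000 and per 3000 good shares, as in the post.
for good in (2000, 3000):
    rate = incorrect_share_rate(1, good)
    print(f"1 incorrect per {good} good shares: {rate:.4%}")
```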