Hi Everybody:
Like many of the messages I have seen here, I will start by saying that I'm new here and new to mining. I started mining on Friday and so far it has been great; I have learnt a lot.
Right now I am mining with only 3 ASUS STRIX RX580 8GB OC cards. I have reduced the GPU clock by 5% and increased the memory clock to 2200 MHz, which took me from 24.5 MH/s to 27.4 MH/s, all in AMD's Wattman. I had problems with the ASUS Aura light effects, which seemed to interfere with hashrate stability, so that is turned off and uninstalled now.
I have read about some widely known power adjustments. As I understand it, the idea is to reduce voltage to cut power consumption, and to increase the memory clock to raise MH/s. I have also read about doing this with AMD's Wattman or MSI Afterburner, and about BIOS flashing.
Knowing that many people have been able to get 29 MH/s at 135 W from these cards, my questions so far are the following:
- Do I need to flash a new BIOS to get the best performance? If so, how many MH/s could I get, or how much power reduction?
- If I can get similar improvements with AMD's Wattman, which parameters must I use?
Thank you very much for your support.
Carlos
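For reference, the figures in the post above can be put side by side with a little arithmetic. This is only a sketch: the function names are made up for illustration, and the power draw at 27.4 MH/s is not stated in the post, so efficiency is computed only for the 29 MH/s at 135 W figure.

```python
def percent_gain(before_mhs, after_mhs):
    """Relative hashrate improvement, in percent."""
    return (after_mhs - before_mhs) / before_mhs * 100.0


def efficiency(mhs, watts):
    """Hashrate per watt, in MH/s per W."""
    return mhs / watts


# Numbers quoted in the post.
stock_mhs = 24.5
tuned_mhs = 27.4
target_mhs, target_watts = 29.0, 135.0

gain = percent_gain(stock_mhs, tuned_mhs)                 # Wattman tuning so far
remaining = percent_gain(tuned_mhs, target_mhs)           # headroom to the 29 MH/s figure
target_eff = efficiency(target_mhs, target_watts)

print(f"gain so far: {gain:.1f}%")
print(f"remaining headroom: {remaining:.1f}%")
print(f"target efficiency: {target_eff:.3f} MH/s per W")
```

So the Wattman changes already account for roughly two thirds of the gap between stock and the 29 MH/s figure.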
If you're still using Windows (which is about as stable as a house of cards during a hurricane), then you can easily undervolt using MSI AB or whatever. Also, only 27.4 MH/s? Those can't be Samsung, can they?
How do you check for memory errors under Linux? I have asked this question a few times with no answer.

Quite serious ones will appear in the kernel log - so you'd just check dmesg. I'm pretty sure there's a register with the count somewhere in the space, if I directly access the GPU (not bothering with the driver), but there's already SO MUCH awesome shit I can do with this access that I have yet to implement (and I've implemented plenty!) so it's not that high on my TODO list...
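A minimal sketch of the "just check dmesg" step, for anyone who would rather not read the log by eye. The driver tags and error keywords below are assumptions, not a list of what the driver actually prints, and the sample log lines are made up purely for illustration. Reading dmesg may also require root on some systems.

```python
import subprocess

# Assumed filters: adjust to whatever your driver really logs.
GPU_TAGS = ("amdgpu", "radeon")
ERROR_WORDS = ("error", "fault", "timeout")


def suspicious_lines(log_text, tags=GPU_TAGS, words=ERROR_WORDS):
    """Return log lines that mention a GPU driver tag and an error keyword."""
    hits = []
    for line in log_text.splitlines():
        lower = line.lower()
        if any(t in lower for t in tags) and any(w in lower for w in words):
            hits.append(line)
    return hits


def read_dmesg():
    """Return the kernel log as text (empty if dmesg is not permitted)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout


# Made-up sample, NOT real driver output.
SAMPLE = """\
[    1.000000] example: unrelated boot message
[   42.000000] amdgpu: made-up line reporting a memory fault
[   43.000000] amdgpu: made-up informational line
"""

sample_hits = suspicious_lines(SAMPLE)
```

On a real rig you would call `suspicious_lines(read_dmesg())` instead of using the sample. As noted above, this only catches the serious errors that reach the kernel log.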
So that's it. There is no way for people who aren't terminal gurus to easily check for memory errors. I have been a Mac user for about 25 years, and I also have some FreeBSD & Linux servers. I like Unixes and hate Win, but... For example, this little HWiNFO64 is really handy and easy to use. I would like to try building a Linux miner system, but I don't want to ruin my cards by running them on the edge with millions of HW errors without knowing it...
And the solution of first running them on Win to find each card's limits, then flashing those values into their BIOSes, and after that running them under Linux seems a bit uncomfortable, don't ya think?

ACTUALLY - even if you're an expert in bash, you still aren't able to see the memory errors - all of them. You would have to code - access the GPU directly, telling the driver to go fuck itself, and read them.
About your worry with memory errors - they have zero chance of harming the GPU. A memory error basically means that the delay before a given memory command simply was not long enough, and as such, you got garbage back (most likely), assuming it was a read command. It's not going to hurt a thing, besides possibly your profits.
There is one other option, if you're a dev with a shitload of time... (or you just bribe me for a copy of mine) - write a tool to directly access the VRM controller(s) on the GPU, and command them directly. This is fun & rewarding, because you find out that on Windows... those nice tools like MSI AB and Sapphire Trixx... they hide SO much power from you, and are like safety scissors when you need a scalpel.
Wolf, maybe this will sound strange, but - are you 100% sure that running cards 24/7 for a year with millions of memory errors will have NO impact in terms of degradation or damage to those cards? I have seen a lot of cards, for example RX 480 8GB with Samsung memory, which in normal condition can easily do 30+ MH/s, but they were hardly hitting 27 MH/s! Something had to degrade them, and memory errors are my idea no. 1.
I am absolutely certain. The reason I'm paid so well for custom performance timings is because not only do I not copy+paste timing sets, I also do not blindly change values - I understand how to use & interact with GDDR5, and how it functions, to an extent. I don't mean in code, storing & retrieving shit, but more on the level of how to operate it, and how the GPU's memory controller will drive it, the various delays required between different commands issued, and whatnot. Attempting to do something too quickly (like back-to-back ACTIVE commands to rows in different banks without waiting long enough) will simply end with incorrect data if you fuck up just a little, or a memcrash (identifiable by the GPU's core clock being normal, but the memclk dropping to 300 and it not hashing) if you fuck up a lot. You ain't gonna damage it, short of voltage modifications.
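The memcrash signature described above (core clock normal, memclk down at 300, no hashing) is simple enough to check automatically. A rough sketch, assuming you already have some way to read the clocks and hashrate; the function name, the MHz units, the tolerance and the example clock values are all hypothetical.

```python
MEMCRASH_MEMCLK_MHZ = 300  # the dropped memclk value mentioned in the post above


def looks_like_memcrash(core_mhz, mem_mhz, hashrate_mhs,
                        expected_core_mhz, tolerance_mhz=25.0):
    """True if the readings match the described memcrash pattern."""
    core_normal = abs(core_mhz - expected_core_mhz) <= tolerance_mhz
    mem_dropped = mem_mhz <= MEMCRASH_MEMCLK_MHZ
    not_hashing = hashrate_mhs <= 0.0
    return core_normal and mem_dropped and not_hashing


# Hypothetical readings for a card whose core is set to 1150 MHz.
healthy = looks_like_memcrash(1150, 2200, 27.4, expected_core_mhz=1150)
crashed = looks_like_memcrash(1150, 300, 0.0, expected_core_mhz=1150)
```

A check like this only flags the "fucked up a lot" case; slightly-too-tight timings that just return incorrect data will not show up in the clocks.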
Now - I have seen this case of Samsung just not being... well, Samsung, in some cases. In all of them, the issue was heat. Now, I know what you're thinking. Something along the lines of the core temp being more than fine, right? This is due to the cooling being what I call a "show cooler." XFX RS XXX (470 or 480, 4G or 8G), as well as MSI's Armor coolers, are ones I have personally bought and confirmed this behavior on. They ensure the GPU's ASIC is connected *really* well to the heatsink - and that's about all they do. Most gamers/overclockers/miners don't even know there are other temp sensors, let alone check them... this means that while everything on the PCB besides the core is left to cook, most notably the VRM controller(s) and the GDDR5, all appears well! This has been the cause of my Samsung under-performance issues without fail.
Well, a little example. I had 2 identical cards, Nitro+ RX480 4GB Samsung. Same settings, same core, mem, voltages etc... Simply the exact same confirmed settings. But one card was simply running 13-15 degC hotter than the second one. Yes, they were far away from each other, to make the cooling factor, airflow etc. irrelevant. And guess what? The hotter card, even with the same mem straps, was hashing a lot lower. And why was that? Both cards had the same cooling solution from the factory, backplate, not sure about the Nitro+ VRM heatsink etc. My guess - one card simply was "a bit screwed" or something; my first idea is that the previous owner ran it with mem errors and since then the memory is "more likely to produce more mem errors"...
So what is the conclusion? Should I change the settings on all my cards to run faster (more MH/s) but with mem errors, or just have all cards running slower but clean with zero mem errors? Where is the border where mem errors start affecting accepted shares due to incorrect shares?
Cards may look identical but have different BIOS settings, especially since you mentioned a "previous owner"... BIOS settings affect voltage and frequencies, which in turn affect temperature. If those cards are used, the thermal paste could have dried out and may need to be replaced...
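One rough way to think about that border is with a toy model. It assumes (this is an assumption, not something established in this thread) that any work spoiled by memory errors is simply lost, so the useful hashrate is the raw hashrate times the fraction of valid results. The function names and the 27.4 vs 29.0 MH/s comparison are only for illustration.

```python
def effective_hashrate(raw_mhs, invalid_fraction):
    """Useful hashrate if a given fraction of results is invalid."""
    return raw_mhs * (1.0 - invalid_fraction)


def break_even_invalid_fraction(clean_mhs, fast_mhs):
    """Invalid fraction at which the faster setting only matches the clean one."""
    return 1.0 - clean_mhs / fast_mhs


# Illustrative comparison: a clean 27.4 MH/s setting vs. an error-prone 29.0 MH/s one.
border = break_even_invalid_fraction(27.4, 29.0)
at_ten_percent = effective_hashrate(29.0, 0.10)

print(f"break-even invalid fraction: {border:.1%}")
print(f"29.0 MH/s with 10% invalid: {at_ten_percent:.1f} MH/s effective")
```

Under this model the faster setting only pays off while the share of invalid results stays below the break-even value; the practical measure is to compare accepted shares at the pool over a long enough run for both settings.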