...
Therefore a reasonably efficient equihash implementation will do 5 * 64 * 1 million bytes (320MB) of IO per round. With 9 rounds that means 2.88GB per iteration, or 77.8 iterations per second on an Rx 470 with RAM clocked at 7Gbps (224GB/s memory bandwidth). At 1.88 solutions per iteration, that's an average of 146 solutions/second, or about 25% faster than Claymore v5.
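For anyone who wants to check the arithmetic, here is a rough sketch in Python. It only reproduces the numbers above; the function and constant names are made up for illustration, and it assumes the card can sustain its full rated memory bandwidth, which real kernels won't quite reach.

```python
# Back-of-the-envelope estimate only, not a benchmark.
# All names here are hypothetical; the figures come from the post above.

ROUND_IO_BYTES = 5 * 64 * 1_000_000   # 320MB of IO per round
ROUNDS = 9                            # rounds per iteration
SOLUTIONS_PER_ITERATION = 1.88        # average solutions per iteration


def iterations_per_second(bandwidth_gb_s: float) -> float:
    """Upper bound on iterations/second if memory bandwidth is the only limit."""
    bytes_per_iteration = ROUND_IO_BYTES * ROUNDS   # 2.88GB
    return bandwidth_gb_s * 1e9 / bytes_per_iteration


def solutions_per_second(bandwidth_gb_s: float) -> float:
    """Upper bound on solutions/second at the given memory bandwidth (GB/s)."""
    return iterations_per_second(bandwidth_gb_s) * SOLUTIONS_PER_ITERATION


if __name__ == "__main__":
    # Rx 470 with RAM at 7Gbps: 224GB/s
    print(f"{iterations_per_second(224):.1f} iterations/s")
    print(f"{solutions_per_second(224):.0f} solutions/s")
```

Plugging in a different bandwidth figure gives the matching estimate for other cards or memory clocks, under the same assumptions.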
The theoretical equihash performance limit on an Rx 470 is likely about 25% faster than 146 solutions/second, but reaching it involves using 64-byte data structures that require a lot more memory, so much that I think it will not be possible with 4GB cards. At least it will be something for owners of 8GB Rx 480 cards to be happy about.
A few noob questions if you don't mind.
What's the theoretical limit on the RX 470 8G Nitro cards with RAM clocked at 8Gbps (256GB/s)? Also, does overclocking the memory result in a linear increase in performance?
Does this all mean that equihash solving isn't GPU compute limited, but rather memory limited? If so, I wonder why GPU-Z shows 100% GPU load vs sub-40% memory controller load (whereas mining Eth fully loads both core and mem controller...)
Fascinating stuff. Thanks in advance.
An Rx 470 at 8Gbps would have a theoretical limit 8/7 times that of one at 7Gbps, or roughly 167 solutions/second instead of 146. The limit scales linearly with memory bandwidth.
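As a quick illustration of that scaling, here is a tiny Python sketch. It assumes the solver stays purely memory-bandwidth limited at the higher clock, and the function name is just something I made up for the example.

```python
# Hypothetical helper: scale a bandwidth-bound limit by the memory data rate.
# Assumes performance is limited only by memory bandwidth.

def scaled_limit(base_limit: float, base_gbps: float, new_gbps: float) -> float:
    """Scale a theoretical limit linearly with the memory data rate (Gbps)."""
    return base_limit * new_gbps / base_gbps


if __name__ == "__main__":
    # 146 solutions/s at 7Gbps, scaled to 8Gbps (a factor of 8/7)
    print(f"{scaled_limit(146.0, 7.0, 8.0):.0f} solutions/s")
```

In practice the gain from a memory overclock will be linear only as long as nothing else in the kernel becomes the bottleneck.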
The only part of equihash that is compute limited is the blake2b initialization. The intention of the authors was for the algorithm to be limited by memory bandwidth.
As for what GPU-Z shows, you'll have to figure out how to correctly interpret what it reports on your own. I do my OpenCL development on Linux, and even if there were a Linux version, I don't consider GPU-Z a useful tool for kernel developers.