Board: Mining (Altcoins)
Topic: Re: SILENTARMY v5: Zcash miner, 115 sol/s on R9 Nano, 70 sol/s on GTX 1070
Posted by Kubuxu on 13/11/2016, 09:54:05 UTC
SA5 has no more easy optimizations left; the current version maxes out the memory bandwidth of the card.

Does it also max out the PCI-e bus bandwidth? I'm trying to understand why my Nitro+ RX 470 (at 75~80 Sol/s) is finding as many shares as my R9 295X2 (2 x 90 Sol/s)... Any ideas?

With the on-GPU solution pruning it barely uses any PCI-e bus bandwidth. My comment about maxing out the memory bandwidth is a simplification of a complex problem. The memory path includes the L1 and L2 caches as well as the external GDDR5. I believe part of the current performance bottleneck is due to L1/L2 cache thrashing. There are at least a couple of ways of solving the problem, but none (that I can think of) are easy.


I was profiling it yesterday, and the limiting factor right now is LDS (local data share). To optimize it, the total memory accessed by a wave of the round kernel would have to be reduced. Currently it comes to about 16K. CodeXL says that utilization during the rounds is 12.5% because of this. I can send you more data if you want, or you can profile it yourself with CodeXL 2.2.
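
To show how an LDS footprint can cap utilization, here is a rough back-of-the-envelope sketch. It is not taken from SILENTARMY or from CodeXL. It assumes a GCN compute unit with 64 KiB of LDS and 40 wavefront slots (4 SIMDs x 10 waves), assumes one 64-wide wave per work-group, and assumes the profiler figure is LDS-limited occupancy. The function name and the work-group size are hypothetical, so treat the numbers as illustrative only.

```python
# Rough LDS-limited occupancy estimate for one GCN compute unit (CU).
# Assumptions (not from the original post): 64 KiB LDS per CU,
# 40 wavefront slots per CU, 64 work-items per wavefront.
LDS_PER_CU = 64 * 1024
MAX_WAVES_PER_CU = 40
WAVE_SIZE = 64


def lds_limited_occupancy(lds_bytes_per_group, work_items_per_group=64):
    """Fraction of wave slots usable when only LDS limits residency."""
    if lds_bytes_per_group <= 0:
        raise ValueError("lds_bytes_per_group must be positive")
    # Waves needed per work-group (ceiling division).
    waves_per_group = -(-work_items_per_group // WAVE_SIZE)
    # How many work-groups fit into the CU's LDS at once.
    groups_by_lds = LDS_PER_CU // lds_bytes_per_group
    waves = min(groups_by_lds * waves_per_group, MAX_WAVES_PER_CU)
    return waves / MAX_WAVES_PER_CU


if __name__ == "__main__":
    for kib in (16, 12, 8):
        occ = lds_limited_occupancy(kib * 1024)
        print(f"{kib} KiB per work-group -> {occ:.1%} occupancy")
```

Under these assumptions an LDS footprint in the low-to-mid teens of KiB per work-group leaves only four or five waves resident per CU, which is in the same range as the 12.5% figure, and halving the footprint roughly doubles the number of resident waves.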