The Ethash kernel for GCN3 is done.
Theoretically speaking, it cannot get any faster than it is now with four active wavefronts
per CU.
I am sick of looking at Ethash, though...
You don't need four wavefronts to saturate the VALU. I explained that in an old blog post:
"Note that some sources state that full SIMD occupancy requires four waves, when it is technically possible with just one wave using only vector instructions."
http://nerdralph.blogspot.com/2017/02/inside-amd-gcn-code-execution.html
I realized a long time ago that the key to optimal ETHASH performance is not getting more waves, it's optimizing the pattern of memory accesses across the CUs to avoid contention.
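
A rough way to see the single-wave point from the quote is a toy occupancy model. It uses the GCN figures of 16 lanes per SIMD and 64 work-items per wavefront, so one vector instruction occupies a SIMD for 4 cycles, which matches the 4-cycle interval between issue slots for that SIMD. Everything else is an assumption for illustration only: the function name, the `valu_fraction` parameter, and the idea that each wave independently has a vector instruction ready in a given issue slot. It is not a cycle-accurate simulation and says nothing about memory contention.

```python
# Toy model of VALU occupancy on one GCN SIMD (illustrative assumptions only).
LANES_PER_SIMD = 16   # vector lanes in one SIMD
WAVE_SIZE = 64        # work-items per wavefront
ISSUE_INTERVAL = 4    # cycles between issue slots for a given SIMD


def valu_utilization(waves_per_simd: int, valu_fraction: float = 1.0) -> float:
    """Estimate the fraction of cycles the SIMD's vector lanes are busy.

    valu_fraction is a hypothetical knob: the chance that a given wave has
    a vector instruction ready at an issue slot. Waves are assumed to be
    independent of each other, which real kernels are not.
    """
    # A 64-wide wave on 16 lanes takes 4 cycles per vector instruction.
    cycles_per_instr = WAVE_SIZE // LANES_PER_SIMD
    # Chance that at least one resident wave can issue a vector instruction.
    slot_filled = 1.0 - (1.0 - valu_fraction) ** waves_per_simd
    return slot_filled * cycles_per_instr / ISSUE_INTERVAL


if __name__ == "__main__":
    for waves, frac in [(1, 1.0), (1, 0.5), (4, 0.5)]:
        print(waves, frac, valu_utilization(waves, frac))
```

In this model a single wave that issues nothing but vector instructions already keeps the lanes fully busy. Extra waves only help when a wave spends some of its slots on something else, such as waiting on memory.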