I've built ASM kernels for Baffin (RX460/560) cards, with a good speedup for DCR/SIA/PASC and maybe 1% for ETH. Now I'm checking what I can do for Nvidia cards. The new version will be available in 2-4 days.
Impatiently waiting for this

I'm curious: would these optimizations help your Zcash miner too?
I find it strange that old 1GB Radeon 7850s have 50% better
performance than RX560s with 4GB of memory.
It is not memory size related.
https://www.techpowerup.com/gpudb/2940/radeon-rx-560
Memory Bus: 128 bit
Bandwidth: 112.0 GB/s

https://www.techpowerup.com/gpudb/1055/radeon-hd-7850
Memory Bus: 256 bit
Bandwidth: 153.6 GB/s

Anyway, I can still get some speedup by building ASM kernels for Baffin.
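The bandwidth figures above follow from a simple formula: peak bandwidth = (bus width in bits / 8) × effective memory data rate per pin. Below is a minimal sketch of that arithmetic. It assumes effective GDDR5 data rates of 7.0 Gbps for the RX 560 and 4.8 Gbps for the HD 7850; those rates are not quoted in the thread, they are the reference-card values that reproduce the listed numbers.

```python
# Peak memory bandwidth (GB/s) = (bus width in bits / 8) * effective data rate (Gbps per pin).
# Assumption: reference-card GDDR5 data rates of 7.0 Gbps (RX 560) and 4.8 Gbps (HD 7850);
# these are not stated in the thread.

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return theoretical peak memory bandwidth in GB/s."""
    bytes_per_transfer = bus_width_bits / 8  # 128-bit bus -> 16 bytes per transfer
    return bytes_per_transfer * data_rate_gbps

rx560 = peak_bandwidth_gbs(128, 7.0)   # 16 bytes * 7.0 Gbps
hd7850 = peak_bandwidth_gbs(256, 4.8)  # 32 bytes * 4.8 Gbps

print(f"RX 560:  {rx560:.1f} GB/s")
print(f"HD 7850: {hd7850:.1f} GB/s")
print(f"HD 7850 peak-bandwidth advantage: {hd7850 / rx560 - 1:.0%}")
```

Run as-is, it prints 112.0 GB/s and 153.6 GB/s, i.e. roughly 37% more peak bandwidth for the older card despite its slower memory, because its bus is twice as wide.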
While what you say is obviously true (it's the reason old 280X cards have superb ZEC and subpar ETH results, due to their 384-bit bus),
I think there's still room for improvement, based on your recent work on the ETH miner. It brought memory controller utilization
from the 60% range all the way up to 90%, along with improvements in secondary coin mining.
Currently your ZEC miner reaches only 33% memory controller utilization on my card.
Of course, I'm not a programmer, so I could be totally wrong.
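To put those utilization percentages in GB/s, here is a rough sketch. It treats the reported "memory controller utilization" as the fraction of peak bandwidth in use, which is an approximation. Because the card isn't named, it uses the RX 560's 112.0 GB/s peak as a hypothetical example; `PEAK_GBS` and `achieved_bandwidth_gbs` are illustrative names, not part of any miner.

```python
# Rough estimate: achieved bandwidth ~= utilization * peak bandwidth.
# Assumptions: "memory controller utilization" is read as the fraction of peak bandwidth
# in use, and the card is taken to be an RX 560 (112.0 GB/s); the poster's card is not named.
PEAK_GBS = 112.0  # hypothetical example card: RX 560

def achieved_bandwidth_gbs(utilization: float, peak_gbs: float = PEAK_GBS) -> float:
    """Estimate bandwidth actually used, in GB/s, from a utilization fraction in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization * peak_gbs

readings = [
    ("ZEC miner (reported)", 0.33),
    ("ETH miner, before", 0.60),
    ("ETH miner, after", 0.90),
]
for label, util in readings:
    print(f"{label:22s} {util:.0%} -> {achieved_bandwidth_gbs(util):.1f} GB/s")

# Upper bound on extra memory traffic if ZEC reached the ETH miner's 90%;
# this is not a hashrate prediction, since the kernel may be limited by compute instead.
headroom = achieved_bandwidth_gbs(0.90) / achieved_bandwidth_gbs(0.33)
print(f"Bandwidth headroom: {headroom:.1f}x")
```

Under these assumptions, 33% works out to about 37 GB/s versus about 101 GB/s at 90%, roughly 2.7x more memory traffic available in principle.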
