Try reducing the launch bound in Shabal from 320 to 256. If you have time, revert some of the changesets to find the one that causes the slowdown.
There are too many changes for me to test ((
Did you mean shavite, i.e. changing TPB from 320 to 256 in ccminer-windows\x11\cuda_x11_shavite512.cu?
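
For reference, a rough sketch of the kind of change being discussed. It is not the actual ccminer source: the kernel name `demo_hash_kernel`, its body, and the helpers `blocks_for` and `launch_demo` are made up for illustration. It only assumes that TPB is a threads-per-block constant used both in `__launch_bounds__` and in the launch configuration, so the two have to change together.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Threads per block. The change under discussion is 320 -> 256.
#define TPB 256

// Hypothetical stand-in kernel, not the real shavite/Shabal code.
// __launch_bounds__(TPB) promises the compiler that the kernel is never
// launched with more than TPB threads per block, which influences how
// many registers it may assign per thread.
__global__ void __launch_bounds__(TPB)
demo_hash_kernel(uint32_t threads, uint32_t *out)
{
    uint32_t thread = blockDim.x * blockIdx.x + threadIdx.x;
    if (thread < threads)
        out[thread] = thread;  // placeholder work
}

// Number of blocks needed to cover `threads` work items, rounded up.
static inline uint32_t blocks_for(uint32_t threads)
{
    return (threads + TPB - 1) / TPB;
}

// Hypothetical host wrapper: launches with the same TPB as the launch
// bound. Launching with more threads per block than the bound fails.
static inline cudaError_t launch_demo(uint32_t threads, uint32_t *d_out)
{
    demo_hash_kernel<<<blocks_for(threads), TPB>>>(threads, d_out);
    return cudaGetLastError();
}
```

If the real file follows this pattern, changing the single `#define` should be enough to test 256 against 320; if the block size is hard-coded in the launch as well, both places would need the same value.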