Alexis' code is very efficient, like all the code I've seen from him.
His Skein code already gives pretty high INT32 pipe utilization (~95% on a 1080 Ti).
I don't think we can expect much (if any) improvement unless you can reduce the total number of INT32 operations needed to calculate the hashes.
Not sure if Skein has such room for improvement.
Skein is pretty good, so to optimize it you need to think differently. Too much pipe utilization produces more heat, and then your card will throttle. Skein performs pretty badly with a reduced TDP, but if you increase the temp limit of the card, you get a nice boost even with less power. Reducing the memory clock will also free up more power to increase the core clock. In Skein sp-mod #2 I have managed to reduce the number of integer instructions needed. I have also managed to run the kernel at a higher stable boost clock.
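To give a rough idea of where the integer instructions in Skein come from, here is a minimal host-side C++ sketch of the Threefish MIX step (64-bit add, rotate, XOR) with the rotate written on 32-bit halves, the way a GPU with 32-bit integer units has to do it. This is only an illustration and not the sp-mod or Alexis kernel code: `funnel_l`, `rotl64_halves` and `mix` are hypothetical names, and `funnel_l` merely emulates the behaviour of a funnel-shift instruction such as CUDA's `__funnelshift_l`.

```cpp
#include <cstdint>
#include <cassert>

// Reference 64-bit left rotate, used only to check the split version.
static inline uint64_t rotl64_ref(uint64_t x, unsigned r) {
    r &= 63;
    return r ? (x << r) | (x >> (64 - r)) : x;
}

// Hypothetical helper: emulates a 32-bit funnel shift, i.e. the upper
// 32 bits of the 64-bit value (hi:lo) shifted left by s (s taken mod 32).
static inline uint32_t funnel_l(uint32_t lo, uint32_t hi, unsigned s) {
    s &= 31;
    return s ? (hi << s) | (lo >> (32 - s)) : hi;
}

// 64-bit rotate built from 32-bit operations only.
// A rotation by 32 or more starts with a swap of the two halves, which
// costs nothing on a GPU (it is just register renaming). The remaining
// rotation by r mod 32 takes two funnel shifts, or none when r == 32.
static inline uint64_t rotl64_halves(uint64_t x, unsigned r) {
    uint32_t lo = static_cast<uint32_t>(x);
    uint32_t hi = static_cast<uint32_t>(x >> 32);
    r &= 63;
    if (r >= 32) { uint32_t t = lo; lo = hi; hi = t; r -= 32; }
    uint32_t new_hi = funnel_l(lo, hi, r);
    uint32_t new_lo = funnel_l(hi, lo, r);
    return (static_cast<uint64_t>(new_hi) << 32) | new_lo;
}

// Threefish MIX: x0 = x0 + x1; x1 = rotl(x1, r) ^ x0.
// In 32-bit terms this is roughly two adds (low half, then high half
// with carry), up to two funnel shifts and two XORs.
static inline void mix(uint64_t &x0, uint64_t &x1, unsigned r) {
    x0 += x1;
    x1 = rotl64_halves(x1, r) ^ x0;
}
```

For example, `mix(a, b, 32)` needs no shift instructions at all for the rotate, while any other rotation amount needs two. Since MIX is repeated many times per hash, shaving even one 32-bit instruction per MIX adds up, which is the kind of saving the INT32 discussion above is about.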
