Just for interest, here is what I am getting at the moment on a GTX 750 Ti running djm34's latest fork of ccminer (https://github.com/djm34/ccminer), compiled with CUDA 6.0.
Per card of course, and rounded down.
X11 2,300 kH/s
X13 1,800 kH/s (about 50 W)
X15 1,200 kH/s
Nist5 7,500 kH/s (about 54 W)
I have tried various versions with CUDA 6.0 and 5.5, but it's really hard to see any difference.
I am sticking with TradeMyBit at the moment, with a homemade switcher that is flipping between X13 and Nist5 most of the time.
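For anyone curious what a switcher like that boils down to, here is a minimal sketch in Python. It is not my actual script and not anything TradeMyBit provides: the function name `pick_algo` and the payout figures you feed it are hypothetical, and it assumes you already have some way of getting a payout rate per MH/s per day for each algorithm. Only the hashrates come from the numbers above.

```python
# Hashrates in kH/s per card, taken from the figures quoted in this post.
HASHRATE_KHS = {"x13": 1800, "nist5": 7500}


def pick_algo(payout_per_mhs_day, hashrate_khs=HASHRATE_KHS):
    """Return the algorithm with the highest expected daily payout.

    payout_per_mhs_day is a dict of algo name -> payout per MH/s per day,
    in whatever unit the pool reports. How you obtain it is up to you;
    this sketch makes no assumption about any pool API.
    """
    def expected(algo):
        # Convert kH/s to MH/s, then scale by the payout rate.
        return hashrate_khs[algo] / 1000.0 * payout_per_mhs_day[algo]

    # Only consider algorithms we have both a hashrate and a payout for.
    candidates = [a for a in hashrate_khs if a in payout_per_mhs_day]
    if not candidates:
        raise ValueError("no payout data for any known algorithm")
    return max(candidates, key=expected)
```

You would call it with the current rates, e.g. `pick_algo({"x13": 0.004, "nist5": 0.001})` (made-up numbers), and restart ccminer on whichever algorithm it returns. A real switcher would also want some hysteresis so it does not flip back and forth on tiny differences.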
I'm using this mod, v1.02-X13-version (12/06/2014). It works for X11 and X13, and on X11 it is doing a bit more than 2,200 kH/s.
I can't recall if v1.02 contains the improved groestl implementation. Give v1.2 a try and you might see an improvement with both x11 and x13.