Hurrah! A jump of over 30%, from ~31.5 Mkeys/s to ~41.6 Mkeys/s, on a 7700K with a 1080 Ti.
GPU usage went from ~83% to ~98%.
Thanks to Arulbero for the optimizations and to Rico for the quick implementation.

28% for GPU clients that were not GPU-limited, to be precise.

Your observation is consistent with what is seen on these machines.
i7-6700 CPU @ 3.40GHz + 1080 : 32.47 Mkeys/s -> 42.36 Mkeys/s
That makes the overall collider speed the equivalent of 6 such machines.
If we had 600 of these colliders, the next puzzle transaction privkey would be here in less than 24 hours - worst case.
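
For anyone who wants to check the percentages on their own rig, here is a rough sketch of the arithmetic in Python. It only uses the before/after rates quoted in the two posts above; the function name `speedup_percent` and the dictionary labels are made up for illustration. It does not try to reproduce the 28% figure, since the post does not say which clients that number was measured over.

```python
def speedup_percent(before, after):
    """Relative gain in percent when throughput goes from `before` to `after`."""
    return (after - before) / before * 100.0

# Rates quoted in the posts above, in Mkeys/s (before, after).
rates = {
    "i7-7700K + 1080 Ti": (31.5, 41.6),
    "i7-6700 + 1080": (32.47, 42.36),
}

for machine, (before, after) in rates.items():
    gain = speedup_percent(before, after)
    print(f"{machine}: {before} -> {after} Mkeys/s (+{gain:.1f}%)")
```

Swap in your own before/after numbers to compare against these two machines.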