I guess it's time to fire up the GPUs and make an AMD / Xilinx build of these solvers.

Have you tried messing with a stride function in VanitySearch? If I had that, I think I could solve 125 faster than Kangaroo...
I need to implement a GPU stride function for VanitySearch/keyhunt-cuda.
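
For anyone unsure what I mean by a stride: instead of testing consecutive keys k, k+1, k+2, ..., you test k, k+s, k+2s, ... The public key then moves by a fixed point s*G each step, so every candidate still costs only one point addition. Here is a rough CPU-only sketch in plain Python, just to show the idea. It is not code from VanitySearch or keyhunt-cuda, and the names `stride_search`, `add` and `mul` are made up for this example. A real GPU version would batch the modular inversions and compare address hashes rather than raw points.

```python
# Toy stride search on secp256k1 (illustrative only, not optimized).
P = 2**256 - 2**32 - 977  # field prime
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) = infinity
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def mul(k, pt=G):
    """Double-and-add scalar multiplication k * pt."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt = add(pt, pt)
        k >>= 1
    return acc


def stride_search(target, start, stride, steps):
    """Test keys start, start+stride, ... (steps candidates) against target."""
    point = mul(start)   # public key of the first candidate
    step = mul(stride)   # fixed increment stride * G
    key = start
    for _ in range(steps):
        if point == target:
            return key
        point = add(point, step)  # one point addition per candidate
        key += stride
    return None


if __name__ == "__main__":
    target = mul(4096 + 37 * 16)
    print(stride_search(target, 4096, 16, 100))
```

The stride only pays off if you already know something about the key that lets you skip candidates (for example a fixed residue modulo s). Otherwise it covers the same range at the same cost per key.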