Have a some progress in BSGS for cuda.
Binary search replace with hashtable and after this results is:
With single 2080ti, 570Mgiantstep with 2^26 babysteps HT find key in 338 seconds (start from 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000)
GPU #0 Cnt:000000000000000000000000000000000000000000000000bac8c00000000001 570MKey/s x67108864 2^29.16 x2^26=2^55.16
***********GPU#0************
Total solutions: 2
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4
Pub: 59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
****************************
Found in 338 seconds
GPU #0 finished
cuda finished ok
With 6x660super, 270Mgiantstep per card with 2^26 babysteps HT find key in 117 seconds (start from 49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5e0000000000000000)
GPU #3 Cnt:000000000000000000000000000000000000000000000000b96a000000000001 272MKey/s x67108864 2^28.09 x2^26=2^54.09
GPU #0 Cnt:000000000000000000000000000000000000000000000000b340000000000001 259MKey/s x67108864 2^28.02 x2^26=2^54.02
GPU #1 Cnt:000000000000000000000000000000000000000000000000ba6e000000000001 274MKey/s x67108864 2^28.10 x2^26=2^54.10
GPU #2 Cnt:000000000000000000000000000000000000000000000000b398000000000001 270MKey/s x67108864 2^28.08 x2^26=2^54.08
***********GPU#3************
Total solutions: 2
GPU #4 Cnt:000000000000000000000000000000000000000000000000bcf0000000000001 269MKey/s x67108864 2^28.08 x2^26=2^54.08
GPU #5 Cnt:000000000000000000000000000000000000000000000000c3fa000000000001 284MKey/s x67108864 2^28.15 x2^26=2^54.15
GPU #0 Cnt:000000000000000000000000000000000000000000000000b658000000000001 261MKey/s x67108864 2^28.03 x2^26=2^54.03
KEY!!>49dccfd96dc5df56487436f5a1b18c4f5d34f65ddb48cb5ebb3ef3883c1866d4
Pub: 59a3bfdad718c9d3fac7c187f1139f0815ac5d923910d516e186afda28b221dc994327554ced887aae5d211a2407cdd025cfc3779ecb9c9d7f2f1a1ddf3e9ff8
****************************
Found in 117 seconds
Much better than before.
