Regarding speed, there is pollard-kangaroo-c99 by Telariust on GitHub, with multicore support for up to 128 cores.
Thanks for pointing to the GitHub repo. I learned a lot from Telariust's code.
Now I've modified the algorithm (I'll write up the whole algorithm here soon) to use operations that are efficient in Python, and performance has increased by 14%. And that is single-threaded execution only.
I also have a multithreaded part ready.
I can't post the single-threaded code yet, because its memory usage is not optimized. Soon.
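For readers who have not seen the method, here is a minimal single-threaded sketch of the textbook Pollard kangaroo (lambda) search. It is not Telariust's code and not the modified version mentioned in this thread. It assumes a toy multiplicative group modulo a prime instead of secp256k1, and the function name `kangaroo`, its parameters, the power-of-two jump table and the `salt` retry loop are all illustrative choices. It also assumes the order of `g` is larger than the distances walked, so a collision identifies `x` uniquely.

```python
from math import isqrt


def kangaroo(g, h, p, a, b, max_tries=32):
    """Look for x in [a, b] with pow(g, x, p) == h; return x or None."""
    width = b - a
    # Jump sizes are powers of two; grow k until the mean jump is
    # at least about sqrt(width) / 2.
    k = 1
    while (2 ** k - 1) // k < isqrt(width) // 2 + 1:
        k += 1
    jumps = [2 ** i for i in range(k)]
    jump_pts = [pow(g, s, p) for s in jumps]  # precomputed g^s mod p
    n_tame = 2 * isqrt(width) + 1             # steps before the trap is set

    for salt in range(max_tries):             # each salt gives a different walk
        # Tame kangaroo: starts at the known exponent b.
        y, d_tame = pow(g, b, p), 0
        for _ in range(n_tame):
            i = (y + salt) % k
            y = y * jump_pts[i] % p
            d_tame += jumps[i]
        trap = y                              # trap == g^(b + d_tame) mod p

        # Wild kangaroo: starts at h = g^x, with x unknown.
        y, d_wild = h % p, 0
        while d_wild <= width + d_tame:
            if y == trap:
                # g^(x + d_wild) == g^(b + d_tame), so solve for x.
                return b + d_tame - d_wild
            i = (y + salt) % k
            y = y * jump_pts[i] % p
            d_wild += jumps[i]
    return None                               # every attempt missed the trap


if __name__ == "__main__":
    p = 2 ** 31 - 1                           # toy prime modulus (assumption)
    g = 7
    a, b = 1_000_000, 1_000_000 + 2 ** 20     # search interval for x
    h = pow(g, 1_234_567, p)
    print(kangaroo(g, h, p, a, b))
```

A single walk can miss the trap, which is why the sketch retries with a different `salt`. A real implementation would work with elliptic curve points and usually stores distinguished points, so that several kangaroos can run in parallel.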