Don't wait for someone to write a CUDA GPU version.
Try testing this:
https://github.com/iceland2k14/secp256k1
It still uses the CPU, but it is faster than plain Python.
Note that the results you get are pubkey-based.
I know a CUDA GPU would be around 1000x faster.
How do these compare in speed?
python + ecdsa with gmpy2 (pip install ecdsa[gmpy2])
python + fastecdsa (works only on Linux or WSL2, not on native Windows)
python + iceland2k14/secp256k1 (DLL on Windows)
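One way to answer the speed question is to time each option on the same list of random private keys. Below is a rough sketch of such a harness, with a pure-Python double-and-add as the "simple python" baseline. The function names (point_add, scalar_mult, keys_per_second) are made up for this example and do not come from any of the libraries above; to time a library, wrap its own multiplication call in a function that takes one integer and pass that in instead. Needs Python 3.8+ for pow(x, -1, P).

```python
import secrets
import time

# secp256k1 domain parameters (standard constants)
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a)
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def scalar_mult(k, point=G):
    """Plain double-and-add: the slow baseline, not constant-time."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def keys_per_second(fn, keys):
    """Call fn(k) for every key and return the measured rate."""
    start = time.perf_counter()
    for k in keys:
        fn(k)
    elapsed = time.perf_counter() - start
    return len(keys) / elapsed


if __name__ == "__main__":
    keys = [secrets.randbelow(N - 1) + 1 for _ in range(50)]
    print("pure python:", round(keys_per_second(scalar_mult, keys)), "keys/s")
```

Use the same keys list for every candidate so the numbers are comparable, and increase the key count for the faster libraries so the timing is not dominated by noise.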
Are there other options available for fast point addition and multiplication on the CPU?
I tried, but it was not fruitful. Most are similar to fastecdsa in speed; only a GPU-based approach will increase the speed.
And all the GPU developers just copy and paste each other's source code, only for incrementing from private keys to addresses/hash160. Maybe they don't have time to build for different kinds of calculations. The above is just the very simple basics of point add/sub/mul by an integer or by a pubkey (point). Let's see which developer jumps in with a creative mind.
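For anyone who wants to see what "point add/sub/mul by an integer or by a pubkey" means in practice, here is a minimal pure-Python sketch on secp256k1. The helper names (add, neg, sub, mul) are hypothetical and only for illustration; this is not how any of the libraries or GPU tools mentioned here implement it, and it is far slower than they are. Needs Python 3.8+.

```python
# secp256k1 domain parameters (standard constants)
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(a, b):
    """Point + point. None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def neg(a):
    """Negate a point by flipping y."""
    return None if a is None else (a[0], -a[1] % P)


def sub(a, b):
    """Point - point, done as a + (-b)."""
    return add(a, neg(b))


def mul(k, a):
    """Integer * point by double-and-add."""
    k %= N
    result = None
    while k:
        if k & 1:
            result = add(result, a)
        a = add(a, a)
        k >>= 1
    return result


# Working from a pubkey Q without touching its private key:
Q = mul(12345, G)              # pretend we were only handed Q
step_down = sub(Q, G)          # pubkey of (k - 1)
shifted = add(Q, mul(5, G))    # pubkey of (k + 5)
tripled = mul(3, Q)            # pubkey of (3 * k mod N)
```

The last four lines are the point: once you hold a pubkey, you can shift or scale it with an integer or another point, and the result is the pubkey of the correspondingly shifted or scaled private key.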