Don't wait for someone to write a CUDA GPU version.
Try testing this:
https://github.com/iceland2k14/secp256k1
It still uses the CPU, but it is faster than plain Python.
Note that the result you get back is pubkey-based.
I know a CUDA GPU would be something like 1000x faster.
How do these compare with each other for speed?
python ecdsa+gmpy2 (pip install ecdsa[gmpy2])
python + fastecdsa (works only on Linux or WSL2, not on native Windows)
python + iceland2k14/secp256k1 (DLL on Windows)
Are there any other options for fast point addition and multiplication on the CPU?
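
For a baseline to compare against, here is a minimal sketch of what "simple python" point addition and multiplication on secp256k1 look like, using only the standard library. The function names (point_add, scalar_mult, bench) are my own for illustration and do not come from any of the libraries above. It assumes Python 3.8+ (for pow(x, -1, P)), it is not constant-time, and it is only meant for timing comparisons, not for real keys.

```python
import random
import time

# secp256k1 domain parameters (curve: y^2 = x^3 + 7 over GF(P))
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2:
        if (y1 + y2) % P == 0:
            return None  # a + (-a) = infinity
        # Same point: tangent slope (point doubling)
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        # Different points: chord slope
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point=G):
    """Compute k * point with plain double-and-add (not constant-time)."""
    k %= N
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def bench(count=50):
    """Return scalar multiplications per second for `count` random scalars."""
    rng = random.Random(1)  # fixed seed so runs are comparable
    scalars = [rng.randrange(1, N) for _ in range(count)]
    start = time.perf_counter()
    for k in scalars:
        scalar_mult(k)
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    print(f"pure Python: {bench():.1f} scalar multiplications per second")
```

Run the same loop with the same scalars through each library in the list to get numbers you can compare on your own machine.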