Hello. I recently got interested in CUDA for speeding up existing Python secp256k1 code. I installed the NVIDIA CUDA 10.2 toolkit, and it seems to be running OK, to an extent.
I've thrown together a small PyCUDA script that just doubles an integer:
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy

# Working with a single integer; the kernel takes int*, so use int32
a = numpy.int32(126)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)

mod = SourceModule("""
__global__ void doublify(int *a)
{
    int idx = threadIdx.x + threadIdx.y * 4;
    a[idx] *= 2;
}
""")

func = mod.get_function("doublify")
func(a_gpu, block=(1, 1, 1))  # one thread: only a[0] exists

a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)
print(a)
However, it can't handle big numbers (256-bit). When passing, for example:
a = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364140
there's an error, because numpy.int64 only holds 64 bits:
OverflowError: int too big to convert
Is there a way to use big integers in PyCUDA or CuPy, or in any GPU implementation of Python?
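One common workaround (not specific to PyCUDA) is to represent each 256-bit number as an array of fixed-size "limbs" that the GPU can handle natively, e.g. eight 32-bit words. A minimal host-side sketch in plain numpy (function names are my own, not a library API):

```python
import numpy as np

def int_to_limbs(x, n_limbs=8):
    """Split a non-negative Python int into little-endian 32-bit limbs."""
    limbs = np.zeros(n_limbs, dtype=np.uint32)
    for i in range(n_limbs):
        limbs[i] = x & 0xFFFFFFFF
        x >>= 32
    return limbs

def limbs_to_int(limbs):
    """Reassemble little-endian 32-bit limbs into a Python int."""
    return sum(int(v) << (32 * i) for i, v in enumerate(limbs))

n = 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364140
limbs = int_to_limbs(n)          # 8-element uint32 array, safe to memcpy_htod
assert limbs_to_int(limbs) == n  # round-trips exactly
```

An array like this can be copied to the device with the usual `cuda.mem_alloc` / `cuda.memcpy_htod` calls; the kernel then sees `unsigned int*` and must implement the multi-word arithmetic itself.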
I stumbled on a Stack Overflow post,
https://stackoverflow.com/questions/68215238/numpy-256-bit-computing, but didn't understand anything in it.
I know that in C++ you can use the Boost library to get bigger integer types. Is there a way to do the same in Python on the GPU?
Also, does it even make sense to use Python GPU solutions, since the main calculations happen in the "SourceModule" kernel, which has to be written in C/C++ anyway?
Maybe it would just be easier to rewrite the existing Python code in C++ with Boost, and add CUDA GPU acceleration later?
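For what it's worth, the arithmetic such a kernel would have to do on limb arrays is just schoolbook carry propagation, which a C kernel can express directly. A rough pure-Python model of 256-bit addition over limbs (my own sketch, mirroring what each CUDA thread would do in C):

```python
import numpy as np

MASK32 = 0xFFFFFFFF

def to_limbs(x):
    """Split a non-negative int into 8 little-endian 32-bit limbs."""
    return np.array([(x >> (32 * i)) & MASK32 for i in range(8)],
                    dtype=np.uint32)

def from_limbs(limbs):
    """Reassemble limbs into a Python int."""
    return sum(int(v) << (32 * i) for i, v in enumerate(limbs))

def add_256(a, b):
    """256-bit addition (mod 2**256): add limb by limb, carrying
    the overflow of each 32-bit word into the next one."""
    out = np.zeros(8, dtype=np.uint32)
    carry = 0
    for i in range(8):
        s = int(a[i]) + int(b[i]) + carry
        out[i] = s & MASK32
        carry = s >> 32
    return out
```

Multiplication and modular reduction for secp256k1 follow the same limb-wise pattern but are considerably more involved, which is one argument for doing that part in C/CUDA regardless of whether the host side is Python or C++.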