Thank you, colleagues, for suggesting Rust and C, but for educational purposes, what is specifically required is Python code for CUDA that can brute-force keys.
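You can use @numba.jit to compile the performance-critical parts of the code into machine code, but this will not work with big numbers. Numba integers are at most 64 bits wide, so private keys beyond puzzle 64 do not fit, and neither do the 256-bit coordinates and modulus of secp256k1 for any puzzle.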
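A quick plain-Python check shows where the 64-bit limit bites. It assumes the usual puzzle convention that the key for puzzle n lies in [2^(n-1), 2^n - 1]; puzzle_range and fits_uint64 are hypothetical helpers written only for illustration.

```python
# Sketch: which values fit in a 64-bit machine integer, the widest integer Numba supports?
UINT64_MAX = 2**64 - 1

# secp256k1 field prime (the same value as "argument 2" in the Numba error below)
SECP256K1_P = 2**256 - 2**32 - 977


def puzzle_range(n):
    """Hypothetical helper. Assumption: puzzle n's key lies in [2**(n-1), 2**n - 1]."""
    return 2 ** (n - 1), 2**n - 1


def fits_uint64(value):
    """True if value can be stored in an unsigned 64-bit integer."""
    return 0 <= value <= UINT64_MAX


for n in (63, 64, 65, 66):
    lo, hi = puzzle_range(n)
    print(f"puzzle {n}: largest key fits in uint64? {fits_uint64(hi)}")

print("secp256k1 p bit length:", SECP256K1_P.bit_length())
```

Puzzle 64 keys still fit in an unsigned 64-bit integer and puzzle 65 keys do not, but the curve arithmetic itself works modulo a 256-bit prime, so the intermediate values never fit, whatever the puzzle.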
Example:
@njit
def add_numba(P, Q, p=modulo):
<source elided>
Running it fails with:
@njit
^
This error may have been caused by the following argument(s):
- argument 0: Int value is too large: 110560903758971929709743161563183868968201998016819862389797221564458485814982
- argument 2: Int value is too large: 115792089237316195423570985008687907853269984665640564039457584007908834671663
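The body of add_numba is elided above. As a hypothetical reconstruction (not the original code), affine point addition on secp256k1 works fine in plain Python because Python ints have arbitrary precision; it simply cannot be compiled with @njit. The names add_point and P_MOD, and the convention that None means the point at infinity, are assumptions.

```python
# Hypothetical sketch of affine point addition on secp256k1 using plain Python ints.
# It runs correctly, but Numba cannot compile it because the values exceed 64 bits.
P_MOD = 2**256 - 2**32 - 977  # secp256k1 field prime ("argument 2" in the error above)

# secp256k1 generator point G
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def add_point(P, Q, p=P_MOD):
    """Add two affine points on y^2 = x^3 + 7 (mod p); None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P) = point at infinity
    if P == Q:
        # Doubling: tangent slope 3*x^2 / (2*y), since a = 0 on secp256k1
        lam = 3 * x1 * x1 * pow(2 * y1 % p, -1, p) % p
    else:
        # Addition: chord slope (y2 - y1) / (x2 - x1)
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)
```

For example, add_point(G, G) returns 2G, whose x-coordinate is 0xC6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5. The modular inverse uses the built-in three-argument pow (Python 3.8+), which is convenient but far too slow for serious brute-forcing.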
Numba does not support Python's arbitrary-precision (big) integers. It is essentially limited to the integer types that NumPy supports, so the maximum integer width is currently 64 bits.
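The usual workaround in fixed-width code is to split each 256-bit value into four 64-bit limbs and propagate carries by hand. The sketch below, with hypothetical helpers to_limbs, from_limbs and add_limbs, only shows addition, written in the wrap-around style a uint64 implementation would need; multiplication and reduction mod p need the same manual treatment and are much more involved.

```python
# Hypothetical sketch: 256-bit values as four 64-bit limbs (least significant first).
MASK64 = (1 << 64) - 1


def to_limbs(value, n_limbs=4):
    """Split a non-negative int into n_limbs 64-bit limbs."""
    return [(value >> (64 * i)) & MASK64 for i in range(n_limbs)]


def from_limbs(limbs):
    """Reassemble limbs into a Python int."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two limb arrays, detecting carries the way wrapping uint64 code must.

    Returns (sum_limbs, carry_out)."""
    result = []
    carry = 0
    for x, y in zip(a, b):
        s = (x + y) & MASK64        # wraps like a uint64 addition
        c1 = 1 if s < x else 0      # overflow from x + y
        s2 = (s + carry) & MASK64
        c2 = 1 if s2 < s else 0     # overflow from adding the incoming carry
        result.append(s2)
        carry = c1 | c2             # at most one of c1, c2 can be set
    return result, carry
```

Adding 2**256 - 1 and 1 this way yields all-zero limbs with a carry-out of 1.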
For big-integer arithmetic on the CPU, GMP is the best option for now.
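From Python, GMP is commonly reached through the gmpy2 package. A minimal sketch, assuming gmpy2 is installed (it falls back to plain Python ints otherwise); mod_inv and field_div are hypothetical helpers. Keep in mind that GMP runs on the CPU, not inside CUDA kernels.

```python
# Minimal sketch: GMP-backed field arithmetic via gmpy2, with a plain-int fallback.
try:
    from gmpy2 import invert, mpz  # GMP bindings
except ImportError:  # fallback if gmpy2 is not installed
    mpz = int

    def invert(a, m):
        return pow(a, -1, m)  # Python 3.8+ modular inverse

P = mpz(2**256 - 2**32 - 977)  # secp256k1 field prime


def mod_inv(a):
    """Modular inverse in the secp256k1 field."""
    return invert(mpz(a) % P, P)


def field_div(a, b):
    """Compute a / b in the secp256k1 field."""
    return mpz(a) * mod_inv(b) % P
```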
Numba's GPU target has the same type limitations. If you need big integers or other types that Numba does not support directly, you may need PyCUDA or another library designed specifically for CUDA programming in Python.
Even then, you'll still need to write the kernel itself in CUDA C++.

There is no instant solution: it all boils down to the fact that you have to know C++ or another compiled language such as Rust.