My problem was, I could not create a stride function inside of CUDA. Or else I would have found 130's key already lol.
Is there a built-in "stride" function specifically in the CUDA kernel? Or maybe there is a workaround in a grid-stride loop? Or do we have to write a new kernel?

If you mean by “stride” jumps of 2-2 like 2,4,6, or jumps of any other number, you simply have to modify G by the pub that corresponds to the number of jumps you want, this does not affect the speed of the scalar multiplication, BSGS, etc.