My problem was that I couldn't write a stride function inside a CUDA kernel. Otherwise I would have found 130's key already lol.
Is there a built-in "stride" function that can be called from inside a CUDA kernel? Or is there a workaround using a grid-stride loop? Or do we have to write a new kernel?
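
For reference, the grid-stride loop pattern I mean looks roughly like the sketch below. The stride here is not a library call, just `blockDim.x * gridDim.x` computed by hand inside the kernel. Everything else is made up for illustration: the names `scan_strided` and `run_scan`, the `start`/`step`/`count` parameters, and the plain equality test, which only stands in for whatever the real key check would be. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: thread t visits indices t, t + stride, t + 2*stride, ...
// and tests the candidate value start + i * step for each index i < count.
__global__ void scan_strided(unsigned long long start,
                             unsigned long long step,
                             unsigned long long count,
                             unsigned long long target,
                             unsigned long long *found)
{
    // Global thread index and the total number of threads in the grid.
    unsigned long long tid =
        blockIdx.x * (unsigned long long)blockDim.x + threadIdx.x;
    unsigned long long stride =
        (unsigned long long)blockDim.x * gridDim.x;

    // Grid-stride loop: the grid covers all `count` indices
    // no matter how many blocks and threads were launched.
    for (unsigned long long i = tid; i < count; i += stride) {
        unsigned long long candidate = start + i * step;
        if (candidate == target) {   // placeholder for the real check
            *found = candidate;
        }
    }
}

// Hypothetical host helper: returns the matching candidate,
// or 0 if nothing matched (0 is used as a "not found" sentinel).
unsigned long long run_scan(unsigned long long start,
                            unsigned long long step,
                            unsigned long long count,
                            unsigned long long target)
{
    unsigned long long h_found = 0;
    unsigned long long *d_found = nullptr;

    cudaMalloc(&d_found, sizeof(h_found));
    cudaMemcpy(d_found, &h_found, sizeof(h_found), cudaMemcpyHostToDevice);

    // Deliberately launch far fewer threads than candidates (4 * 64 = 256).
    scan_strided<<<4, 64>>>(start, step, count, target, d_found);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&h_found, d_found, sizeof(h_found), cudaMemcpyDeviceToHost);
    cudaFree(d_found);
    return h_found;
}

int main()
{
    // Look for the value that sits at index 4321 of the stepped range.
    unsigned long long target = 1000ULL + 7ULL * 4321ULL;
    unsigned long long hit = run_scan(1000ULL, 7ULL, 100000ULL, target);
    printf("found: %llu\n", hit);
    return 0;
}
```

The point of the pattern is that the same kernel works for any launch size, so the range can grow without rewriting the kernel. Only the loop bound changes.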
