This is not JeanLucPons' version of Kangaroo.
That version has a lot of unnecessary checks that slow down the process.
JLP uses assembly instructions for the GPU computations, and the overall jumping process is pretty well optimized, in my opinion. Can you please give an example of a non-optimal computation in JLP's code whose optimization would give such an increase in speed?