This is not JeanLucPons' version of Kangaroo.
That one has a lot of unnecessary checks that slow down the process.
JLP uses assembly instructions to perform the GPU computations, and the overall jumping process is pretty well optimized, in my opinion. Can you please give an example of non-optimal computations in JLP's code which, if optimized, would give such an increase in speed?
This is not a two-byte change.

It needs a rewrite or a lot of refactoring:
1. Split the search so it looks for either tames or wilds.
2. Remove the real-time collision checks.
3. Modify the check for prefixes of found kangaroos.
4. Try other CUDA implementations of EC point arithmetic. VanitySearch, BitCrack and their forks have many ideas here.
Splitting the search into wild or tame does not increase speed; it just targets one or the other. I've been doing this for at least 2 years: find enough tames, then switch to wilds.
Real-time collision checks only happen on the server side (if you are using a server), so they shouldn't slow down the GPUs; the GPUs just send the DPs to the server.
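To make that worker/server split concrete, here is a rough Python sketch. It is not JLP's code: it uses a toy multiplicative group mod a small prime instead of secp256k1, and every name in it (`is_distinguished`, `jump`, `walk`, `DPServer`, `DP_BITS`) is made up for illustration. The only point is that the walker does a single mask test per step, while the table lookup and the tame/wild comparison live on the server.

```python
# Toy sketch only: multiplicative group mod a small prime, NOT secp256k1.
# All names here are hypothetical and chosen for illustration.

P = 2**31 - 1   # toy prime modulus (assumption)
G = 7           # generator of the toy group (assumption)
DP_BITS = 4     # a point is "distinguished" if its low DP_BITS bits are zero

TAME, WILD = 0, 1


def is_distinguished(x, dp_bits=DP_BITS):
    """Worker-side check: one mask per step, no table lookup."""
    return x & ((1 << dp_bits) - 1) == 0


def jump(x):
    """Deterministic pseudo-random jump size derived from the point itself."""
    return 1 << (x % 8)


def walk(x, dist, max_steps):
    """Worker loop: hop until a DP is hit, then hand (x, dist) to the server."""
    for _ in range(max_steps):
        if is_distinguished(x):
            return x, dist
        j = jump(x)
        x = (x * pow(G, j, P)) % P   # "point + jump" in the toy group
        dist += j
    return None  # no DP found within max_steps


class DPServer:
    """Server side: stores DPs and detects a tame/wild collision."""

    def __init__(self):
        self.table = {}  # x -> (kind, dist)

    def submit(self, kind, x, dist):
        seen = self.table.get(x)
        if seen is None:
            self.table[x] = (kind, dist)
            return None
        other_kind, other_dist = seen
        if other_kind == kind:
            return None  # tame/tame or wild/wild: useless collision
        # Tame dist is its full exponent; wild dist is the offset from the key.
        tame, wild = (dist, other_dist) if kind == TAME else (other_dist, dist)
        return tame - wild
```

In this sketch a tame kangaroo carries its full exponent as `dist` and a wild one carries its offset from the unknown key, so when the server sees the same `x` from both herds the key is simply `tame - wild`. Running only tames first and wilds later, or both at once, changes which rows fill the table, not the cost of a step on the worker.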
What do you mean by "modify the check for prefixes of found kangaroos"?