Board: Development & Technical Discussion
Topic: Re: 5-7 kangaroo method
by bisovskiy on 02/09/2025, 14:17:59 UTC
LLE. Damn, you made me go back to the drawing board just thinking about what you might have used to reach such speeds. For some of the explanations I came up with, there are literally only a couple of results on the entire web, but if those work as advertised, I may get a 3x speedup as well, just by freeing up a lot of registers that hold useless information. I don't see any other way, since the CUDA profiler already shows 100% compute throughput and I'm still nowhere near your speed. Great, now I also have a headache.

Don't work so hard, there's no reason for that :)
One more tip for you. As far as I remember, there is one more problem in the original JLP GPU code: every GPU thread processes A LOT of kangs (kangaroos). That causes a problem with DPs for high-number puzzles like #130. It's obvious, but it seems nobody understands it. Don't you see it too?
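To put rough numbers on the DP problem with many kangaroos per thread, here is a back-of-the-envelope sketch in Python. It assumes the textbook estimate of about 2·√W jumps for an interval of width W (the 3-, 4- and 5-kangaroo variants change the constant), that a walk needs about 2^dp jumps per DP on average, and that all kangaroos share the card's throughput evenly. The thread count, kangaroos per thread, DP size and speed below are made-up illustrative values, not JLP's defaults or anyone's real setup.

```python
import math


def expected_total_ops(range_bits: float) -> float:
    # Rough textbook estimate: ~2*sqrt(W) jumps for an interval of width W = 2**range_bits.
    # Assumption: the constant differs between kangaroo variants.
    return 2.0 * 2.0 ** (range_bits / 2)


def in_flight_ops(num_kangaroos: int, dp_bits: int) -> float:
    # Each walk needs about 2**dp_bits jumps per DP, so roughly this many jumps
    # exist only in GPU memory and are lost if the walks are discarded.
    return num_kangaroos * 2.0 ** dp_bits


def seconds_to_first_dp(num_kangaroos: int, dp_bits: int, ops_per_sec: float) -> float:
    # Total throughput is shared by all kangaroos, so each one advances at
    # ops_per_sec / num_kangaroos jumps per second.
    return in_flight_ops(num_kangaroos, dp_bits) / ops_per_sec


def dp_table_size(range_bits: float, dp_bits: int) -> float:
    # Expected number of stored DPs before a collision is found.
    return expected_total_ops(range_bits) / 2.0 ** dp_bits


if __name__ == "__main__":
    # Hypothetical setup: puzzle #130 (key in [2**129, 2**130), width 2**129),
    # 2**16 GPU threads x 2**9 kangaroos per thread, DP = 32 bits, 2**32 jumps/s.
    range_bits, dp_bits = 129, 32
    kangs = 2**16 * 2**9
    speed = 2.0**32
    print(f"expected jumps:   2^{math.log2(expected_total_ops(range_bits)):.1f}")
    print(f"in-flight jumps:  2^{math.log2(in_flight_ops(kangs, dp_bits)):.1f}")
    print(f"days to first DP: {seconds_to_first_dp(kangs, dp_bits, speed) / 86400:.0f}")
    print(f"DP table entries: 2^{math.log2(dp_table_size(range_bits, dp_bits)):.1f}")
```

With those hypothetical numbers each kangaroo needs about 2^25 seconds (roughly 388 days) to produce its first DP, so about 2^57 jumps sit only in GPU memory and are lost on a restart unless they are checkpointed. Lowering the DP size shortens that wait but grows the DP table (2^33.5 expected entries at DP 32 here, doubling for every bit removed).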

Yes, thanks, I noticed that as well. Too bad a sequential process can't be parallelized. For now, it looks like any restart can hurt the chance of finding a collision, since we can only detect one on the second occurrence of the same DP along a trajectory, and that takes time. I've been thinking about this in two ways: first, adding progress checkpoints and setting an acceptable DP threshold; second, deliberately forcing trajectories to reset right after a DP is found. That way, we slightly improve the speed of reaching a DP, though this should not be confused with overall performance.
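The DP bookkeeping and the reset-after-DP idea can be illustrated with a toy Python model. This is not JLP's code and not secp256k1: the "group" is just integers modulo the Mersenne prime 2^61 - 1 with a hypothetical generator `G`, so `point(k) = k*G mod N` is trivially invertible and only stands in for scalar multiplication. `mix` is the splitmix64 finalizer used as a stand-in hash; the 32-entry random jump table, the restart distributions and all function names are assumptions for illustration. A collision is only detected when a tame and a wild walk land on the same DP, and with `reset_after_dp=True` each walk restarts from a fresh random start point as soon as it stores a DP.

```python
import random

N = (1 << 61) - 1      # toy group order (Mersenne prime), NOT secp256k1
G = 0x5DEECE66D        # hypothetical toy generator; "point" of scalar k is k*G mod N
MASK64 = (1 << 64) - 1


def point(k: int) -> int:
    """Toy stand-in for scalar multiplication k*G."""
    return (k * G) % N


def mix(p: int) -> int:
    """splitmix64 finalizer, used as a pseudo-random hash of a point."""
    p = ((p ^ (p >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    p = ((p ^ (p >> 27)) * 0x94D049BB133111EB) & MASK64
    return p ^ (p >> 31)


def kangaroo(H, lo, width, dp_bits, reset_after_dp, seed=1, max_jumps=2_000_000):
    """Find x in [lo, lo + width) with point(x) == H, or return None."""
    rng = random.Random(seed)
    mean = max(1, int(width ** 0.5) // 2)          # mean jump about sqrt(W)/2
    jumps = [rng.randint(1, 2 * mean) for _ in range(32)]
    dp_mask = (1 << dp_bits) - 1
    table = {}                                      # DP point -> (kind, distance)

    def start(kind):
        # Tame distances are absolute scalars; wild distances are offsets from x.
        if kind == "tame":
            d = lo + rng.randrange(2 * width) if reset_after_dp else lo + width
            return point(d), d
        d = rng.randrange(width) if reset_after_dp else 0
        return (H + point(d)) % N, d

    walkers = {kind: start(kind) for kind in ("tame", "wild")}
    for _ in range(max_jumps):
        for kind in ("tame", "wild"):
            pos, dist = walkers[kind]
            j = jumps[(mix(pos) >> 32) % len(jumps)]   # jump choice from high hash bits
            pos, dist = (pos + point(j)) % N, dist + j
            if (mix(pos) & dp_mask) == 0:              # distinguished point (low hash bits)
                other_kind, other_dist = table.setdefault(pos, (kind, dist))
                if other_kind != kind:
                    # Same point: tame_dist == x + wild_dist, so x = tame_dist - wild_dist.
                    tame_d, wild_d = (other_dist, dist) if kind == "wild" else (dist, other_dist)
                    return (tame_d - wild_d) % N
                if reset_after_dp:
                    pos, dist = start(kind)            # fresh trajectory after each DP
            walkers[kind] = (pos, dist)
    return None


if __name__ == "__main__":
    lo, width = 1 << 40, 1 << 20
    x = lo + 777_777                   # the "unknown" key, used only to build H
    H = point(x)
    for reset in (False, True):
        found = kangaroo(H, lo, width, dp_bits=3, reset_after_dp=reset)
        print(f"reset_after_dp={reset}: found={found == x}")
```

Restarting after every DP means no walk ever carries more than one DP's worth of unrecorded jumps, which is the property the second idea is after. Whether that pays off in overall speed on a real GPU is a separate question that this toy does not answer.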