Board: Development & Technical Discussion
Topic: Re: 5-7 kangaroo method
by RetiredCoder on 20/10/2024, 11:02:46 UTC
LLE. Damn, you made me go back to the drawing board, just thinking on what you might have used to get to such speeds. For some of the things I came up with as explanation, there's literally just a couple of results on the entire web, but if those work as advertised, I may get a 3x speedup as well, just by freeing up a lot of registers that keep useless information. There's no other way I see, since I already got 100% compute throughput in the CUDA profiler but I'm nowhere near your speed. Great, now I also have a headache.

Don't work so hard, there is no reason for that :)
One more tip for you. As far as I remember, there is one more problem in the original JLP GPU code: every GPU thread processes A LOT of kangaroos. That will cause an issue with DPs (distinguished points) for high-numbered puzzles like #130. It's obvious, but it seems nobody understands it. Don't you see it too?
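
One possible reading of this hint, sketched as rough arithmetic (an interpretation, not something spelled out in the post): every kangaroo needs about 2^dp_bits jumps before it reports its first DP, so the wasted work grows with the total number of kangaroos, while the expected total work is fixed by the range. The function name, the constant `k = 2.0` and the kangaroo counts below are hypothetical and chosen only for illustration; the 129-bit range assumes #130 means a key in an interval of width 2^129.

```python
import math


def dp_overhead(range_bits, dp_bits, total_kangaroos, k=2.0):
    """Rough DP overhead estimate for a kangaroo search (illustrative only).

    range_bits      -- log2 of the search interval width
    dp_bits         -- number of bits that define a distinguished point
    total_kangaroos -- kangaroos per thread * threads * GPUs
    k               -- assumed constant in expected_ops = k * sqrt(range)
    """
    # Expected number of group operations to solve: k * sqrt(range).
    expected_ops = k * 2.0 ** (range_bits / 2.0)
    # Each kangaroo walks ~2**dp_bits jumps before reaching its first DP,
    # so this many jumps are spent before the herd contributes anything.
    overhead_ops = total_kangaroos * 2.0 ** dp_bits
    # Roughly one DP is stored per 2**dp_bits jumps.
    expected_dps = expected_ops / 2.0 ** dp_bits
    return expected_ops, overhead_ops, expected_dps


if __name__ == "__main__":
    # Hypothetical herd sizes: few vs. many kangaroos per thread.
    for kangaroos in (2 ** 20, 2 ** 30):
        for dp_bits in (28, 32, 36):
            ops, overhead, dps = dp_overhead(129, dp_bits, kangaroos)
            print(
                f"kangaroos=2^{int(math.log2(kangaroos))} dp={dp_bits} "
                f"overhead/expected={overhead / ops:.4f} "
                f"stored DPs=2^{math.log2(dps):.1f}"
            )
```

With a huge herd you are pushed toward a small DP value to keep the overhead down, and a small DP value means far more points to store and transfer, which is where it starts to hurt on something like #130.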