Board: Development & Technical Discussion
Re: 5-7 kangaroo method
by hskun on 29/12/2024, 08:11:29 UTC
LLE. Damn, you made me go back to the drawing board, just thinking about what you might have used to reach such speeds. For some of the things I came up with as an explanation, there are literally just a couple of results on the entire web, but if those work as advertised, I may get a 3x speedup as well, just by freeing up a lot of registers that hold useless information. There's no other way I can see, since I already get 100% compute throughput in the CUDA profiler but I'm nowhere near your speed. Great, now I also have a headache.

Don't work so hard, there is no reason for that :)
One more tip for you. As far as I remember there is one more problem in original JLP GPU code: every GPU thread processes A LOT of kangs. It will cause an issue with DPs for high-number puzzles like #130. It's obvious but it seems nobody understands it. Don't you see it too?
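
To put rough numbers on the DP issue mentioned above, here is a minimal sketch. It assumes the commonly quoted estimate for the kangaroo method with distinguished points: about 2*sqrt(N) group operations for the search itself, plus about nbKangaroo * 2^dpBits of overhead, because every kangaroo has to keep walking until it hits a DP. The function names and the parameters in the usage note are made up for illustration and are not taken from the JLP code.

```python
def expected_ops(range_bits, num_kangaroos, dp_bits):
    """Return (base, overhead) in group operations.

    base     ~ 2 * sqrt(N) with N = 2**range_bits (assumed textbook estimate)
    overhead ~ num_kangaroos * 2**dp_bits (each kangaroo needs on average
               2**dp_bits extra jumps to reach a distinguished point)
    """
    base = 2.0 * 2.0 ** (range_bits / 2)
    overhead = float(num_kangaroos) * 2.0 ** dp_bits
    return base, overhead


def overhead_ratio(range_bits, num_kangaroos, dp_bits):
    """DP overhead as a fraction of the base search cost."""
    base, overhead = expected_ops(range_bits, num_kangaroos, dp_bits)
    return overhead / base


if __name__ == "__main__":
    # Hypothetical settings: a 129-bit interval (puzzle #130) and a few
    # herd sizes, to see how the overhead scales with kangaroo count.
    for log2_kangs in (20, 26, 32):
        for dp_bits in (24, 32, 40):
            r = overhead_ratio(129, 2 ** log2_kangs, dp_bits)
            print(f"kangs=2^{log2_kangs} dp={dp_bits}: overhead/base = {r:.3g}")
```

The overhead grows linearly with the number of kangaroos, so a thread that carries many kangaroos forces a smaller DP mask (and therefore more stored DPs) to keep the ratio well below 1.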

If I remember correctly, JLP's Kangaroo only supports interval searches below 125 bits, because the output distance is only 32*4 = 128 bits wide?
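
A quick sanity check of that arithmetic, as a sketch only: the distance is assumed to be stored in four 32-bit words, and `reserved_flag_bits` is a hypothetical parameter for any top bits the implementation might keep for flags (I have not checked the exact count in the source).

```python
LIMB_BITS = 32   # width of one stored word
NUM_LIMBS = 4    # 32*4 bits in total, as stated above


def usable_distance_bits(reserved_flag_bits):
    """Bits left for the distance after reserving some bits (assumption)."""
    return LIMB_BITS * NUM_LIMBS - reserved_flag_bits


def range_fits(range_bits, reserved_flag_bits):
    """True if an interval of width 2**range_bits fits in the distance field."""
    return range_bits <= usable_distance_bits(reserved_flag_bits)


if __name__ == "__main__":
    # Puzzle #130 keys lie in [2**129, 2**130), so the interval is 129 bits wide.
    for reserved in (0, 1, 2, 3):
        print(reserved, usable_distance_bits(reserved), range_fits(129, reserved))
```

Even with no reserved bits at all, a 129-bit interval does not fit into 128 bits, so #130 would need a wider distance field regardless of the exact flag layout.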