RTX 4090 cards have a lot more L2 cache than previous generations.
I'm looking for changes that fit as much data as possible into the cache, to speed up Kangaroo and avoid the memory bottleneck it has.
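A back-of-envelope sketch helps judge how much of the working set could live in the larger L2. It assumes a hypothetical per-kangaroo state of a 256-bit x, a 256-bit y and a 128-bit travelled distance (80 bytes). The real layout in Kangaroo may differ, and the jump table and DP buffers are ignored.

```python
# Hypothetical per-kangaroo layout (assumption, not taken from Kangaroo's source):
# affine x and y as 256-bit integers plus a 128-bit travelled distance.
BYTES_X = 32
BYTES_Y = 32
BYTES_DIST = 16
BYTES_PER_KANGAROO = BYTES_X + BYTES_Y + BYTES_DIST  # 80 bytes

MiB = 1024 * 1024


def kangaroos_fitting_in_cache(cache_bytes: int,
                               bytes_per_kangaroo: int = BYTES_PER_KANGAROO) -> int:
    """Return how many whole kangaroo states fit in a cache of the given size."""
    return cache_bytes // bytes_per_kangaroo


def herd_bytes(num_kangaroos: int,
               bytes_per_kangaroo: int = BYTES_PER_KANGAROO) -> int:
    """Return the total state size in bytes for a herd of num_kangaroos."""
    return num_kangaroos * bytes_per_kangaroo


if __name__ == "__main__":
    # Example cache sizes only; check your card's actual L2 size.
    for label, cache in [("6 MiB L2", 6 * MiB), ("72 MiB L2", 72 * MiB)]:
        print(label, kangaroos_fitting_in_cache(cache), "kangaroos")
```

With the 72 MB L2 commonly quoted for the RTX 4090, roughly 940k such states would fit, so whether the hot data stays in cache depends heavily on how many kangaroos are launched per GPU.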
The code definitely needs to be tweaked to see if there is any gain in performance.
The code also needs to be tweaked in how it finds DPs (distinguished points).
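For reference, a DP is a point on a kangaroo's path whose x-coordinate passes a cheap test, so only those points are stored and compared. The sketch below assumes the test is "the top dp_bits bits of the most significant 64-bit word of x are zero". This is one common convention, not necessarily the one Kangaroo or any tweaked version uses. `make_dp_mask`, `is_distinguished` and `expected_dps` are hypothetical helpers.

```python
WORD_MASK = 0xFFFFFFFFFFFFFFFF  # 64-bit word


def make_dp_mask(dp_bits: int) -> int:
    """Mask selecting the top dp_bits bits of a 64-bit word."""
    if not 0 <= dp_bits <= 64:
        raise ValueError("dp_bits must be in [0, 64]")
    return ((1 << dp_bits) - 1) << (64 - dp_bits)


def is_distinguished(x: int, dp_bits: int) -> bool:
    """True if the top dp_bits bits of the highest 64-bit word of a 256-bit x are zero."""
    top_word = (x >> 192) & WORD_MASK
    return (top_word & make_dp_mask(dp_bits)) == 0


def expected_dps(total_jumps: int, dp_bits: int) -> float:
    """On average one jump in 2**dp_bits lands on a DP (assumes uniformly random x)."""
    return total_jumps / (1 << dp_bits)
```

The trade-off is that a larger dp_bits means fewer DPs to store and transfer, but each kangaroo travels about 2**dp_bits extra jumps after a collision before the collision is detected.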
The best speed I got out of Kangaroo was 7,750 MKey/s on a 4090 and 7,350 MKey/s on an A100, but I had tweaked the way DPs were found.
I haven't messed with the code since #125 was found, though.
Hope you have success!
I just found an old exe file and ran a full 44-bit range to check the average speed: 3060 Ti = 2,685 MKey/s with Kangaroo.
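A common rough rule for the parallel kangaroo method turns MKey/s figures like these into a time estimate: about 2*sqrt(N) jumps for an interval of width N. The sketch assumes that constant (c = 2.0, adjustable) and ignores DP overhead, startup and kangaroo creation time. `expected_jumps` and `expected_seconds` are hypothetical helpers.

```python
import math


def expected_jumps(range_bits: int, c: float = 2.0) -> float:
    """Rough expected jump count c * sqrt(N) for an interval of width N = 2**range_bits."""
    return c * math.sqrt(2.0 ** range_bits)


def expected_seconds(range_bits: int, mkeys_per_s: float, c: float = 2.0) -> float:
    """Rough expected solve time in seconds at a sustained rate given in MKey/s."""
    if mkeys_per_s <= 0:
        raise ValueError("rate must be positive")
    return expected_jumps(range_bits, c) / (mkeys_per_s * 1e6)


if __name__ == "__main__":
    # A 44-bit interval at the 3060 Ti rate reported above.
    print(expected_seconds(44, 2685))
```

Plugging in other interval widths and measured rates gives a rough single-GPU estimate. It also shows why a 44-bit run finishes almost instantly and is mainly useful as a speed check.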
Thank you for the good information.
However, I have an idea for speeding up Kangaroo even more.
I got a hint from your VBCr.exe file.
If you can modify Kangaroo, please send me an email.
If it is modified, puzzle #130 could be solved within two months.
https://t.me/kyscolx