Currently I get about 12.8 GKeys/s on a 4090. The 5090 is a disappointment, so I'm skipping it and waiting for the next generation.
Perhaps I will make all my sources public once #135 is solved, though I'm not sure. People are not interested in what I do, and I see zero good discussions about EC on this forum, so I'd rather spend my time on more interesting things.

Yes, there are surely many people intrigued by your code; it's just that not all of us have thousands of dollars to experiment with or to buy a high-end PC. What's more unfortunate is that those who do have the means offer nothing but theories backed by zero code, which makes for vague and empty arguments. I admit that I plan to add the different kangaroo methods to your final version of RCKangaroo, to check whether SOTA is the main factor or whether it is the CUDA code optimization.
All the ideas I have published here are backed by sources, so everyone can check and confirm them:
RCKangaroo is just a proof that these ideas can be implemented efficiently on GPUs as well.