Wait a minute. If he just said that the 3060 achieves around 2300 Mkeys/s, how much does the 4090 achieve? Does it reach 8000 Mkeys/s (8 Gkeys/s) on the RTX 4090 in Cyclone GPU?
Why won't anyone share the fastest GPU code here? Are you keeping the best code for yourselves? It's all just empty talk and blah blah blah.

Because no one is obliged to put their work in the public domain for everyone to see. They spend their time and energy on it.
Because I'm working on it right now :) The main target is to be twice as fast as KeyhuntCUDA, but that is only possible with PTX ASM. Also, if somebody knows a modular inverse algorithm faster than DRS62, let me know; that is my main goal. The alternative is to write all of the code in PTX, which is impossible for me.
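When computing modular inverses in batch, the function that computes each individual inverse has very little impact on overall speed, because a single inversion is shared by the whole batch and every element costs only about three extra modular multiplications. In my version, a "stock" 4090 reaches around 7 Gkeys/s with some optimizations.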
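To show why the single-inverse routine matters so little in batch mode, here is a minimal Python sketch of Montgomery's batch-inversion trick. It is not the poster's GPU code: it assumes the secp256k1 field prime and uses Python's built-in `pow(x, -1, p)` (Python 3.8+) as a stand-in for a dedicated inverse such as DRS62. The function name `batch_inverse` is hypothetical.

```python
# Assumption: secp256k1 field prime (the curve used for Bitcoin keys)
P = 2**256 - 2**32 - 977

def batch_inverse(values, p=P):
    """Hypothetical helper: invert every element of `values` modulo prime p
    with ONE modular inversion (Montgomery's trick).
    All values must be nonzero mod p."""
    n = len(values)
    if n == 0:
        return []
    # Forward pass: prefix[i] = values[0] * ... * values[i] mod p
    prefix = [0] * n
    acc = 1
    for i, v in enumerate(values):
        acc = acc * v % p
        prefix[i] = acc
    # The only real inversion in the whole batch (stand-in for DRS62 etc.)
    inv = pow(acc, -1, p)
    # Backward pass: peel off one element at a time
    result = [0] * n
    for i in range(n - 1, 0, -1):
        result[i] = inv * prefix[i - 1] % p  # equals 1 / values[i]
        inv = inv * values[i] % p            # now 1 / (values[0] * ... * values[i-1])
    result[0] = inv
    return result

# Usage: every returned element is the inverse of its input
vals = [2, 3, 12345]
invs = batch_inverse(vals)
assert all(v * w % P == 1 for v, w in zip(vals, invs))
```

For n elements this costs one inversion plus 3(n - 1) multiplications, so with a batch of a few thousand values (for example, the denominators of many elliptic-curve point additions) the speed of the single inversion is spread almost to nothing, which is consistent with the point made above.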
Still, it's crazy to see how, in a group like this, 3/4 of the posts go against all common sense and the basic rules of statistics. I'm referring to the absurd theories about prefixes and bit permutations.