You say:
I can currently squeeze out 6.2 Gk/s on an RTX 4090, but some users here claim they can obtain 8 Gk/s or more.
RetiredCoder says:
Note that I have not included all possible optimizations because it's public code and I want to keep it as simple/readable as possible.
The RCKangaroo Git repository says:
about 8GKeys/s on RTX 4090.
You say:
NB regarding RCKangaroo - it runs 1.5x slower than my kernel.
How can these statements be reconciled?
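
To make the apparent contradiction concrete, here is a quick sketch of the arithmetic. It assumes "1.5x slower" means the kernel's rate divided by 1.5, and that all figures refer to the same RTX 4090 and the same unit (Gk/s = GKeys/s). The variable names are only for illustration.

```python
# Figures quoted above. The reading of "1.5x slower" as a division
# by 1.5 is an assumption, not something stated in the quotes.
kernel_gks = 6.2              # rate quoted for the custom kernel, Gk/s
rckangaroo_claimed_gks = 8.0  # rate stated for RCKangaroo, GKeys/s
slowdown = 1.5                # "runs 1.5x slower than my kernel"

# Rate RCKangaroo would have if it really were 1.5x slower than the kernel.
rckangaroo_implied_gks = kernel_gks / slowdown

# Gap between the published figure and the implied one.
gap_factor = rckangaroo_claimed_gks / rckangaroo_implied_gks

print(f"implied RCKangaroo rate: {rckangaroo_implied_gks:.2f} Gk/s")
print(f"claimed / implied:       {gap_factor:.2f}x")
```

Under that reading, RCKangaroo would run at roughly 4.1 Gk/s, about half of the 8 GKeys/s stated for it, so the two statements cannot both hold for the same hardware, settings, and way of counting.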