Current algorithms like Kangaroo don't give you real keys/s information, hence your surprise. The speed of these algorithms is often related more to statistical performance than to direct metrics like keys per second.
You are confusing the exakeys/s shown by some BSGS programs with the real speed (4000+ Mkeys/s) actually computed and analyzed by any real Kangaroo program.
That is, there are indeed 4 billion keys (public keys, and hence by extension private keys) computed per second, and each of them is a complete 256-bit key that is processed, checked, and then jumped from again.
No statistical BS there. Just a direct metric.
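To make the "one jump = one key computed and checked" point concrete, here is a minimal toy sketch. It is not taken from any real Kangaroo program: it runs in the multiplicative group mod a small prime instead of secp256k1, and the function name, the jump rule and all parameters are made up for illustration. The only point is that the rate is a plain counter of hops divided by elapsed time.

```python
import time

def kangaroo(g, h, p, a, b, k=8):
    """Toy Pollard kangaroo in Z_p^*: look for x in [a, b] with g^x = h (mod p).

    Illustrative only; real solvers work on secp256k1 points on a GPU.
    """
    jump = lambda y: 1 << (y % k)   # jump size derived from the current element
    hops = 0                        # one hop = one new group element computed

    # Tame kangaroo: starts at g^b and leaves a trap where it stops.
    n_tame = 4 * int((b - a) ** 0.5) + 1
    d_tame, y_tame = 0, pow(g, b, p)
    for _ in range(n_tame):
        s = jump(y_tame)
        d_tame += s
        y_tame = y_tame * pow(g, s, p) % p   # real code would use a precomputed jump table
        hops += 1

    # Wild kangaroo: starts at h = g^x and hops until it hits the trap or overshoots.
    d_wild, y_wild = 0, h
    while d_wild <= (b - a) + d_tame:
        if y_wild == y_tame:                 # every element is checked before jumping on
            return b + d_tame - d_wild, hops
        s = jump(y_wild)
        d_wild += s
        y_wild = y_wild * pow(g, s, p) % p
        hops += 1
    return None, hops                        # this toy walk can fail; a real solver retries

if __name__ == "__main__":
    p, g, secret = 1_000_003, 2, 123_456     # small made-up instance
    h = pow(g, secret, p)
    t0 = time.perf_counter()
    found, hops = kangaroo(g, h, p, 0, 1 << 20)
    dt = time.perf_counter() - t0
    print(f"found={found} hops={hops} rate={hops / dt:,.0f} keys/s")
```

The printed rate counts elements actually computed, independent of how long the individual jumps are.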
I looked at the Kangaroo code, and it uses the length of the jumps as a reference for speed. That is neither true nor exact.
See the check.h file.
What's the check.h file? Is it part of the Kangaroo algorithm?
RTX 4090 specs: FP32 (float) 82.58 TFLOPS
That's 82580 billion raw operations/s on floating-point numbers.
Once you divide by the number of instructions needed to do a single kangaroo jump (e.g. one point addition over the curve's modular field, P + Q = R), you're left with N billion keys/s (where N is 4 or larger, depending on the implementation).
You can do 5,600,000,000 keys/s (that's 5.6 billion) on an RTX 4090, so I'll just add that 4000 Mkeys/s is slower than what the hardware can accomplish.
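As a rough sanity check on that division, the sketch below turns the figures quoted in this thread around: given the FP32 peak and a claimed key rate, how many raw operations does each jump get? The helper name is made up. Treat the FP32 peak as a loose proxy only, since 256-bit modular arithmetic runs on integer instructions, so this is a budget estimate and not a measurement.

```python
FP32_OPS_PER_S = 82.58e12   # RTX 4090 FP32 peak, as quoted above

def ops_budget_per_key(keys_per_s, raw_ops_per_s=FP32_OPS_PER_S):
    """Raw operations available per kangaroo jump at a given key rate."""
    return raw_ops_per_s / keys_per_s

if __name__ == "__main__":
    for rate in (4.0e9, 5.6e9):   # the two key rates mentioned in this thread
        budget = ops_budget_per_key(rate)
        print(f"{rate / 1e9:.1f} Gkeys/s -> about {budget:,.0f} raw ops per jump")
```

That works out to roughly 20,600 raw operations per jump at 4 Gkeys/s and about 14,700 at 5.6 Gkeys/s.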
Stop spreading false information.
The check.h file is part of Kangaroo. It is public, it is not fake information, and anyone can review it.