Current algorithms like Kangaroo don't give you real keys/s information, hence your surprise; the speed of these algorithms is often related more to statistical performance than to direct metrics like keys per second.
You are confusing the exakeys/s shown by some BSGS programs with the real speed (4000+ Mkeys/s) actually computed and analyzed by any real Kangaroo program.
That is, there are indeed 4 billion keys (public keys, and hence, by implication, private keys) computed per second, and each of them is a complete 256-bit key that is processed, checked, and then jumped further.
No statistical BS there. Just a direct metric.
I looked at the Kangaroo code, and it uses the length of the jumps as the reference for speed; this is neither true nor exact.
See the check.h file.
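To make the two competing metrics concrete, here is a toy sketch. It is not the code from check.h or from any real Kangaroo program. It assumes a small multiplicative group mod 2^31-1 in place of secp256k1, a made-up jump hash, and hypothetical function names. It counts both "group operations" (one new element computed and checked per jump) and "distance" (the sum of jump lengths), so you can see how far apart the two figures are:

```python
import math
import time


def make_jump_sizes(width):
    """Hypothetical jump set: powers of two with mean about sqrt(width)/2."""
    target = math.isqrt(width) / 2
    r = 1
    while (2 ** r - 1) / r < target:
        r += 1
    return [1 << i for i in range(r)]


def jump_index(x, salt, r):
    # Made-up toy hash choosing the next jump; real tools use their own mix.
    return ((((x ^ salt) * 2654435761) % 2**32) >> 16) % r


def kangaroo(g, h, p, a, b, max_attempts=64):
    """Toy Pollard kangaroo in Z_p^* (stand-in for EC point additions).

    Returns (k, ops, distance): k in [a, b] with g^k == h (mod p), or None;
    ops = group multiplications done (one per jump, i.e. one "key" each);
    distance = total exponent range covered by all jumps.
    """
    width = b - a
    sizes = make_jump_sizes(width)
    steps = [pow(g, s, p) for s in sizes]  # precomputed g^s, not counted in ops
    r = len(sizes)
    n_tame = 2 * math.isqrt(width) + 1
    ops = distance = 0
    for salt in range(max_attempts):
        # Tame kangaroo starts at the known exponent b and sets a trap.
        xt, dt = pow(g, b, p), 0
        for _ in range(n_tame):
            i = jump_index(xt, salt, r)
            xt, dt = xt * steps[i] % p, dt + sizes[i]
            ops += 1
            distance += sizes[i]
        # Wild kangaroo starts at the unknown exponent k (the element h).
        xw, dw = h, 0
        while dw <= width + dt:
            if xw == xt:  # landed in the trap: k + dw == b + dt
                k = b + dt - dw
                if a <= k <= b and pow(g, k, p) == h:
                    return k, ops, distance
                break
            i = jump_index(xw, salt, r)
            xw, dw = xw * steps[i] % p, dw + sizes[i]
            ops += 1
            distance += sizes[i]
    return None, ops, distance  # every attempt missed the trap


if __name__ == "__main__":
    p, g = 2**31 - 1, 7  # Mersenne prime; 7 is a primitive root mod p
    a = 1_000_000
    b = a + 2**20
    h = pow(g, a + 123_456, p)
    t0 = time.perf_counter()
    k, ops, dist = kangaroo(g, h, p, a, b)
    elapsed = time.perf_counter() - t0
    print(f"k={k}  ops/s={ops / elapsed:,.0f}  distance/s={dist / elapsed:,.0f}")
```

In this toy the distance figure exceeds the operation count by roughly the mean jump size (several hundred here), so which of the two a program divides by elapsed time changes the reported "keys/s" by orders of magnitude. The same caution applies to a BSGS program that credits each giant step with all m baby-step table entries.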