The goal here is to guess private keys that are already in use. The rule is that there are no rules about what you should use as a random generator.
An RTX 4090 can do 7.0 - 7.1 GK/s (billions of keys per second) when scanning a given range, not 4 or 5. And yes, that counts just the keys in the given interval, not the extra ones you get from symmetry or endo (with those, it can surpass 10 GK/s).
What do you guys even use as a random generator if the speed needs to be 10 GK/s?
Zero PRNG. The keys are in sequence. The only difference between hashing keys at random positions and hashing keys that are in sequence is that it's much faster to do them in sequence. There is also no risk of hashing the same key twice: a sequence never repeats, so the birthday paradox cannot occur.
Take a sequence of 2**N keys. You need to have a large N.
Split it evenly by number of threads (let's say, for CUDA, we have 16384 blocks * 256 threads each).
Compute the first starting key and delta key (that's only two EC point multiplications).
Compute starting keys for all threads, evenly (that's 16384 * 256 EC group additions - very fast).
Run the kernel as many times as needed to scan the full range (it only does EC group additions, and hashing of each key).
For symmetry/endo: also compute the hashes for X*beta and X*beta^2 (mod p), each combined with Y and -Y. No EC math involved, just field multiplications and a negation.
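A rough sketch of the range-splitting steps above, in plain Python instead of CUDA and with toy sizes. The function names (`ec_add`, `ec_mul`, `scan_range`) are made up for illustration, the hashing step is left out, and a real kernel would do this in parallel with much smarter field arithmetic. It assumes the thread count divides 2**N evenly.

```python
# secp256k1 constants (standard curve parameters)
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(a, b):
    """Affine group addition; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def ec_mul(k, pt):
    """Double-and-add point multiplication (the expensive operation)."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc

def scan_range(start_key, n_bits, n_threads):
    """Return {private_key: public_point} for 2**n_bits sequential keys."""
    per_thread = (1 << n_bits) // n_threads   # keys handled by each thread
    base = ec_mul(start_key, G)               # point multiplication #1
    delta = ec_mul(per_thread, G)             # point multiplication #2
    # Starting point of every thread: additions only.
    starts = [base]
    for _ in range(n_threads - 1):
        starts.append(ec_add(starts[-1], delta))
    # "Kernel": every thread walks its slice by adding G each step.
    found = {}
    for t, pt in enumerate(starts):
        key = start_key + t * per_thread
        for _ in range(per_thread):
            found[key] = pt                   # a real kernel hashes pt here
            pt = ec_add(pt, G)
            key += 1
    return found
```

Calling `scan_range(1000, 6, 4)` covers keys 1000 to 1063 with 4 "threads" of 16 keys each, and only the first two lines of the function ever call `ec_mul`.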
So: only two point multiplications, and a shitload of group additions. This is how one gets to 10 GK/s using just 450 watts.
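For the symmetry/endo part, here is a small sketch of how one computed point turns into six candidates with no further EC math. The constants `BETA` and `LAMBDA` are the secp256k1 endomorphism values as I know them from libsecp256k1; the helper names (`six_points`, `six_keys`) are hypothetical, and the key mapping assumes the usual pairing lambda*(x, y) = (beta*x, y).

```python
P = 2**256 - 2**32 - 977   # field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

# Cube roots of unity: BETA mod P (acts on X), LAMBDA mod N (acts on the key).
BETA = 0x7AE96A2B657C07106E64479EAC3434E99CF0497512F58995C1396C28719501EE
LAMBDA = 0x5363AD4CC05C30E0A5261C028812645A122E22EA20816678DF02967C1B23BD72

def on_curve(x, y):
    """Check y^2 = x^3 + 7 over the secp256k1 field."""
    return (y * y - x * x * x - 7) % P == 0

def six_points(x, y):
    """(x, y) plus its five relatives: X*beta, X*beta^2, each with Y and -Y."""
    xs = (x, x * BETA % P, x * BETA * BETA % P)
    return [(xi, yi) for xi in xs for yi in (y, P - y)]

def six_keys(k):
    """Private keys matching six_points, in the same order (assumed pairing)."""
    ks = (k % N, k * LAMBDA % N, k * LAMBDA * LAMBDA % N)
    return [kj for ki in ks for kj in (ki, N - ki)]

candidates = six_points(GX, GY)
```

Each candidate costs at most a couple of field multiplications or a subtraction, which is why the extra keys come almost for free next to the group addition that produced the original point. `six_keys` is only needed once a hash actually matches, to recover which private key it belongs to.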