...
You will never get 250 MK/s on any GPU by computing H160 from a list of private keys.
...
You just don't have enough experience in GPU coding and/or don't know about some efficient methods for EC scalar multiplication.
A 4090 can do >0.5 GK/s at rndprivkey-to-h160. I can publish the code; I just need to find some time to make my sources readable enough to release.
That's great and all, but I'm not sure your estimates fit the context of what you've quoted.
I'm well aware that there are a lot of optimizations for scalar multiplication, and it's great that you have CUDA code that runs as fast as you state (which sounds miraculous in itself, to be honest). But there's one thing you may have missed: the private keys are not random, they're fed as input. So maybe redo the math after accounting for fetching the private keys from GPU memory.
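
To make the distinction concrete, here is a rough Python sketch (nothing like real CUDA code, and not anyone's actual implementation) that counts group operations on secp256k1. It assumes plain affine double-and-add with no windowing or batched inversion, and the helper names `add` and `mul` are made up for this post. The point is only that a key fed in from a list needs a full scalar multiplication, while stepping from k to k+1 needs a single point addition.

```python
# secp256k1 public constants: field prime, group order, generator point.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(a, b):
    """Affine point addition; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a)
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P   # tangent slope (doubling)
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P    # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def mul(k, point=G):
    """Naive double-and-add. Returns (k*point, number of add() calls made)."""
    result, addend, ops = None, point, 0
    while k:
        if k & 1:
            result = add(result, addend)
            ops += 1
        addend = add(addend, addend)
        ops += 1
        k >>= 1
    return result, ops


# An arbitrary key taken from a list: full scalar multiplication.
k = 0x1F2E3D4C5B6A79881F2E3D4C5B6A79881F2E3D4C5B6A79881F2E3D4C5B6A7988
pub, ops_full = mul(k)

# The next consecutive key: one point addition on top of the previous result.
pub_next = add(pub, G)

print("add() calls for a listed key:", ops_full)
print("add() calls for the next sequential key: 1")
```

Real GPU code narrows that gap a lot with precomputed tables and batched inversions, and each listed key still has to be read from memory (32 bytes apiece) before any of this starts.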

Not sure why you had to bring something from page 520 of that topic here, though. It'd be much more interesting if you instead updated us on the 135 progress.
